There's a paradox in our ability to pay attention. When we are hyper-focused on our surroundings, we become more acutely aware of the signals our senses pick up. But sometimes, while we are paying attention, we miss things in our sensory field that are so glaringly obvious that, on a second look, we can’t help but question the reliability of our perception.
Back in 1999, the psychologists Daniel Simons and Christopher Chabris created a clever scenario that vividly demonstrates this phenomenon. (Test it yourself in less than two minutes by watching Simons’ video, which we recommend doing before reading the spoiler below.)
In the scenario, there are two teams of three players each, one dressed in black and the other in white. The viewer is asked to count how many passes the team in white makes over the course of the video. Sure enough, when the video ends, most people have counted the passes accurately. Then the narrator asks: “But did you see the gorilla?”
As it turns out, someone in a gorilla suit slowly walks into the scene, in plain sight. Most people who watch the video for the first time while counting passes completely overlook the out-of-place primate. This seems strange, given how intently the viewer is watching the small area where the action unfolds.
Predictive Processing
Neuroscientist Anil Seth offers an interesting explanation of this phenomenon in his book Being You: A New Science of Consciousness. Seth’s description draws on one of neuroscience’s leading theories of cognition and perception.
Predictive processing, also known as predictive coding, suggests that the content of our experiences and perceptions of the world is based primarily on predictive models that our brains have constructed from previous experience. Our brains, locked inside the confines of a skull, have the unenviable task of trying to determine the causes of our sensory signals. By using predictive models to shape our perception, our brains are able to go beyond the data of our senses to form what feel like concrete experiences of phenomena in the world.
In a sense, our brains are constantly trying to solve what philosophers call an inverse inference problem: we don’t have direct access to the causes of our sensory signals. Our sensory signals are the effects of phenomena out in the world, and they do not necessarily reflect the nature of the causes that produced them. With this limited data, our brains fill in the gaps by producing models that predict those causes.
In this predictive processing framework, our perceptions are top-down phenomena: they are the brain's ‘best guess’ about what is happening outside us and within us. This contrasts with a bottom-up model of perception, in which our senses would primarily determine what we perceive, and our perceptions would be an unfiltered readout of that data (what we see, hear, smell, etc.).
But in predictive processing, our senses still play an important role in our overall perception, because our predictions (so-called “priors”) and generative models of the world are constantly cross-referenced with what our senses are telling us. This cross-referencing inevitably produces prediction errors, as our models don’t always neatly match up with our sensory input. These errors then play a vital role in helping the brain update its predictions, giving it more data to draw on in the next scenario it finds itself in.
In Being You, Seth describes generative models as the brain’s bank of perceivable content. To perceive something like a team of people passing a ball, a person needs a generative model that incorporates the sensory signals we would expect to encounter if we ran into a team of people passing a ball: swift movements, bodies swishing around, and perhaps some exercise-related odors.
Our generative models allow our brains to make informed guesses about what's out there in the world. Incoming sensory signals are compared against these predictions in real time to form prediction errors, which then update our generative models in a continual effort to minimize prediction error.
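For readers who think in code, this loop can be caricatured in a few lines of Python. Treat it as a toy sketch rather than Seth’s model or any published implementation: it assumes that a single number stands in for the brain’s prediction about one hidden cause, and that the prediction moves toward each incoming signal by a fixed fraction (the hypothetical `learning_rate`) of the prediction error.

```python
def update_belief(prediction: float, signal: float,
                  learning_rate: float = 0.3) -> tuple[float, float]:
    """Compare a prediction with a sensory signal, then nudge the
    prediction toward the signal by a fraction of the prediction error.

    Toy assumption: one number represents the whole generative model.
    """
    prediction_error = signal - prediction
    return prediction + learning_rate * prediction_error, prediction_error


belief = 0.0  # hypothetical starting guess about a hidden cause
errors = []
for signal in [1.0] * 5:  # the hidden cause keeps producing the same signal
    belief, error = update_belief(belief, signal)
    errors.append(abs(error))

print(round(belief, 4))  # 0.8319
print([round(e, 3) for e in errors])
```

Because each update closes part of the gap between prediction and signal, the prediction error shrinks on every pass, which is the “continual effort to minimize prediction error” in miniature.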
Perceptual Hierarchy
Perceptual hierarchies are another component of these unfolding processes. Our predictions of the world operate at multiple scales: they can involve fully fledged objects and entities like cats and cars, but we also predict the features that make up these entities, like fur and wheels.
A high-level prediction like seeing a team of people passing a ball cascades down to lower-level predictions, such as the type of clothing they are wearing, the kind of movements they are making, and the sounds that accompany them. These flow down to even lower-level predictions about the shape of the ball, light bouncing off the floor, and the movement of bodies in space.
Not only do our brains lack direct access to the causes of our sensory signals, they also don't know how reliable those signals are. A key concept for understanding why we often miss things while paying attention is therefore precision weighting: the degree to which our sensory signals affect our perception, depending on how reliable the brain estimates them to be.
If you swivel your head around and catch a glimpse of a team passing a ball, those visual signals will be treated as less reliable and won’t influence your perception as much as if you paused and stared at the team. Simply glancing at something down-weights the estimated precision of those sensory signals, so they have less influence on your perceptual best guess.
Up-weighting happens when sensory signals are deemed particularly reliable, giving them a stronger influence on our perception. While this might be tricky to wrap your head around, increasing the estimated precision of your sensory signals is, in this framework, simply ‘paying attention.’
Viewing attention in this way helps explain why we sometimes miss things in our sensory field. If we increase the influence that specific sensory data has on our perceptual best guess, then data that is not the focus of our attention will have little to no effect on that guess. So while paying attention is useful for homing in on specific sensory signals, it can also prevent us from getting a more complete picture of what is unfolding around us.
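Precision weighting can also be sketched numerically. The example below is a hypothetical illustration, not taken from Being You: it borrows the standard rule for combining two Gaussian estimates, where precision is the inverse of variance, and all the numbers are invented. A belief that something gorilla-like is present starts at 0, and the same gorilla-like signal of 1 arrives twice: once through a channel given high estimated precision (attended) and once through a channel given low estimated precision (unattended).

```python
def precision_weighted_update(prior_mean: float, prior_precision: float,
                              signal: float, signal_precision: float) -> float:
    """Combine a Gaussian prior with a Gaussian observation.

    The prediction error is scaled by the signal's share of the total
    precision, so more precise (attended) signals move the belief further.
    """
    gain = signal_precision / (signal_precision + prior_precision)
    prediction_error = signal - prior_mean
    return prior_mean + gain * prediction_error


PRIOR_MEAN = 0.0       # hypothetical prior: "no gorilla here"
PRIOR_PRECISION = 1.0  # hypothetical confidence in that prior
GORILLA_SIGNAL = 1.0   # hypothetical sensory evidence for a gorilla

# Attention up-weights the estimated precision of a signal...
attended = precision_weighted_update(
    PRIOR_MEAN, PRIOR_PRECISION, GORILLA_SIGNAL, signal_precision=9.0)
# ...while signals outside the focus of attention are down-weighted.
unattended = precision_weighted_update(
    PRIOR_MEAN, PRIOR_PRECISION, GORILLA_SIGNAL, signal_precision=0.1)

print(round(attended, 3))    # 0.9
print(round(unattended, 3))  # 0.091
```

With identical evidence, the down-weighted channel barely shifts the belief away from “no gorilla,” a very simplified echo of how viewers busy counting passes can miss the gorilla entirely.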