True Vision: How We See

Seeing is a joint enterprise: The eye records, but the brain must interpret.

By V. S. Ramachandran | Tuesday, August 19, 2014

Our perception of the world ordinarily seems so effortless that we tend to take it for granted. We look, we see, we understand—it seems as natural and inevitable as water flowing downhill.

In order to understand perception, we need to first get rid of the notion that the image at the back of the eye simply gets “relayed” back to the brain to be displayed on a screen. Instead, we must understand that as soon as rays of light are converted into neural impulses at the back of the eye, it no longer makes any sense to think of the visual information as being an image. We must think, instead, of symbolic descriptions that represent the scenes and objects that had been in the image. Say I want someone to know what the chair across the room from me looks like. I could take him there and point it out to him so he could see it for himself, but that isn’t a symbolic description. I could show him a photograph or a drawing of the chair, but that is still not symbolic because it bears a physical resemblance. But if I hand the person a written note describing the chair, we have crossed over into the realm of symbolic description: The squiggles of ink on the paper bear no physical resemblance to the chair; they merely symbolize it.

Analogously, the brain creates symbolic descriptions. It does not re-create the original image, but represents the various features and aspects of the image in totally new terms—not with squiggles of ink, of course, but in its own alphabet of nerve impulses. These symbolic encodings are created partly in your retina itself but mostly in your brain. Once there, they are parceled and transformed and combined in the extensive network of visual brain areas that eventually let you recognize objects. Of course, the vast majority of this processing goes on behind the scenes without entering your conscious awareness, which is why it feels effortless and obvious.

In primates, including humans, a large chunk of the brain—comprising the occipital lobes and parts of the temporal and parietal lobes—is devoted to vision. Each of the 30 or so visual areas within this chunk contains either a complete or partial map of the visual world. We don’t really know why we higher primates have such a large number of distinct areas, but it seems that they are all specialized for different aspects of vision, such as color vision, seeing movement, seeing shapes, recognizing faces, and so on. The computational strategies for each of these might be sufficiently different that evolution developed the neural hardware separately.

The Woman Who Couldn't See Motion

A good example of this is the middle temporal (MT) area, a small patch of cortical tissue found in each hemisphere that appears to be mainly concerned with seeing movement. In the late 1970s a woman in Zurich whom I’ll call Ingrid suffered a stroke that damaged the MT areas on both sides of her brain but left the rest of her brain intact.

Ingrid’s vision was normal in most respects: She could read newspapers and recognize objects and people. But she had great difficulty seeing movement. When she looked at a moving car, it appeared like a long succession of static snapshots, as if seen under a strobe. She was terrified of crossing the street because she didn’t know how fast the cars were approaching. When she poured water into a glass, the stream of water looked like a static icicle. She didn’t know when to stop pouring because she couldn’t see the rate at which the water level was rising, so it always overflowed. Even talking to people was like “talking on a phone,” she said, because she couldn’t see the lips moving. Life became a strange ordeal for her.

So it would seem that the MT areas are concerned mainly with seeing motion but not with other aspects of vision. Other bits of evidence support this view.

Unfortunately, most of the rest of the 30 or so visual areas of the primate brain do not reveal their functions so cleanly when they are lesioned, imaged, or zapped. This may be because they are not as narrowly specialized, or their functions are more easily compensated for by other regions (like water flowing around an obstacle), or perhaps our definition of what constitutes a single function is murky (“ill posed,” as computer scientists say). But in any case, beneath all the bewildering anatomical complexity there is a simple organizational pattern that is very helpful in the study of vision. This pattern is a division of the flow of visual information along (semi-) separate, parallel pathways.

Let’s first consider the two pathways by which visual information enters the cortex. The so-called old pathway starts in the retinas, relays through an ancient midbrain structure called the superior colliculus, and then projects, via the pulvinar, to the parietal lobes. This pathway is concerned with spatial aspects of vision: where, but not what, an object is. The old pathway enables us to orient toward objects and track them with our eyes and heads. If you damage this pathway in a hamster, the animal develops a curious tunnel vision, seeing and recognizing only what is directly in front of its nose.

The new pathway, which is highly developed in humans and in primates generally, allows sophisticated analysis and recognition of complex visual scenes and objects. This pathway projects from the retina to area V1, and from there splits into two subpathways, or streams: pathway one, or what is often called the “how” stream, and pathway two, the “what” stream. You can think of the “how” stream as being concerned with the relationships among visual objects in space, while the “what” stream is concerned with the relationships of features within visual objects themselves. The “how” stream projects to the parietal lobe and has strong links to the motor system. When you dodge an object hurled at you, when you navigate around a room avoiding bumping into things, when you step gingerly over a tree branch or a pit, you are relying on the “how” stream. Most of these computations are unconscious and highly automated, like a robot or a zombie copilot that follows your instructions without need of much guidance or monitoring.

Before we consider the “what” stream, let me first mention the fascinating visual phenomenon of blindsight. It was discovered in Oxford in the late 1970s by Larry Weiskrantz. A patient named Gy had suffered substantial damage to his left visual cortex—the origin point for both the “how” and the “what” streams. As a result he became completely blind in his right visual field—or so it seemed at first. In the course of testing Gy’s intact vision, Weiskrantz told him to reach out and try to touch a tiny spot of light that he told Gy was to his right. Gy protested that he couldn’t see it, but Weiskrantz asked him to try anyway. To his amazement, Gy correctly touched the spot. Gy insisted that he had been guessing and was surprised when he was told that he had pointed correctly. But repeated trials proved that it had not been a lucky stab in the dark; Gy’s finger homed in on target after target, even though he had no conscious visual experience of where they were or what they looked like. Weiskrantz dubbed the syndrome blindsight to emphasize its paradoxical nature.

How can a person locate something he cannot see? The answer lies in the anatomical division between the old and new pathways in the brain. Gy’s new pathway, running through V1, was damaged, but his old pathway was perfectly intact. Information about the spot’s location traveled up smoothly to his parietal lobes, which in turn directed his hand to move to the correct location.

From Perception to Action

Now let's have a look at pathway two, the “what” stream. This stream is concerned mainly with recognizing what an object is and what it means to you. This pathway projects from V1 to the fusiform gyrus and from there to other parts of the temporal lobes. The fusiform area itself mainly performs a dry classification of objects: It discriminates Ps from Qs, hawks from handsaws, and Joe from Jane, but it does not assign significance to any of them.

But as pathway two proceeds past the fusiform to other parts of the temporal lobes, it evokes not only the name of a thing but a penumbra of associated memories and facts about it—broadly speaking, the semantics, or meaning, of an object. You not only recognize Joe’s face as being “Joe” but remember all sorts of things about him: He is married to Jane, has a warped sense of humor, is allergic to cats, and is on your bowling team. This semantic retrieval process involves widespread activation of the temporal lobes, but it seems to center on a handful of “bottlenecks” that include Wernicke’s language area and the inferior parietal lobule, which is involved in quintessentially human abilities such as naming, reading, writing, and arithmetic. Once meaning is extracted in these bottleneck regions, the messages are relayed to the amygdala, which lies embedded in the front tip of the temporal lobes, to evoke feelings about what (or whom) you are seeing.

In addition to pathways one and two, there seems to be an alternate, somewhat more reflexive pathway for emotional response to objects that I call pathway three. If the first two were the “how” and “what” streams, this one could be thought of as the “so what” stream. In this pathway, biologically salient stimuli such as eyes, food, facial expressions, and animate motion (such as someone’s gait and gesturing) pass from the fusiform gyrus through an area in the temporal lobe called the superior temporal sulcus and then straight to the amygdala. In other words, pathway three bypasses high-level object perception—and the whole rich penumbra of associations evoked through pathway two—and shunts quickly to the amygdala, the gateway to the emotional core of the brain, the limbic system. This shortcut probably evolved to promote fast reaction to high-value situations, whether innate or learned.

The amygdala works in conjunction with past stored memories and other structures in the limbic system to gauge the emotional significance of whatever you are looking at: Is it friend, foe, mate? Or is it just something mundane? If it’s important, you instantly feel something. If it is an intense feeling, the signals from the amygdala also cascade into your hypothalamus, which not only orchestrates the release of hormones but also activates the autonomic nervous system to prepare you to take appropriate action, whether it’s feeding, fighting, fleeing, or wooing. (Medical students use the mnemonic of the “four Fs” to remember these.)

Exactly how many of our visual areas are unique to humans isn’t clear. But a great deal more is known about them than about other higher-brain regions such as the frontal lobes, which are involved in such things as morality, compassion, and ambition. A thorough understanding of how the visual system really works may therefore provide insights into the more general strategies the brain uses to handle information, including the ones that are unique to us.


Excerpted from The Tell-Tale Brain: A Neuroscientist’s Quest for What Makes Us Human by V. S. Ramachandran. Copyright 2011 by V. S. Ramachandran. With permission of the publisher, W. W. Norton & Co. 
