An artist's conception of a differentiable neural computer. The neural network at the center does the data parsing, while reading, writing and rewriting its memories. (Credit: DeepMind)

Clive Wearing is a noted British musician, but he’s perhaps best known as the man with a 30-second memory. In the 1980s, Wearing contracted a strain of herpes virus that attacked his brain and destroyed his ability to form new memories. He might forget what he’s eating before food reaches his mouth. He struggles to frame experiences of the present with conceptions of time and place. Life for him is often akin to waking up from a coma — every 20 seconds.

In a certain sense, artificial neural networks are Clive; they operate without working memory, erasing everything they learned when assigned to a new task. This limits the complexity of operations they can accomplish, because in the real world, countless variables are in constant flux.

Now, a team from Google DeepMind has built a hybrid computing system, what they’re calling a “differentiable neural computer” (DNC), which pairs a neural network with an external memory system. The hybrid system learned how to form memories and use them to answer questions about maps of the London Underground transit system and family trees. “Like a conventional computer, it can use its memory to represent and manipulate complex data structures but, like a neural network, it can learn to do so from data,” the authors wrote in their paper, which was published Wednesday in the journal Nature.
Neural Networks Enhanced
Neural networks don’t execute functions with sets of preprogrammed commands; they create their own rules of operation through pattern recognition. Researchers feed an artificial neural network a training set of worked examples for a specific task, and all the data passes through hierarchical layers of interconnected nodes, or neurons. As more training data is fed through the layers, the simple computation that occurs at each node is automatically adjusted until the output matches the training set solutions. It’s sort of like tuning a guitar through trial and error. In this way, neural nets can parse data in images to recognize faces in photos or translate languages from text all on their own, based on patterns we would never recognize.

But this skill can only go so far, and if you want that neural net to perform a new task, it needs to reset and consume another training set to tune itself. With memory, a neural network can keep its knowledge on file and use what it learned for another task. “Neural networks excel at pattern recognition and quick, reactive decision-making, but we are only just beginning to build neural networks that can think slowly – that is, deliberate or reason using knowledge,” DeepMind researchers wrote in a blog post Wednesday. The researchers themselves couldn’t be reached Wednesday, because the team was “heads down preparing for launch,” according to an email from a DeepMind spokesperson.
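For readers who want to see the moving parts, here is a minimal sketch of that adjustment loop. It is illustrative Python/NumPy only, not DeepMind’s code; the toy task (XOR) and every variable name are invented for the example. A tiny two-layer network runs its inputs through simple node computations, compares its outputs to the training-set solutions, and nudges its weights to shrink the gap, over and over.

```python
import numpy as np

# Toy training set: the four XOR examples and their "solutions".
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # input-to-hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden-to-output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    # Forward pass: data flows through layers of simple node computations.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # How far the outputs are from the training-set solutions
    # (the gradient of a cross-entropy loss with a sigmoid output).
    d_out = out - y

    # Backward pass: nudge every weight a little to shrink that gap.
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should drift toward [0, 1, 1, 0] as training proceeds
```

The “tuning a guitar” analogy maps directly onto the loop: each pass, every weight gets turned slightly in whichever direction makes the output sound a little more like the answer key.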
Getting from Point A to B
Researchers fed the DNC maps of the London Underground system, and the neural net found patterns between station locations and the routes connecting them. Then it saved these basic parameters in its memory — it offloaded its foundational “knowledge” into memory matrices, building a simple, symbolic representation of the Underground. And again, it did all this without programmed commands. An unaided neural network had trouble charting a course from station to station, arriving at the correct location only 37 percent of the time after 2 million training examples. But a neural network enhanced with memory reached the correct destination, along an optimal route, 98.8 percent of the time after only 1 million training examples, researchers say.
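The trick that makes this possible is that reads and writes to the external memory are “soft,” so the whole system can be trained end to end. The sketch below is a stripped-down illustration of content-based reading of the sort a DNC-style system uses: the network emits a key, the key is compared to every memory row, and a softmax over the similarities gives gentle read weights rather than a hard lookup. The sizes, names and numbers here are invented for the example, not taken from the paper.

```python
import numpy as np

def content_read(memory, key, strength):
    """Soft, differentiable lookup into an external memory matrix.

    memory:   (N, W) matrix of N storage rows, W values each
    key:      (W,) query vector emitted by the controller network
    strength: scalar sharpening the attention over rows
    """
    # Cosine similarity between the key and every memory row.
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    sim = memory @ key / norms

    # Softmax over similarities -> read weights that sum to 1.
    logits = strength * sim
    w = np.exp(logits - logits.max())
    w = w / w.sum()

    # Read vector: a weighted blend of memory rows (fully differentiable).
    return w @ memory, w

# Tiny demo with a 4-row, 3-wide memory (sizes are arbitrary for illustration).
M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
read_vec, weights = content_read(M, key=np.array([0.9, 0.1, 0.0]), strength=10.0)
print(weights.round(3), read_vec.round(3))
```

Because the lookup is a weighted blend rather than a jump to a single address, gradients can flow back through it, which is what lets the network learn what to store and when to consult it.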
A map of the London Underground. (Credit: Shutterstock)

It could do similar work with a family tree. Researchers trained the neural net with information about parent, child and sibling relationships. It then stored these basic parameters in its memory, which allowed it to answer far more nuanced questions like “Who is Freya’s maternal great uncle?” by drawing upon its memory when needed.

Algorithms crafted by AI researchers were already solving these same rational, symbolic reasoning problems back in the 1970s, and other deep learning methods are far better than a DNC at logical data mining tasks. Again, the big difference is that the DNC taught itself how to parse the data and how to use its memory, but its practical uses will be limited for now. “Other machine learning techniques already exist that are much better suited to tasks like this,” says Pedro Domingos, a professor of computer science at the University of Washington and author of The Master Algorithm, who wasn’t involved with the study. “Symbolic learning algorithms already exist, and perform much better than what (DeepMind is) doing.”
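For comparison, here is what a hand-written symbolic approach to that kind of query looks like. The family tree below is made up for illustration and has nothing to do with the study’s data; the point of the DNC is that it learns equivalent behavior from examples instead of being handed rules like these.

```python
# A hand-coded symbolic reasoner over a hypothetical family tree
# ("Freya" and every relation here are invented for the example).
mother = {"Freya": "Ida", "Ida": "Greta"}
father = {"Freya": "Olav", "Ida": "Nils"}
siblings = {"Greta": ["Karl"], "Nils": ["Erik"]}
male = {"Karl", "Erik", "Olav", "Nils"}

def maternal_great_uncles(person):
    """Brothers of the maternal grandparents: the mother's parents' male siblings."""
    mom = mother.get(person)
    if mom is None:
        return []
    grandparents = [p for p in (mother.get(mom), father.get(mom)) if p]
    return [s for g in grandparents for s in siblings.get(g, []) if s in male]

print(maternal_great_uncles("Freya"))  # ['Karl', 'Erik'] in this made-up tree
```

Rule-based systems like this have answered kinship queries since the 1970s, which is Domingos’ point; what they cannot do is discover the rule and the memory strategy on their own.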
Flesh and Blood Analogues
It’s worth emphasizing here that neural networks are simply crunching numbers, so anthropomorphizing what they do only breeds misconceptions about the field in general. What we might consider “knowledge” is incredibly fluid and disputed. Still, DeepMind researchers drew human-computer parallels in describing their work. “There are interesting parallels between the memory mechanisms of a DNC and the functional capabilities of the mammalian hippocampus,” the researchers wrote.
Without prior programming, the DNC compiles information into a set of remembered facts that it can draw upon to solve complex problems — it doesn’t have to reinvent the wheel with each new task. It’s sort of what babies do once they’re about 10 to 12 months old. Infants younger than 10 months commit the classic “A-not-B error”: a researcher puts a toy under box A ten times in a row, and the baby crawls to box A for a reward every time. But when the researcher puts the toy under box B, in full sight of the infant, the baby still goes to box A because it’s executing a learned pattern. Try that with a 1-year-old, and they won’t be tricked. That’s because they are making connections between their memory and what’s unfolding in front of their eyes. They’re using symbolic reasoning: the toy doesn’t disappear when it’s under box B; you just can’t see it.

How, exactly, the human brain stores symbolic representations of the world through electrical impulses alone is still hotly debated. But a DNC, researchers say, may serve as a rudimentary analog for this process. As DeepMind researchers wrote in their blog:
“The question of how human memory works is ancient and our understanding still developing. We hope that DNCs provide both a new tool for computer science and a new metaphor for cognitive science and neuroscience: here is a learning machine that, without prior programming, can organise information into connected facts and use those facts to solve problems.”
But let’s not get ahead of ourselves. “The problem with a lot of this is, at the end of the day, we know almost nothing about how the brain works,” says Domingos. “No matter what I do I can always make some sort of parallel between what a system is doing and the brain, but it isn’t long before these analogies depart.”
A Long Way to Go
For perspective, building symbolic “knowledge” of London Underground maps and family trees required 512 memory matrix locations. To deal with the flood of dynamic information about the world that even an infant handles, researchers say, it would likely require thousands if not millions more memory locations — we still don’t know how the brain does it, so, frankly, this is just speculation. “We have a long way to go before we understand fully the algorithms the human brain uses to support these processes,” Jay McClelland, director of the Center for Mind, Brain and Computation at Stanford University, told IEEE Spectrum.

DeepMind has constructed a very, very preliminary foundation, and hybrid neural networks could eventually be scaled up to, for example, generate commentaries about the content of videos. These are things humans can do with ease, in any situation. A DNC still needs millions of training examples to accomplish a quite narrow task, and right now it isn’t clear what practical function a DNC could perform that existing deep learning algorithms can’t already do better. A DNC, in other words, is another clever way to accomplish a task in a field that’s awash in clever solutions. “Adding memory only seems like a big deal in the context of neural networks; for other learning methods, it's trivial,” says Domingos.

Still, this demonstration serves as proof that memory, or knowledge, can be a powerful thing.