In a sleek biochemistry laboratory at the University of Washington, postdoctoral fellow Yang Hsia is watching yellowish goo — the liquefied remains of E. coli — ooze through what looks like a gob of white marshmallow. “This isn’t super exciting,” he says.
While growing proteins in bacteria and then purifying them, using blobby white resin as a filter, doesn’t make for riveting viewing, the end product is extraordinary. Accumulating in Hsia’s resin is a totally artificial protein, unlike anything seen in nature, that might just be the ideal chassis for the first universal flu vaccine.
David Baker, Hsia’s adviser, calls this designer protein a “Death Star.” Imaged on his computer, its structure shows some resemblance to the notorious Star Wars superweapon. Though microscopic, by protein standards it’s enormous: a sphere made out of many interlocking pieces.
“We’ve figured out a way to put these building blocks together at the right angles to form these very complex nanostructures,” Baker explains. He plans to stud the exterior with proteins from a whole suite of flu strains so that the immune system will learn to recognize them and be prepared to fend off future invaders. A single Death Star will carry 20 different strains of the influenza virus.
Baker hopes this collection will cover the entire range of possible influenza mutation combinations. This all-in-one preview of present and future flu strains could replace annual shots: Get the Death Star vaccination, and you’ll already have the requisite antibodies in your bloodstream.
As Baker bets on designer proteins to defeat influenza, others are betting on David Baker.
After revolutionizing the study of proteins — molecules that perform crucial tasks in every cell of every natural organism — Baker is now engineering them from scratch to improve on nature. In late 2017, the Open Philanthropy Project gave his University of Washington Institute for Protein Design more than $10 million to develop the Death Star and support Rosetta, the software platform he conceived in the 1990s to discover how proteins are assembled. Rosetta has allowed Baker’s lab not only to advance basic science and pioneer new kinds of vaccines, but also to create drugs for genetic disorders, biosensors to detect toxins and enzymes to convert waste into biofuels.
His team currently numbers about 80 grad students and postdocs, and Baker is in constant contact with all of them. He challenges their assumptions and tweaks their experiments while maintaining an egalitarian environment in which ideas may come from anyone. He calls his operation a “communal brain.” Over the past quarter-century, this brain has generated nearly 450 scientific papers.
“David is literally creating a new field of chemistry right in front of our eyes,” says Raymond Deshaies, senior vice president for discovery research at the biotech company Amgen and former professor of biology at Caltech. “He’s had one first after another.”
Nature’s Origami
When Baker was studying philosophy at Harvard University, he took a biology class that taught him about the so-called “protein folding problem.” The year was 1983, and scientists were still trying to make sense of an experiment, carried out in the early ’60s by biochemist Christian Anfinsen, that revealed the fundamental building blocks of all life on Earth were more complex than anyone imagined.
The experiment was relatively straightforward. Anfinsen mixed a sample of the protein ribonuclease — which breaks down RNA — with a denaturant, a chemical that deactivated it. Then he allowed the denaturant to evaporate. The protein started to function again as if nothing ever happened.
What made this simple experiment so striking was the fact that the amino acids in protein molecules are folded in three-dimensional forms that make origami look like child’s play. When the denaturant unfolded Anfinsen’s ribonuclease, there were myriad ways it could refold, resulting in structures as different as an origami crane and a paper airplane. Much as the folds determine whether a piece of paper can fly across a room, only one fold pattern would result in functioning ribonuclease. So the puzzle was this: How do proteins “know” how to refold properly?
“Anfinsen showed that the information for both structure and activity resided in the sequence of amino acids,” says University of California, Los Angeles, biochemist David Eisenberg, who has been researching protein folding since the 1960s. “There was a hope that it would be possible to use sequence information to get three-dimensional structural information. Well, that proved much more difficult than anticipated.”
Baker was interested enough in protein folding and other unsolved mysteries of biology to switch majors and apply to grad school. “I’d never worked in a lab before,” he recalls. He had only a vague notion of what biologists did on a daily basis, but he also sensed that the big questions in science, unlike philosophy, could actually be answered.
Grad school plunged Baker into the tediousness and frustrations of benchwork, while also nurturing some of the qualities that would later distinguish him. He pursued his Ph.D. under Randy Schekman, who was studying how molecules move within cells, at the University of California, Berkeley. To aid in this research, students were assigned the task of dismantling living cells to observe their internal molecular traffic. Nearly half a dozen of them, frustrated by the assignment’s difficulty, had given up by the time Baker got the job.
Baker decided to follow his instincts even though it meant going against Schekman’s instructions. Instead of attempting to keep the processes within a cell still functioning as he dissected it under his microscope, Baker concentrated on preserving cell structure. If the cell were a wristwatch, his approach would be equivalent to focusing on the relationship between gears, rather than trying to keep it ticking, while taking it apart.
“He was completely obsessed,” recalls Deshaies, who was his labmate at the time (and one of the students who’d surrendered). Nobody could stop Baker, or dissuade him. He worked for months until he proved his approach was correct: Cell structure drove function, so maintaining its anatomy preserved the internal transportation network. Deshaies believes Baker’s methodological breakthrough was “at the core of Randy’s Nobel Prize,” awarded in 2013 for working out one of the fundamentals of cellular machinery.
But Baker didn’t dwell on his achievement, or cell biology for that matter. By 1989, Ph.D. in hand, he’d headed across the Bay to the University of California, San Francisco, where he switched his focus to structural biology and biochemistry. There he built computer models to study the physical properties of the proteins he worked with at the bench. Anfinsen’s puzzle remained unsolved, and when Baker got his first faculty appointment at the University of Washington, he took up the protein-folding problem full time.
From Baker’s perspective, this progression was perfectly natural: “I was getting to more and more fundamental problems.” Deshaies believes Baker’s tortuous path, from cells to atoms and from test tubes to computers, has been a factor in his success. “He just has greater breadth than most people. And you couldn’t do what he’s done without being somewhat of a polymath.”
Rosetta Milestone
Every summer for more than a decade, scores of protein-folding experts have convened at a resort in Washington’s Cascade Mountains for four days of hiking and shop talk. The only subject on the agenda: how to advance the software platform known as Rosetta. They call it Rosettacon.
Rosetta has been the single most important tool in the quest to understand how proteins fold, and to design new proteins based on that knowledge. It is the link between Anfinsen’s ribonuclease experiment and Baker’s Death Star vaccine.
When Baker arrived at the University of Washington in 1993, researchers knew that a protein’s function was determined by its structure, which was determined by the sequence of its amino acids. Just 20 different amino acids were known to provide all the raw ingredients. (Their particular order — specified by DNA — makes one protein fold into, say, a muscle fiber and another fold into a hormone.) Advances in X-ray crystallography, a technique for imaging molecular structure, had provided images of many proteins in all their folded splendor. Sequencing techniques had also improved, benefitting from the Human Genome Project as well as the exponential increase in raw computing power.
“There’s a right time for things,” Baker says in retrospect. “To some extent, it’s just luck and historical circumstance. This was definitely the right time for this field.”
Which is not to say that modeling proteins on a computer was a simple matter of plugging in the data. Proteins fold to their lowest free energy state: All of their amino acids must align in equilibrium. The trouble is that the equilibrium state is just one of hundreds of thousands of options — or millions, if the amino acid sequence is long. That’s far too many possibilities to test one at a time. Nature must have another way of operating, given that folding is almost instantaneous.
Baker’s initial approach was to study what nature was doing. He broke apart proteins to see how individual pieces behaved, and he found that each fragment was fluctuating among many possible structures. “And then folding would occur when they all happened to be in the right geometry at the same time,” he says. Baker designed Rosetta to simulate this dance for any amino acid sequence.
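For readers who want to see the search problem in miniature, here is a toy sketch in Python. It is not Rosetta and makes no claim to reproduce Baker's method: the chain length, the three states per fragment, the made-up energy function and the names `make_energy`, `fold_once` and `fold_many` are all assumptions chosen for illustration. Each fragment flickers among a few local states, a trial change is kept if it lowers the energy (or occasionally if it raises it), and many independent runs are compared to find the lowest-energy arrangement.

```python
import math
import random

# Toy model only (an illustration, not Rosetta): a chain of fragments,
# each of which can sit in one of a few local states.
N_FRAGMENTS = 12
N_STATES = 3


def make_energy(seed=0):
    """Build a hypothetical energy function with one hidden 'native' fold."""
    rng = random.Random(seed)
    native = [rng.randrange(N_STATES) for _ in range(N_FRAGMENTS)]

    def energy(conf):
        # Assumed scoring: -1.0 for each fragment in its native state,
        # plus -0.5 when its left-hand neighbor is native too.
        total = 0.0
        for i, state in enumerate(conf):
            if state == native[i]:
                total -= 1.0
                if i > 0 and conf[i - 1] == native[i - 1]:
                    total -= 0.5
        return total

    return native, energy


def fold_once(energy, rng, steps=2000, temperature=0.5):
    """One randomized run: let single fragments fluctuate, favoring low energy."""
    conf = [rng.randrange(N_STATES) for _ in range(N_FRAGMENTS)]
    current = energy(conf)
    best, best_e = list(conf), current
    for _ in range(steps):
        i = rng.randrange(N_FRAGMENTS)
        old_state = conf[i]
        conf[i] = rng.randrange(N_STATES)
        trial = energy(conf)
        # Metropolis rule: always accept downhill moves, sometimes uphill ones.
        if trial <= current or rng.random() < math.exp((current - trial) / temperature):
            current = trial
            if current < best_e:
                best, best_e = list(conf), current
        else:
            conf[i] = old_state  # reject the move and restore the fragment
    return best, best_e


def fold_many(energy, runs=20, seed=1):
    """Repeat the rough simulation and keep the lowest-energy result."""
    rng = random.Random(seed)
    return min((fold_once(energy, rng) for _ in range(runs)), key=lambda r: r[1])


if __name__ == "__main__":
    native, energy = make_energy()
    best, best_e = fold_many(energy)
    print("possible arrangements:", N_STATES ** N_FRAGMENTS)
    print("lowest energy found:", best_e, "| native energy:", energy(native))
```

Even this 12-fragment toy has 531,441 possible arrangements, and every added fragment triples the count. That is why sampling many short, biased runs is more practical than checking the options one at a time.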
Baker wasn’t alone in trying to predict how proteins fold. In 1994, the protein research community organized a biennial competition called CASP (Critical Assessment of Protein Structure Prediction). Competitors were given the amino acid sequences of proteins and challenged to anticipate how they would fold.
The first two contests were a flop. Structures that competitors number-crunched looked nothing like folded proteins, let alone the specific proteins they were meant to predict. Then everything changed in 1998.
Function Follows Form
That summer, Baker’s team received 20 sequences from CASP, a considerable number of proteins to model. But Baker was optimistic: Rosetta would transform protein-folding prediction from a parlor game into legitimate science.
In addition to incorporating fresh insights from the bench, team members — using a janky collection of computers made of spare parts — found a way to run rough simulations tens of thousands of times to determine which fold combinations were most likely.
They successfully predicted structures for 12 out of the 20 proteins. The predictions were the best yet, but still approximations of actual proteins. In essence, the picture was correct, but blurry.
Improvements followed rapidly, with increased computing power contributing to higher-resolution models, as well as improved ability to predict the folding of longer amino acid chains. One major leap was the 2005 launch of Rosetta@home, a screensaver that runs Rosetta on hundreds of thousands of networked personal computers whenever they’re not being used by their owners.
Yet the most significant source of progress has been RosettaCommons, the community that has formed around Rosetta. Originating in Baker’s laboratory and growing with the ever-increasing number of University of Washington graduates — as well as their students and colleagues — it is Baker’s communal brain writ large.
Dozens of labs continue to refine the software, adding insights from genetics and methods from machine learning. New ideas and applications are constantly emerging.
The communal brain has answered Anfinsen’s big question — a protein’s specific amino acid alignment creates its unique folding structure — and is now posing even bigger ones.
“I think the protein-folding problem is effectively solved,” Baker says. “We can’t necessarily predict every protein structure accurately, but we understand the principles.
“There are so many things that proteins do in nature: light harvesting, energy storage, motion, computation,” he adds. “Proteins that just evolved by pure, blind chance can do all these amazing things. What happens if you actually design proteins intelligently?”
De Novo Design
Matthew Bick is trying to coax a protein into giving up its sugar habit for a full-blown fentanyl addiction. His computer screen shows a colorful image of ribbons and swirls representing the protein’s molecular structure. A sort of Technicolor Tinkertoy floats near the center, representing the opioid. “You see how it has really good packing?” he asks me, tracing the ribbons with his finger. “The protein kind of envelops the whole fentanyl molecule like a hot dog bun.”
A postdoctoral fellow in Baker’s lab, Bick engineers protein biosensors using Rosetta. The project originated with the U.S. Department of Defense. “Back in 2002, Chechen rebels took a bunch of people hostage, and there was a standoff with the Russian government,” he says. The Russians released a gas, widely believed to contain a fentanyl derivative, that killed more than a hundred people. Since then, the Defense Department has been interested in simple ways to detect fentanyl in the environment in case it’s used for chemical warfare in the future.
Proteins are ideal molecular sensors. In the natural world, they’ve evolved to bind to specific molecules like a lock and key. The body uses this system to identify substances in its environment. Scent is one example; specific volatiles from nutrients and toxins fit into dedicated proteins lining the nose, the first step in alerting the brain to their presence. With protein design, the lock can be engineered to order.
For the fentanyl project, Bick instructed Rosetta to modify a protein with a natural affinity for the sugar xylotetraose. The software generated hundreds of thousands of designs, each representing a modification of the amino acid sequence predicted to envelop fentanyl instead of sugar molecules. An algorithm then selected the best several hundred options, which Bick evaluated by eye, eventually choosing 62 promising candidates. The protein on Bick’s screen was one of his favorites.
“After this, we do the arduous work of testing designs in the lab,” Bick says.
With another image, he reveals his results. All 62 contenders have been grown in yeast cells infused with synthetic genes that spur the yeasts’ own amino acids to produce the foreign proteins. The transgenic yeast cells have been exposed to fentanyl molecules tagged with a fluorescing chemical. By measuring the fluorescence — essentially shining ultraviolet light on the yeast cells to see how many glow with fentanyl — Bick can determine which candidates bind to the opioid with the greatest strength and consistency.
Baker’s lab has already leveraged this research to make a practical environmental sensor. Modified to glow when fentanyl binds to the receptor site, Bick’s customized protein can now be grown in a common plant called thale cress. This transgenic weed can cover terrain where chemical weapons might get deployed, and then glow if the dangerous substances are present, providing an early warning system for soldiers and health workers.
The concept can also be applied to other biohazards. For instance, Bick is now developing a sensor for aflatoxin, a residue of fungus that grows on grain, causing liver cancer when consumed by humans. He wants the sensor to be expressed in the grain itself, letting people know when their food is unsafe.
But he’s going about things differently this time around. Instead of modifying an existing protein, he’s starting from scratch. “That way, we can control a lot of things better than in natural proteins,” he explains. His de novo protein can be much simpler, and have more predictable behavior, because it doesn’t carry many millions of years of evolutionary baggage.
For Baker, de novo design represents the summit of his quarter-century quest. The latest advances in Rosetta allow him to work backward from a desired function to an appropriate structure to a suitable amino acid sequence. And he can use any amino acids at all — thousands of options, some already synthesized and others waiting to be designed — not only the 20 that are standard in nature for building proteins.
Without the freedom of de novo protein design, Baker’s Death Star would never have gotten off the ground. His group is now also designing artificial viruses. Like natural viruses, these protein shells can inject genetic material into cells. But instead of infecting you with a pathogen, their imported DNA would patch dangerous inherited mutations. Other projects aim to take on diseases ranging from malaria to Alzheimer’s.
In Baker’s presence, protein design no longer seems so extraordinary. Coming out of a brainstorming session — his third or fourth of the day — he pulls me aside and makes the case that his calling is essentially the destiny of our species.
“All the proteins in the world today are the product of natural selection,” he tells me. “But the current world is quite a bit different than the world in which we evolved. We live much longer, so we have a whole new class of diseases. We put all these nasty chemicals into the environment. We have new needs for capturing energy.
“Novel proteins could solve a lot of the problems that we face today,” he says, already moving to his next meeting. “The goal of protein design is to bring those into existence.”
[This story originally appeared in print as "All in the Fold"]