Eric Dishman, a former Intel executive now at the National Institutes of Health, was a 19-year-old college sophomore when he was diagnosed with a rare form of kidney cancer. Over the course of the next 23 years, he would receive 62 different kinds of chemotherapy, immunotherapy and radiation. Some slowed the tumor’s growth, but never for long. The cancer spread from his left kidney to his right kidney.
Just when it seemed Dishman had run out of options, a chance encounter in 2012 with a scientist working for a now-defunct genome-testing company presented an opportunity he couldn’t refuse. He had his cancerous tissue sequenced, a process that would compare his cancer’s mutated DNA with a healthy patient’s genome. This would let doctors look for genetic mutations and other abnormalities that support cancer growth, and use that information to devise a treatment strategy. For example, changes in certain genes could indicate that his cancer was more likely to respond to a particular drug, while other mutations might predict little benefit from a specific therapy. Once the doctors sequenced his tumor, all he had to do was wait. And wait.
Dishman says he was “literally at death’s door,” when he got the call from his doctor. It had taken seven months for a team of oncologists, computer scientists and data crunchers to analyze Dishman’s genetic data and pinpoint a drug — for pancreatic cancer — that would target the unique features of his cancer. This experimental drug homes in on the abnormal gene suspected to cause Dishman’s disease. Within three months of starting treatment, he was cancer-free and eligible for the kidney transplant that ultimately saved his life.
Inspired by the treatment, Dishman is now on a campaign to make this kind of tailored cancer care available to more patients. And, ironically, this individually focused approach likely hinges on the efforts of crowds.
Data-Driven Treatments
The approach is already routine for some cancer patients, such as people whose breast tumors have high levels of a protein called HER2, or whose lung tumors carry mutations in the EGFR gene. These patients can often benefit from drugs that target specific cancer-causing aberrations rather than attacking the body as a whole.
Most patients’ treatment trajectories are not as straightforward. In theory, insights into the genetic underpinnings of cancer, made possible through genomic sequencing, will allow people with even the hardest-to-treat diagnoses to benefit from individualized treatment approaches. Currently, only about 2 percent of cancer patients have their genomes sequenced. These lucky few are most often treated at elite cancer institutions as part of a clinical trial. However, doctors are increasingly making use of the new technology as it becomes exponentially cheaper and faster.
But devising treatment strategies based on insights from sequencing data, as was done for Dishman, requires “monumental shifts in how we share knowledge,” says Brian Druker, director of Oregon Health & Science University’s (OHSU) Knight Cancer Institute.
First, it requires data, and lots of it. That’s the only way to pinpoint the mutations that cause cancers and fuel their growth. “Something that occurs in 1 in 5,000 people seems like a fluke. You really need a dataset of 500,000 or a million people to start seeing patterns,” Druker says.
And second, it requires enormous computing power. Sequencing a single genome yields a terabyte or more of data; Dishman’s kidney tumor yielded 5 terabytes.
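Druker's figures can be made concrete with a little arithmetic. The Python sketch below estimates how many carriers of a 1-in-5,000 mutation would turn up in cohorts of different sizes, and how much raw data those cohorts would produce at roughly a terabyte per genome. The function names, the Poisson approximation and the threshold of 10 carriers are assumptions chosen for illustration, not anything Druker or the researchers specify.

```python
import math

MUTATION_FREQUENCY = 1 / 5000   # Druker's "1 in 5,000" example
TB_PER_GENOME = 1.0             # "a terabyte or more" per genome (lower bound)


def expected_carriers(cohort_size, frequency=MUTATION_FREQUENCY):
    """Average number of people in the cohort who carry the mutation."""
    return cohort_size * frequency


def prob_at_least(k, cohort_size, frequency=MUTATION_FREQUENCY):
    """Chance of seeing at least k carriers (Poisson approximation)."""
    lam = expected_carriers(cohort_size, frequency)
    term = math.exp(-lam)       # P(exactly 0 carriers)
    below_k = 0.0
    for i in range(k):
        below_k += term         # add P(exactly i carriers)
        term *= lam / (i + 1)   # step to P(exactly i + 1 carriers)
    return 1.0 - below_k


def raw_data_petabytes(cohort_size, tb_per_genome=TB_PER_GENOME):
    """Raw sequencing volume in petabytes (1 PB = 1,000 TB)."""
    return cohort_size * tb_per_genome / 1000


if __name__ == "__main__":
    # The threshold of 10 carriers is an arbitrary, illustrative cutoff.
    for size in (5_000, 500_000, 1_000_000):
        print(
            f"{size:>9,} people: "
            f"~{expected_carriers(size):.0f} carriers expected, "
            f"P(at least 10) = {prob_at_least(10, size):.3f}, "
            f"~{raw_data_petabytes(size):,.0f} PB of raw data"
        )
```

In a cohort of 5,000, a single carrier is expected, which is hard to tell from a fluke. At 500,000, about 100 are expected, enough to start comparing outcomes. At that size, the raw sequence data alone would run to hundreds of petabytes.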
Strength in Numbers
Druker and a growing number of scientists believe that amassing and deciphering this torrent of data requires the same open-source ethos that computer programmers have embraced to revolutionize software development. This approach makes the source code of a computer program openly available, and any improvements or modifications to the code are publicly shared.
It could work with cancer, too. “Open source means that, rather than sharing code, scientists and clinicians share data and build upon each other’s knowledge,” says John Wilbanks of Sage Bionetworks, a non-profit biomedical research organization in Seattle that supports open science projects. While few can dispute the benefits of accessibility — after all, scientists depend on the past research of others — Wilbanks says there’s a “prevailing data-hoarding culture.”
Many scientists remain protective of career-advancing findings or intellectual property that could be commercialized. Others cite patient privacy concerns, particularly given the recent spate of data breaches within health care organizations. And data detached from names can still sometimes be used to identify supposedly anonymous patients.
Even among those inclined to share, some practical challenges exist. Moving data from one institution to another can be expensive, and it can take weeks to ship hard drives or download the data. Few cancer centers have the resources to invest in powerful enough computers or robust enough networks to support the mammoth datasets. The result, says Dishman, is a “computational bottleneck that stymies progress.”
It’s a bitter pill to swallow for an estimated 1.7 million people in the U.S. who will be diagnosed with cancer this year alone, especially for those with rare cancers. But they may soon have a new option.
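The transfer bottleneck Dishman describes can be put in rough numbers. The Python sketch below estimates how long it would take to move a dataset over a network link. The 1-gigabit-per-second link speed and the 50 percent efficiency figure are illustrative assumptions, not measurements from any cancer center.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, link_gbps=1.0, efficiency=0.5):
    """Days needed to move `terabytes` of data over a network link.

    link_gbps  -- nominal link speed in gigabits per second (assumed)
    efficiency -- fraction of that speed actually achieved (assumed)
    """
    bits = terabytes * 1e12 * 8                     # 1 TB = 10^12 bytes
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Dishman's 5-terabyte tumor dataset, then hypothetical larger cohorts
    # at about one terabyte per genome.
    cases = [("one 5 TB tumor", 5), ("100 genomes", 100), ("1,000 genomes", 1_000)]
    for label, tb in cases:
        print(f"{label:>15}: about {transfer_days(tb):.1f} days")
```

Under these assumptions, a single 5-terabyte tumor dataset moves in about a day, but 100 terabyte-sized genomes take more than two weeks and 1,000 take roughly half a year.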
Quick Cancer Queries
Intel and OHSU have teamed up through a new, open-source platform called the Collaborative Cancer Cloud (CCC). The initiative enables cancer centers to access and analyze vast amounts of anonymous patient information — from genetic sequences and imaging data to findings in personal health records.
Unlike other open-source initiatives, which ask centers to transfer or retrieve data from one centralized location, the CCC allows researchers to keep their data local. Users access a virtual registry of all this data via the cloud — that is, a network of remote servers hosted on the internet, like the one where your email and selfies are stored.
“With a simple query, you can remotely explore datasets held by institutions that have agreed to share their information,” says Dishman. Every query and answer is wrapped in an encrypted shell before being sent, so it is “completely secure and anonymous.”
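The general idea of querying data where it lives, sometimes called a federated query, can be sketched in a few lines of Python. This is a minimal illustration, not the CCC's actual software or API: the class names, record fields, mutation labels and sample data are all hypothetical, and the encryption step Dishman mentions is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str        # never leaves the site
    mutations: frozenset   # hypothetical mutation labels
    treatment: str
    responded: bool


class Site:
    """A hypothetical cancer center that keeps its records local."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)

    def answer(self, mutation):
        """Return only aggregate counts for patients carrying `mutation`."""
        summary = Counter()
        for rec in self._records:
            if mutation in rec.mutations:
                outcome = "responded" if rec.responded else "no_response"
                summary[(rec.treatment, outcome)] += 1
        return summary


def federated_query(sites, mutation):
    """Send the same query to every site and merge the aggregate answers."""
    total = Counter()
    for site in sites:
        total += site.answer(mutation)
    return total


if __name__ == "__main__":
    # Made-up records for two imaginary sites.
    site_a = Site("site_a", [
        PatientRecord("a-1", frozenset({"MUT_A"}), "drug_x", True),
        PatientRecord("a-2", frozenset({"MUT_B"}), "drug_y", False),
    ])
    site_b = Site("site_b", [
        PatientRecord("b-1", frozenset({"MUT_A"}), "drug_x", True),
        PatientRecord("b-2", frozenset({"MUT_A", "MUT_B"}), "drug_y", False),
    ])
    merged = federated_query([site_a, site_b], "MUT_A")
    for (treatment, outcome), count in sorted(merged.items()):
        print(treatment, outcome, count)
```

For the made-up data above, the merged answer reports two drug_x responders and one drug_y non-responder. Only the counts travel between sites; the patient records and IDs stay where they were collected.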
In addition, the CCC provides users with cloud-based access to a collection of tools commonly used for genomic analyses, which means centers don’t have to shell out for costly in-house hardware and analytics stacks.
"Rather than ask people to move the data out, we bring the computer power in,” says Dishman.
A Growing Movement
The CCC isn’t the only cloud-based data commons. The National Cancer Institute is developing a platform to house data from The Cancer Genome Atlas, a massive catalog of genomic data from more than 11,000 cancer patients. And several institutions have their own clouds where cancer data is kept.
“The problem is these clouds aren’t connected to other clouds. We want to connect them all because, really, we’ll not be able to find the root cause of cancers and the best treatments for those cancers without studying, literally, data from millions of patients,” says Dishman.
So far, in addition to OHSU, the Dana-Farber Cancer Institute in Boston and the Ontario Institute for Cancer Research in Toronto have joined the CCC, and Dishman says “dozens of others” have expressed interest.
Most cancer centers already have the necessary computing capabilities. “If they don’t, it’s as simple as downloading the CCC tools,” says Dishman. And while he expects many will run CCC on Intel servers, it’s not a requirement. “You don’t have to buy our products to be part of CCC,” says Dishman. “Because it’s open source, it can just as easily run on other computer architectures.”
Druker says doctors can tap the CCC to compare treatments and outcomes of similar patients in order to make the best-informed decisions for the patient under their care.
“The idea is, you can blast a virtual query to sites around the world, who together have insight from a million other patients’ data, and ask, ‘Are there any patients that look like the patient in front of me on a genetic level?’ ” says Druker. “ ‘And what treatments worked for them?’ ” Theoretically, the system would automatically return de-identified information about similar patients.
Today, when Druker wants to gain insight from patient data beyond his own institution, he must do so manually, by phone or email. It’s a painstaking process that can take weeks or months. Though the CCC has just launched, its goal is to make this happen in less than a day by 2020 as more cancer centers join and share data.
“You get sequenced in the morning,” says Druker. “Your data is then compared against millions of other patients. By the end of the day, your doctor can say, ‘Yes, we have found the treatment for you and the data to support that choice.’
“You can’t tell a patient to be patient. They need treatments today,” he adds.
[This article originally appeared in print as "Fighting Cancer with Data."]