Even among those inclined to share, some practical challenges exist. Moving data from one institution to another can be expensive, and it can take weeks to ship hard drives or download the data. Few cancer centers have the resources to invest in computers powerful enough, or networks robust enough, to handle such mammoth datasets. The result, says Dishman, is a “computational bottleneck that stymies progress.”
It’s a bitter pill to swallow for the estimated 1.7 million people in the U.S. who will be diagnosed with cancer this year alone, especially those with rare cancers. But they may soon have a new option.
Quick Cancer Queries
Intel and OHSU have teamed up through a new, open-source platform called the Collaborative Cancer Cloud (CCC). The initiative enables cancer centers to access and analyze vast amounts of anonymous patient information — from genetic sequences and imaging data to findings in personal health records.
Unlike other open-source initiatives, which ask centers to transfer or retrieve data from one centralized location, the CCC allows researchers to keep their data local. Users access a virtual registry of all this data via the cloud — that is, a network of remote servers hosted on the internet, like the one where your email and selfies are stored.
“With a simple query, you can remotely explore datasets held by institutions that have agreed to share their information,” says Dishman. Every query and answer is wrapped in an encrypted shell before being sent, so it is “completely secure and anonymous.”
In addition, the CCC provides users with cloud-based access to a collection of tools commonly used for genomic analyses, which means centers don’t have to shell out for costly in-house hardware and analytics stacks.
“Rather than ask people to move the data out, we bring the computer power in,” says Dishman.
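For readers who want to see the idea of bringing the computation to the data in concrete terms, here is a minimal Python sketch. It is not the CCC's actual software: the `Site` class, the `federated_count` function, and the sample records are all hypothetical, invented purely for illustration, and the encryption step described above is left out. Each site runs the query against its own records and returns only a count, so the records themselves never move.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical participating institution that keeps its records local."""
    name: str
    records: list = field(default_factory=list)  # stays at the site

    def run_query(self, predicate):
        # The computation runs where the data lives; only a count is returned.
        return sum(1 for record in self.records if predicate(record))


def federated_count(sites, predicate):
    """Send the same query to every site and combine the per-site counts."""
    per_site = {site.name: site.run_query(predicate) for site in sites}
    return per_site, sum(per_site.values())


# Made-up example records, for illustration only.
sites = [
    Site("center_a", [{"gene": "BRCA1", "age": 52}, {"gene": "TP53", "age": 61}]),
    Site("center_b", [{"gene": "BRCA1", "age": 47}]),
]

per_site, total = federated_count(sites, lambda r: r["gene"] == "BRCA1")
print(per_site, total)
```

The design choice worth noticing is that the researcher asking the question receives only the aggregated answer, which is what lets each center keep its data local.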
A Growing Movement
The CCC isn’t the only cloud-based data commons. The National Cancer Institute is developing a platform to house data from The Cancer Genome Atlas (TCGA), a massive catalog of genomic data from more than 11,000 cancer patients. And several institutions have their own clouds where cancer data is kept.
“The problem is these clouds aren’t connected to other clouds. We want to connect them all because, really, we’ll not be able to find the root cause of cancers and the best treatments for those cancers without studying, literally, data from millions of patients,” says Dishman.