Publicly Available Data and Cancer Research: A Perfect Match

Without a doubt, one of the best ways to advance science and medical care is to make results from cutting-edge research publicly and widely available, even before they are published in a scientific journal. The data should also include all of the necessary metadata (information about how the experiment was performed and on what samples) so that others can reproduce the work and analyze it in the appropriate context.

This principle was demonstrated in the Human Genome Project, where researchers built upon the work of others to create a well-annotated collection of data resources that the scientific community has used (and used, and used!) to study the genetic underpinnings of many diseases and conditions. 

The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is using a similar strategy to advance our understanding of cancer. CPTAC researchers analyze the proteins in tumors to learn how they may contribute to the disease. (Learn more about proteomics and how it’s helping to advance cancer research in this video tutorial.)

Recently, CPTAC released high-resolution proteomics data for the most common type of ovarian cancer, serous epithelial ovarian cancer. The same samples used in the CPTAC analysis had already undergone comprehensive genomic analysis by researchers from The Cancer Genome Atlas (TCGA), a program supported by NCI and the National Human Genome Research Institute. The released CPTAC dataset includes an analysis of the protein content of ovarian cancer cells (that is, their proteome), and these data were integrated with the TCGA genomic analysis.

Researchers from the Pacific Northwest National Laboratory and Johns Hopkins University worked collaboratively to produce this comprehensive dataset. The teams analyzed 174 TCGA samples, 32 of which were analyzed by both research groups. The datasets and the corresponding metadata are publicly available on the CPTAC Data Portal.

This latest release of proteomics data from NCI provides researchers with numerous scientific opportunities. For example, there are limited treatment options for patients with advanced forms of ovarian cancer, and these data can be used to analyze how changes that occur at the gene and protein levels interact to drive this cancer and, possibly, to identify new molecular “targets” for therapy.

The ovarian dataset is the third large-scale release by CPTAC investigators of data on the protein content of tumors that have also undergone genomic analysis by TCGA (colorectal and breast cancer datasets were released in 2013 and 2014, respectively). We hope that making these datasets publicly available will lead to new ways to prevent and treat these and other cancers.

For more information about this dataset and the research work of CPTAC, visit