Project Tycho: Unlocking 125 Years of Data

Posted by Dr. Donald Burke on November 27, 2013 
The following post is a guest post by: 
Donald S. Burke, MD: Dean of the University of Pittsburgh Graduate School of Public Health, UPMC-Jonas Salk Chair in Global Health, Associate Vice Chancellor for Global Health, Director of the Center for Vaccine Research, and Distinguished University Professor of Health Science and Policy  &
Wilbert Van Panhuis, MD, PhD: Lead Investigator of Project Tycho, and Assistant Professor of Epidemiology 
All too often, valuable data are painstakingly collected by public health agencies, reported in a weekly or annual report, and then filed and forgotten. We created “Project Tycho™: DATA FOR HEALTH”, a large data set that includes all weekly surveillance reports of nationally notifiable diseases for all U.S. cities and states published since 1888, and we have made these data publicly available (Level 1 and Level 2 data). The data set consists of 87,950,807 reported individual cases, each localized in space and time. In an NEJM paper entitled “Contagious Diseases in the United States from 1888 to the Present,” we analyzed data on eight important vaccine-preventable contagious diseases and found that at least 100 million cases of illness, many serious, have been prevented by immunization programs in the United States.
Why “Project Tycho”? Four centuries ago, the Danish astronomer Tycho Brahe meticulously recorded the celestial movement patterns of the planets; Johannes Kepler then relied on Tycho’s unique data to derive the laws of planetary motion. Contagious diseases (e.g., polio, measles, and whooping cough) also exhibit dynamic waxing and waning patterns, as influenced by birth rates, population immunity, weather, and other factors. In our efforts to build realistic computational models of epidemic patterns and control strategies (with support from the NIH and the Gates Foundation), we found ourselves continually returning to the historical record to retrieve key data sets for analysis. We realized that this historical record of disease data is a rich one, but until now it existed only as bound hard-copy publications or, more recently, as online PDFs, and nowhere in a convenient digital, computable, downloadable format.
Project Tycho™ involved the rescue of 6,300 weekly reports for 56 diseases; data entry of 200 million keystrokes; localization of case counts to 50 states and thousands of cities; and the merger and reconciliation of 37,000 tables. Throughout, every data point retained its link to the corresponding primary source document. An easy-to-use front-end web page enables free exploration and downloading of U.S. disease data.
The philosopher Kierkegaard once said that “We live forward, but understand backward.” We created Project Tycho™ to facilitate understanding of epidemic dynamics in the historical past, and this understanding will in turn help public health officials to prevent epidemics and save lives in the future. We believe that open access to public health surveillance data sets, in computable form, should become a worldwide norm.
A video about Project Tycho™ and an animation demonstrating the use of Project Tycho™ data are available for viewing, and an article on Project Tycho™ in the New England Journal of Medicine is now available in the November 28th issue of the journal.