Choosing experiments to accelerate collective discovery
Andrey Rzhetsky a,b,c,1, Jacob G. Foster d, Ian T. Foster b,e, and James A. Evans b,f,1
a Departments of Medicine and Human Genetics, University of Chicago, Chicago, IL 60637;
b Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL 60637;
c Institute of Genomic and Systems Biology, University of Chicago, Chicago, IL 60637;
d Department of Sociology, University of California, Los Angeles, CA 90095;
e Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439;
f Department of Sociology, University of Chicago, Chicago, IL 60637
Edited by Yu Xie, University of Michigan, Ann Arbor, MI, and approved September 8, 2015 (received for review May 18, 2015)
Significance
Scientists perform a tiny subset of all possible experiments. What characterizes the experiments they choose? And what are the consequences of those choices for the pace of scientific discovery? We model scientific knowledge as a network and science as a sequence of experiments designed to gradually uncover it. By analyzing millions of biomedical articles published over 30 y, we find that biomedical scientists pursue conservative research strategies exploring the local neighborhood of central, important molecules. Although such strategies probably serve scientific careers, we show that they slow scientific advance, especially in mature fields, where more risk and less redundant experimentation would accelerate discovery of the network. We also consider institutional arrangements that could help science pursue these more efficient strategies.
Abstract
A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.
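The mapping from problem-selection features to network properties can be illustrated with a minimal Python sketch. It assumes an undirected knowledge network stored as adjacency sets. It uses raw, log-scaled degree as a stand-in for degree centrality and breadth-first shortest-path length as the distance a candidate link would span. It then scores candidate links with a log-linear (softmax) choice rule. The function names, the weights `w_importance` and `w_distance`, and the placeholder distance `unreachable` for disconnected pairs are all hypothetical. This is not the fitted model or functional form used in the paper.

```python
import math
import random
from collections import deque


def build_adjacency(edges):
    """Adjacency sets for an undirected network of already-published links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distance(adj, source, target):
    """Breadth-first shortest-path length; None if the entities are disconnected."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nbr in adj.get(node, ()):
            if nbr == target:
                return d + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return None


def problem_score(adj, a, b, w_importance=1.0, w_distance=-1.0, unreachable=10):
    """Hypothetical log-score for studying the link (a, b).

    Importance: log(1 + summed degree of the two entities).
    Difficulty: network distance the link would span; disconnected pairs
    are assigned the placeholder distance `unreachable` (an assumption).
    """
    importance = math.log1p(len(adj.get(a, ())) + len(adj.get(b, ())))
    d = distance(adj, a, b)
    span = unreachable if d is None else d
    return w_importance * importance + w_distance * span


def choose_problem(adj, candidates, rng, **weights):
    """Sample one candidate pair with probability proportional to exp(score)."""
    scores = [problem_score(adj, a, b, **weights) for a, b in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    probs = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy network: entity A is a well-studied hub.
    adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])
    candidates = [("B", "C"), ("B", "E"), ("C", "F")]
    for a, b in candidates:
        print(a, b, round(problem_score(adj, a, b), 3))
    print(choose_problem(adj, candidates, random.Random(0)))
```

With a negative `w_distance`, the rule favors short, local links between entities near a well-connected hub, mirroring the conservative strategy the abstract describes. Making `w_distance` positive shifts probability toward long-range or disconnected pairs. This is a crude stand-in for the greater risk-taking that the authors argue would speed discovery in mature networks.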
complex networks | computational biology | science of science | innovation | sociology of science
Footnotes
1 To whom correspondence may be addressed. Email: arzhetsky@uchicago.edu or jevans@uchicago.edu.
Author contributions: A.R., J.G.F., and J.A.E. designed research; A.R., J.G.F., and J.A.E. analyzed data; and A.R., J.G.F., I.T.F., and J.A.E. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1509757112/-/DCSupplemental.
Freely available online through the PNAS open access option.