CI-SUSTAIN: Stan for the Long Run


Andrew Gelman
Higgins Professor of Statistics and Professor of Political Science


Stan is a software package that transforms scientific discovery by allowing scientists to quickly and easily explore, evaluate, and refine rich scientific hypotheses tailored to their particular research question and data collection mechanism. For computational reasons, analyses of data (big or otherwise) have tended to be simple and focused more on the difficulties of manipulating the data than on realistic scientific models. The next generation of Bayesian inference can take scientists beyond this impasse, via sophisticated models that can adjust for differences between sample and population, and between treatment and control groups, to join the benefits of large datasets with the rigor and power of statistical adjustment. Stan further helps educate the next generation of data scientists, with a natural, easy-to-learn, and portable modeling language, coupled with robust, practical inference tools. The specific goal of this project is to solidify the Stan code base to enable application, maintenance, and development of the Stan software. Stan is being applied in many corners of the physical, biological, and social sciences, at scales ranging from neutrinos to supernovas, from cellular biology to population ecology, and from human reaction times to social network evolution. In this project, the PI aims to document and ruggedize the core infrastructure of Stan so that it can be used by a wider audience of scientists, maintained by a wider group of software developers, and extended to support the future development of new scientific applications and statistical algorithms.