Solving Difficult Bayesian Computation Problems in Education Research with Stan


Andrew Gelman
Higgins Professor of Statistics and Professor of Political Science

Some statistical models used in education research are complex. This complexity arises in part because the data structures that underlie these models involve multiple nested groupings (as in multilevel models) and non-nested groupings (as in cross-classified and partially nested designs). Another source of complexity is that key variables, such as student achievement, can be measured only indirectly and are represented in the model by latent variables.

The purpose of this research project is to develop an existing statistical tool, Stan, which supports statistical inference for complex models, into a general, user-friendly, and efficient tool for use in education research. Data sets such as the Education Longitudinal Study (ELS) and the Trends in International Mathematics and Science Study (TIMSS) will be used to test and develop improvements to the current software and to generate dissemination materials (e.g., templates and tutorials). The primary product of the research will be improvements to Stan, an open source Bayesian statistical programming language developed by the research team with prior research support from IES (Practical Solutions for Missing Data and Imputation - R305D090006). The goal is to make Stan accessible to education researchers in a number of ways, such as providing templates and tutorials that make these methods usable by a wider range of researchers, including those with minimal statistical expertise. The team will also develop a web interface through which educators and researchers can fit and evaluate preconfigured and custom models and draw inferences from their own data without having to install software or understand the internal operations of Stan.