Collaborative Research: PPoSS: Planning: Scalable Systems for Probabilistic Programming


Andrew Gelman
Higgins Professor of Statistics and Professor of Political Science


Statistical methods have had great success in exploring data, making predictions, and solving problems across a wide range of applications. But in the world of big data, methods need to be scalable so that they can handle larger problems while modeling the real-world complications of messy and nonrepresentative data. The project's novelties are developments in software and hardware that facilitate full-stack integration of Bayesian inference, allowing complex and realistic models to be fit to large datasets. The project's impacts span many areas of pure and applied science, including fields as diverse as epidemiology, genetics, and political science, which are challenging because they are dense in parameters rather than in data. Examples include models for disease progression and drug development, decision making under uncertainty, and trends in public opinion.

The project is exploring probabilistic programming across the computing stack: hardware, high-performance computing, programming languages and compilers, and algorithms. The ultimate goal is to develop the tools necessary for an efficient and scalable Bayesian workflow, building on the existing success of the open-source probabilistic programming language Stan. The team of researchers on this project is working on algorithms (model validation for approximate inference), programming languages and compilers (automation of approximate algorithms and advanced performance profiling), systems (probabilistic programming for streaming data), high-performance computing (parallel processing and GPUs), and hardware (exploring domain-specific hardware for Bayesian computation).
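The parallel-processing direction above can be illustrated with a minimal sketch: MCMC chains are independent, so running several chains is "embarrassingly parallel" and maps naturally onto multiple cores or GPUs. The sketch below uses a basic random-walk Metropolis sampler on a standard-normal target; this is an assumption-laden toy, not the project's method (Stan's actual samplers use Hamiltonian Monte Carlo, and the chains here run sequentially rather than in parallel).

```python
import math
import random
import statistics

def log_post(theta):
    # Log-density of a standard normal, up to a constant:
    # a stand-in for a real model's log posterior.
    return -0.5 * theta * theta

def metropolis_chain(steps, scale, seed):
    # One random-walk Metropolis chain with its own seed,
    # so separate chains are fully independent.
    rng = random.Random(seed)
    x, lp = 0.0, log_post(0.0)
    draws = []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, scale)
        lp_prop = log_post(prop)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        draws.append(x)
    return draws

# Independent chains could be farmed out to separate cores or GPUs;
# here they run one after another for simplicity.
chains = [metropolis_chain(20000, 1.0, seed) for seed in range(4)]
pooled = [d for chain in chains for d in chain[5000:]]  # discard warm-up
```

The pooled draws should recover the target's mean (0) and standard deviation (1); in a real workflow, comparing the within-chain and between-chain variability of such parallel chains is also the basis of convergence diagnostics.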