The proposed project has the broad aim of addressing the increasing complexity of survey statistics in an era of declining response rates. We focus specifically on non-probability samples (convenience samples) because of their increasing popularity, but note that a non-probability sample is simply an extreme case of a probability-based survey with high non-response, so our methods can be expected to generalize.
Decisions about coronavirus response are necessarily based on statistical models of prevalence, transmission risk, case fatality rates, projections of future spread, and the estimated effects of medical and social interventions. Much of this modeling and inference is done in the Bayesian framework, an approach to statistics well suited to integrating information from different sources and to accounting for uncertainty in predictions that feed into decision analysis.
Statistical methods have had great success in exploring data, making predictions, and solving problems across a wide range of fields. But in the world of big data, methods must be scalable to handle larger problems while still modeling the real-world difficulties of messy and nonrepresentative data. The project's novelty lies in software and hardware developments facilitating full-stack integration of Bayesian inference, allowing complex and realistic models to be fit to large datasets.
In this project, we will build a set of tools for in-depth analysis of survey data, making use of and extending statistical methods for estimation in small subgroups. Classical survey methods focus on aggregate population-level estimates, but we can learn much more using small-area estimation. The goal of this project is a user-accessible platform for modeling and visualizing survey data that yields estimates for arbitrary subgroups of the population, along with visualization tools to display estimates of interest.
Stan is a software package that transforms scientific discovery by allowing scientists to quickly and easily explore, evaluate, and refine rich scientific hypotheses tailored to their particular research questions and data collection mechanisms. For computational reasons, analyses of data (big or otherwise) have tended to be simple, focused more on the difficulties of manipulating the data than on realistic scientific models.