Practical Solutions for Missing Data


by Andrew Gelman

Missing data are ubiquitous in education research studies. Surveys and intervention studies take considerable time and expense, and researchers want inferences that generalize and have adequate statistical power. When researchers discard observations with missing values, as is commonly done, they waste precious resources and leave the door open for systematic errors. A large literature discusses the shortcomings of simple missing data approaches such as complete case analysis and the inclusion of indicators for missing data. However, these practices are still widespread, with unknown consequences.
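
To make the cost of discarding incomplete records concrete, here is a minimal Python sketch of complete case analysis. The scores, the variable names, and the missingness rule are all made up for illustration and do not come from any study. Students with low pretest scores are assumed to be missing their posttest score, so the mean computed from complete cases alone overstates the average.

```python
# Hypothetical data: the "true" posttest scores are kept separately
# so that the bias of the complete-case estimate can be measured.
pretest = [40, 45, 50, 60, 65, 70, 80, 85]
true_posttest = [52, 55, 61, 70, 74, 80, 85, 90]

# Assumed missingness mechanism (illustrative only):
# the posttest score is missing whenever the pretest score is below 55.
observed = [y if x >= 55 else None for x, y in zip(pretest, true_posttest)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Complete case analysis: drop every record with a missing posttest.
complete_cases = [y for y in observed if y is not None]

n_discarded = len(observed) - len(complete_cases)
full_mean = mean(true_posttest)
complete_case_mean = mean(complete_cases)

print("records discarded:", n_discarded)
print("mean of all true scores:", full_mean)
print("complete-case mean:", complete_case_mean)
```

In this toy example three of the eight records are discarded, and the complete-case mean is higher than the mean of all true scores because the discarded students are systematically the lower scorers.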
 
As a potential remedy for this problem, multiple imputation is becoming an increasingly widely used approach to handling missing data. However, there are outstanding research questions about which methods are most reliable and about when it even makes sense to invest in this technique over potentially simpler solutions. Moreover, there are barriers to widespread adoption of multiple imputation. Many researchers have a legitimate reluctance to use an algorithm whose steps and outcomes they don't understand. And more technically savvy researchers may be avoiding existing software because they want to fit more complicated models than it supports.
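
For readers unfamiliar with the mechanics, the last step of multiple imputation combines the analyses of the m completed datasets into a single estimate and variance. The following is a minimal sketch of the standard combining rules (often called Rubin's rules). The function name `pool_estimates` and the example numbers are hypothetical, chosen only for illustration, and are not part of the proposed software.

```python
def pool_estimates(estimates, variances):
    """Combine results from m imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    # Pooled point estimate: the average across imputations.
    q_bar = sum(estimates) / m
    # Within-imputation variance: the average of the reported variances.
    within = sum(variances) / m
    # Between-imputation variance: how much the estimates disagree.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance inflates the between part by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from three imputed datasets.
pooled, total_var = pool_estimates([2.0, 2.5, 3.0], [0.25, 0.25, 0.25])
print("pooled estimate:", pooled)
print("total variance:", total_var)
```

The between-imputation term is what separates this from simply averaging: it carries the uncertainty caused by the missing values into the final standard error.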
 
The proposed work develops, extends, and tests strategies for (multiply) imputing missing data. The novel features of our software range from statistical innovations in modeling and diagnostics to innovations in the user interface and computational efficiency. We also propose to develop multiple imputation diagnostics that have the potential to troubleshoot imputations from a variety of perspectives at the touch of a button.
 
Our goals can be broadly defined as (1) investigating the properties of imputation models and algorithms, (2) developing diagnostics to reveal problems with imputations in real time, (3) developing models and algorithms that are more likely to create appropriate imputations, (4) creating software that is reliable and usable by non-statisticians yet can also accommodate the needs of more sophisticated modelers, and (5) testing our diagnostics, models, and algorithms in our ongoing applied research. An important part of testing our software will be comparing the performance of multiple imputation with simpler missing data strategies, to clarify when it is necessary to make the jump to multiple imputation.
 
As part of this process we will engage education researchers to assess the types of data structures and missing data patterns they most commonly encounter.  We will also use these researchers as resources to help guide our software development so that the resulting package is user-friendly and meets a diverse set of needs.


Funded by the Department of Education

ISERP

Institute for Social and Economic Research and Policy

Columbia University