Predicting Panel Drop-outs with Machine Learning
Abstract: Panel surveys provide a valuable data source for investigating a wide range of substantive research questions and are used extensively in the social sciences and related disciplines. However, panel data quality can be substantially compromised by sample sizes that shrink over time due to dropouts. In its most critical form, panel attrition is driven by selective nonresponse patterns, which lead to a loss of statistical power and to biased estimates. To date, survey research has typically focused on developing or refining (weighting) methods that correct for systematic dropouts after the data have been collected. Against this background, this project investigates the potential of moving from post- to pre-correction of panel nonresponse by predicting dropouts with machine learning methods and information from previous waves. In order to build prediction models that leverage information from multiple waves of a panel survey, the project investigates different longitudinal learning frameworks and builds on data from two German panel studies (GESIS Panel, GSOEP). In this setting, the use of data-driven classifiers (e.g. random forests, boosting) makes it possible to model complex non-linear and non-additive relationships among nonresponse predictors while focusing specifically on prediction accuracy. Feeding machine learning models with a rich set of data in a longitudinal framework offers a promising avenue for predicting panel nonresponse in advance. Such predictions could then be used in a targeted design that aims to prevent dropouts before they occur.
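To illustrate what "using information from previous waves" can mean in practice, the following minimal sketch turns per-respondent participation histories into feature/label rows for a prediction model. It is an assumption-laden illustration, not the project's actual pipeline: the function name `build_training_rows`, the `n_lags` parameter, and the toy data are hypothetical, and nonresponse is treated wave by wave rather than as permanent attrition.

```python
from typing import Dict, List, Tuple

# One training row: (respondent id, target wave index, lagged features, label)
Row = Tuple[str, int, List[int], int]


def build_training_rows(history: Dict[str, List[int]], n_lags: int = 2) -> List[Row]:
    """Build (features, label) rows from participation histories.

    `history` maps a respondent id to a list of 0/1 indicators, one per wave
    (1 = participated, 0 = did not participate). For each target wave t, the
    features are the indicators of the `n_lags` preceding waves and the label
    is 1 if the respondent did NOT participate in wave t.
    Hypothetical helper for illustration only.
    """
    rows: List[Row] = []
    for pid, waves in history.items():
        # The first usable target wave is the one with n_lags waves before it.
        for t in range(n_lags, len(waves)):
            features = waves[t - n_lags:t]   # information from previous waves only
            label = 1 - waves[t]             # 1 = nonresponse in the target wave
            rows.append((pid, t, features, label))
    return rows


# Toy data (made up): respondent "a" drops out in the fourth wave.
toy_history = {"a": [1, 1, 1, 0], "b": [1, 0, 1, 1]}
for row in build_training_rows(toy_history, n_lags=2):
    print(row)
```

In a real application the feature vectors would also carry substantive and paradata variables from earlier waves, and the resulting rows could be passed to a classifier such as a random forest.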
Project Team: Christoph Kern, Bernd Weiß (GESIS), Jan-Philipp Kolb (GESIS)
Kern, C. (2018). Predicting Panel Drop-Outs with Machine Learning. JSM 2018, Invited Session: Improving Survey Data Quality with Machine Learning Techniques, Vancouver.
Weiß, B., Kolb, J.-P., and Kern, C. (2018). Using Predictive Modeling in Survey Methodology to Identify Panel Nonresponse. JSM 2018, Speed Session: Missing Survey Data: Analysis, Imputation, Design, and Prevention, Vancouver.
Kern, C., Klausch, T., and Kreuter, F. (2018). Tree-based Machine Learning Methods for Survey Research. Manuscript submitted for publication.
Kern, C. (2018). Data-driven Prediction of Panel Attrition. AAPOR Conference 2018, Session: Tinkering with Tradition: Using Machine Learning Methods and Big Data to Refine Survey Designs and Improve Survey Participation, Denver.
Kern, C. (2017). Data-driven Prediction of Panel Nonresponse. GESIS Panel Research Colloquium, Mannheim.
Kern, C. (2017). Data-driven Prediction of Panel Nonresponse. ESRA Conference 2017, Session: Different methods, same results? Lisbon.