UBC Statistics Department Colloquium Series: A Debiased Machine Learning Single-Imputation Framework for Item Nonresponse in Surveys
Machine learning methods are increasingly studied and used in National Statistical Offices, in particular to handle item nonresponse, where survey respondents answer some questions but leave others unanswered. In most surveys, item nonresponse affects key study variables, and imputation is routinely used to handle the resulting missing data. Standard parametric imputation methods can support rigorous inference when their modeling assumptions are approximately correct; when the imputation model is misspecified, however, the resulting inferences can be misleading. Machine learning offers a flexible alternative by learning complex relationships between variables from the data, which can reduce the risk of misspecification. At the same time, this flexibility introduces new challenges for survey inference, since modern learning algorithms may converge more slowly than classical parametric models and may not automatically deliver valid uncertainty quantification. In this talk, I will present a survey sampling extension of the double/debiased machine learning framework of Chernozhukov et al. (2018). The proposed approach combines machine learning-based imputation with design-based survey weighting and an orthogonalized estimation strategy, leading to root-$n$ consistent and asymptotically normal estimation of population means under realistic conditions. We also develop a consistent variance estimator, yielding asymptotically valid confidence intervals while allowing the use of a wide range of machine learning algorithms. I will briefly discuss aggregation procedures and conclude with simulation results illustrating the performance of the proposed methodology.
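As background, the debiased machine learning idea underlying the talk can be illustrated with a standard orthogonalized (AIPW-type) moment for a population mean under a missing-at-random assumption. The display below is a generic sketch of this construction with design weights and cross-fitting, not necessarily the exact estimator developed in the talk:

$$
\hat{\theta} \;=\; \frac{1}{\hat{N}} \sum_{i \in s} w_i \left[ \hat{m}^{(-k_i)}(X_i) + \frac{R_i}{\hat{\pi}^{(-k_i)}(X_i)} \bigl( Y_i - \hat{m}^{(-k_i)}(X_i) \bigr) \right],
\qquad
\hat{N} \;=\; \sum_{i \in s} w_i,
$$

where $w_i$ is the design weight of sampled unit $i$, $R_i$ its response indicator, $X_i$ its fully observed covariates, and $\hat{m}^{(-k_i)}$ and $\hat{\pi}^{(-k_i)}$ are machine learning estimates of the outcome regression and response probability fit on cross-fitting folds that exclude unit $i$. Because the moment is Neyman-orthogonal, first-order errors in the nuisance estimates do not bias $\hat{\theta}$, which is what permits slower-converging learners while retaining root-$n$ inference.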
This talk is part of the UBC Statistics Colloquium Series, which features broad and accessible seminars throughout the term.

