
  • Leow, K. and T. Lindenthal. “More Is More: Leveraging Missing Data in Commercial Real Estate with Machine Learning”
    • Missing data are pervasive in commercial real estate research, yet common practice remains to discard incomplete observations or fill gaps with crude imputation rules. We show that doing so can meaningfully distort inference and reduce predictive accuracy. Using detailed asset-level data from the NCREIF Property Index, we document substantial and systematic missingness in key variables, suggesting that data are unlikely to be missing at random. We then demonstrate that modern machine-learning methods, specifically the sparsity-aware XGBoost algorithm, can exploit incomplete observations without requiring imputation, yielding markedly higher out-of-sample predictive performance than models restricted to complete cases. Moreover, we find that incorporating incomplete data can change the apparent marginal effects and relative importance of standard covariates, implying that conclusions drawn from ‘clean’ subsamples may be misleading. Our results highlight that, in commercial real estate applications, more data—even if partially missing—can be more informative than smaller, perfectly complete samples.
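
To illustrate what "sparsity-aware" means in the abstract above, the following is a minimal sketch of the split-finding idea, not the authors' code and not the XGBoost library's implementation. It assumes squared-error loss with a starting prediction of 0 and a single feature; the function names (`score`, `best_split_with_missing`), the regularisation value, and the toy data are all hypothetical. Rows with a missing feature value are never imputed: at each candidate threshold they are sent as a block to the left or the right child, whichever yields the larger gain.

```python
from typing import List, Optional, Tuple


def score(grad_sum: float, hess_sum: float, lam: float) -> float:
    """Leaf structure score G^2 / (H + lambda) used in the gain formula."""
    return grad_sum * grad_sum / (hess_sum + lam)


def best_split_with_missing(
    x: List[Optional[float]],
    y: List[float],
    lam: float = 1.0,
) -> Tuple[float, float, str]:
    """Return (gain, threshold, default_direction) for one feature.

    Assumes squared-error loss and a starting prediction of 0, so every
    row has gradient -y and hessian 1. Rows where x is None are kept and
    routed to the side ("left" or "right") that maximises the gain.
    """
    grads = [-yi for yi in y]
    g_total = sum(grads)
    h_total = float(len(y))

    # Only observed values are sorted and scanned; missing rows are
    # summarised by their gradient and hessian totals.
    observed = sorted((xi, g) for xi, g in zip(x, grads) if xi is not None)
    g_missing = g_total - sum(g for _, g in observed)
    h_missing = h_total - len(observed)
    parent = score(g_total, h_total, lam)

    best = (0.0, float("nan"), "none")  # no split found yet
    g_left, h_left = 0.0, 0.0
    for i in range(len(observed) - 1):
        value, g = observed[i]
        g_left += g
        h_left += 1.0
        next_value = observed[i + 1][0]
        if next_value == value:
            continue  # cannot split between identical values
        threshold = (value + next_value) / 2
        for direction in ("left", "right"):
            # Add the missing rows to the left child only when the
            # default direction under test is "left".
            gl = g_left + (g_missing if direction == "left" else 0.0)
            hl = h_left + (h_missing if direction == "left" else 0.0)
            gr, hr = g_total - gl, h_total - hl
            gain = score(gl, hl, lam) + score(gr, hr, lam) - parent
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best


# Toy data (hypothetical): two rows lack the feature, and their outcomes
# resemble those of the rows with large feature values.
x_toy = [1.0, 2.0, None, 8.0, 9.0, None]
y_toy = [10.0, 10.0, 0.0, 0.0, 0.0, 0.0]
gain, threshold, direction = best_split_with_missing(x_toy, y_toy)
print(gain, threshold, direction)
```

On this toy input the chosen split sits between the observed values 2 and 8, and the incomplete rows are routed to the right child, so they contribute to the fit instead of being dropped as they would be in a complete-case analysis.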
