Discussion
“Big Data for Housing and Their
Interaction with Market Dynamics”
(Jieun Lee & Kwan Ok Lee)
Thies Lindenthal
htl24@cam.ac.uk
https://www.lindenthal.eu
21. July 2023
## Text analysis in real estate research
Successful research: Better indices!
* Nowak, A. and Smith, P. (2020). Quality-adjusted House Price Indexes. American Economic Review: Insights, 2: 339–356. - The constant-quality assumption in repeat-sales house price indexes (HPIs) introduces a significant time-varying attribute bias. The direction, magnitude, and source of the bias varies throughout the market cycle and across metropolitan statistical areas (MSAs). We mitigate the bias using a data-driven textual analysis approach that identifies and includes salient text from real estate agent remarks in the repeat-sales estimation.
Absent the text, MSA-level HPIs are biased downwards by as much as 7% during the financial crisis and upwards by as much as 20% after the crisis. The geographic concentration of the bias magnifies its effect on local HPIs.
## Text analysis of listings
Estate agents’ remarks used to address spatial and temporal heterogeneity
* Nowak, A., Price, B., and Smith, P. (2020). Real Estate Dictionaries Across Space and Time. Journal of Real Estate Finance & Economics, 62: 139-163. - Leveraging high-dimensional variable selection methods, we show the textual information provided in real estate agents’ remarks about a property can be used to address spatial and temporal heterogeneity in housing markets.
Including the textual information in the pricing model decreases in-sample prediction errors by as much as 18.7% at the MSA level and 39.1% at the zip code level. These results are robust to transforming the raw text using a real estate specific word list, the choice of n-grams, word stemming, and heteroscedasticity in the hedonic and repeat-sales models. These findings suggest the raw text in the remarks can be included directly in predictive pricing models.
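To make the "choice of n-grams" concrete, here is a minimal sketch of how agent remarks could be turned into unigram and bigram counts. The helper `ngrams`, the two sample remarks, and the simple letters-only tokenizer are illustrative assumptions, not the procedure used in the paper.

```python
import re
from collections import Counter


def ngrams(remark, n_max=2):
    """Lowercase a remark, keep runs of letters as words, return all 1..n_max-grams.

    Hypothetical helper for illustration. Note that punctuation is ignored,
    so bigrams may span a comma.
    """
    words = re.findall(r"[a-z]+", remark.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


# Made-up remarks, standing in for the MLS comments field.
remarks = [
    "Granite counters, new roof",
    "New roof and fenced yard",
]

# One Counter per listing, then a shared vocabulary and a count matrix
# (rows = listings, columns = n-grams in alphabetical order).
counts = [Counter(ngrams(r)) for r in remarks]
vocab = sorted(set().union(*counts))
X = [[c[g] for g in vocab] for c in counts]
```

The columns of `X` could then enter a pricing model next to the usual hedonic attributes; stemming or a real-estate-specific word list would be applied to `words` before the n-grams are built.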
## Myth busting
Agents do not sell their homes at a premium!
* Liu, C., Nowak, A. and Smith, P. (2020). Asymmetric or Incomplete Information About Asset Values? Review of Financial Studies, 33: 2898-2936. - We provide a new framework for using text as data in empirical models. The framework identifies salient information in unstructured text that can control for multidimensional heterogeneity among assets. We demonstrate the efficacy of the framework by re-examining principal-agent problems in residential real estate markets.
We show that the agent-owned premiums reported in the extant literature dissipate when the salient textual information is included.
The results suggest the previously reported agent-owned premiums suffer from an omitted variable bias, which prior studies incorrectly ascribe to market distortions associated with asymmetric information.
## Better predictions (1)
Textual descriptions contain information that traditional hedonic attributes cannot capture
* Shen, L. and Ross, S. (2021). Information Value of Property Descriptions: A Machine Learning Approach. Journal of Urban Economics. - This paper employs machine learning to quantify the value of “soft” information contained in real estate property descriptions.
Textual descriptions contain information that traditional hedonic attributes cannot capture.
A one standard deviation increase in the uniqueness of a property based on this “soft” information leads to a 15% increase in property sale price in a hedonic price model and a 10% increase in a repeat sales price model. The effects in the hedonic model appear to arise through two channels: the unobserved quality of the housing unit, and the market power of the housing unit relative to competing properties. The effects in the repeat sales model appear to be driven entirely by the market power of the unit. Further, an annual hedonic price index ignoring our measure of unobserved quality overstates real estate prices by between 10% and 23% and mistimes the stabilization of housing prices following the Great Recession. Similar, but smaller, effects are observed for the repeat sales price index.
## Better predictions (2)
Text is found to decrease pricing error by more than 25%
* Nowak, A. and Smith, P. (2017). Textual Analysis in Real Estate. Journal of Applied Econometrics 32: 896–918. - This paper incorporates text data from MLS listings into a hedonic pricing model. We show that the comments section of the MLS, which is populated by real estate agents who arguably have the most local market knowledge and know what homebuyers value, provides information that improves the performance of both in-sample and out-of-sample pricing estimates.
Text is found to decrease pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach. By doing so, the implicit prices for various words and phrases are estimated. The estimation focuses on simultaneous variable selection and estimation for linear models in the presence of a large number of variables using a penalized regression. The LASSO procedure and variants are shown to outperform least-squares in out-of-sample testing.
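The idea of estimating implicit prices for tokens with a penalized regression can be sketched as follows. This is a toy LASSO fitted by coordinate descent on synthetic data, written in plain Python. The function names, the three tokens, the coefficients, and the penalty `lam` are all assumptions made for illustration; the paper's own estimator, its LASSO variants, and its hedonic controls are not reproduced here.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1 by coordinate descent.

    The intercept b0 is left unpenalised by centring X and y first.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual while all coefficients are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # token never varies, so it carries no information
            # Covariance of token j with the residual, adding back its own fit.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * delta
            beta[j] = new_beta
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


# Synthetic example: token indicators and log prices (all values made up).
tokens = ["granite", "fixer", "sunny"]
X = [[g, f, s] for g in (0, 1) for f in (0, 1) for s in (0, 1)]
y = [12.0 + 0.2 * g - 0.3 * f for g, f, s in X]  # "sunny" has no effect by construction

intercept, beta = lasso_cd(X, y, lam=0.01)
implicit_prices = dict(zip(tokens, beta))
```

In this toy design the coefficient on "sunny" is shrunk to exactly zero while "granite" and "fixer" keep (slightly shrunken) positive and negative implicit prices, which is the simultaneous selection and estimation the abstract describes. With thousands of tokens one would use an established implementation and choose `lam` by cross-validation.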
## Large Language Models
Earlier work is successful—but text analysis has progressed since.
* Can large language models such as ChatGPT be used to gain more insights about properties?