Discussion AREUEA International Meeting 2023

Discussion
“Big Data for Housing and Their
Interaction with Market Dynamics”
(Jieun Lee & Kwan Ok Lee)

Thies Lindenthal
htl24@cam.ac.uk
https://www.lindenthal.eu

21 July 2023

## Better predictions (1) Textual descriptions contain information that traditional hedonic attributes cannot capture

* Shen, L. and Ross, S. (2021). Information Value of Property Descriptions: A Machine Learning Approach. Journal of Urban Economics
			- This paper employs machine learning to quantify the value of “soft” information contained in real estate property descriptions. Textual descriptions contain information that traditional hedonic attributes cannot capture. A one standard deviation increase in the uniqueness of a property based on this “soft” information leads to a 15% increase in property sale price in a hedonic price model and a 10% increase in a repeat sales price model. The effects in the hedonic model appear to arise through two channels: the unobserved quality of the housing unit, and the market power of the housing unit relative to competing properties. The effects in the repeat sales model appear to be driven entirely by the market power of the unit. Further, an annual hedonic price index ignoring our measure of unobserved quality overstates real estate prices by between 10% and 23% and mistimes the stabilization of housing prices following the Great Recession. Similar but smaller effects are observed for the repeat sales price index.
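To make the “uniqueness” idea concrete, here is a minimal sketch under loud assumptions: each description is scored as one minus its average TF-IDF cosine similarity to the other descriptions in the sample. This is a simplified stand-in chosen for illustration, not the machine learning measure Shen and Ross construct; the function names and the four toy descriptions are hypothetical. Only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector (term -> weight) per description."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of descriptions containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: rare terms get large weights, ubiquitous terms small ones
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def uniqueness_scores(docs: list[str]) -> list[float]:
    """Hypothetical uniqueness score: 1 - mean cosine similarity to all other descriptions."""
    vecs = tfidf_vectors(docs)
    scores = []
    for i, v in enumerate(vecs):
        sims = [cosine(v, w) for j, w in enumerate(vecs) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


def standardize(xs: list[float]) -> list[float]:
    """z-scores, so a regression coefficient reads as 'per standard deviation'."""
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / sd if sd else 0.0 for x in xs]


# Hypothetical toy descriptions; the last one shares no words with the others
descriptions = [
    "Bright two bedroom flat close to station",
    "Bright two bedroom flat near the station",
    "Spacious two bedroom flat close to park",
    "Converted windmill with original beams and river views",
]
for text, z in zip(descriptions, standardize(uniqueness_scores(descriptions))):
    print(f"{z:+.2f}  {text}")
```

In a hedonic application, the standardized score would enter a log-price regression next to the usual structural and location attributes. Under that log-linear assumption, a coefficient γ on the standardized score translates into a price effect of exp(γ) - 1 per standard deviation of uniqueness.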

## Better predictions (2) Text is found to decrease pricing error by more than 25%

* Nowak, A. and Smith, P. (2017). Textual Analysis in Real Estate. Journal of Applied Econometrics 32: 896–918.
			- This paper incorporates text data from MLS listings into a hedonic pricing model. We show that the comments section of the MLS, which is populated by real estate agents who arguably have the most local market knowledge and know what homebuyers value, provides information that improves the performance of both in-sample and out-of-sample pricing estimates. Text is found to decrease pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach. By doing so, the implicit prices for various words and phrases are estimated. The estimation focuses on simultaneous variable selection and estimation for linear models in the presence of a large number of variables using a penalized regression. The LASSO procedure and variants are shown to outperform least-squares in out-of-sample testing.
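The tokenization-plus-LASSO idea can be sketched as follows, assuming a binary bag-of-words coding of the agent remarks and a log-price outcome. The helper names, the penalty value, and the six toy remarks with their log prices are made up for illustration, and the conventional hedonic attributes are left out for brevity. The solver is a generic cyclic coordinate-descent LASSO, not the estimation code of Nowak and Smith, who also consider LASSO variants.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case a listing remark and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def bag_of_words(remarks: list[str]) -> tuple[list[str], list[list[float]]]:
    """Binary design matrix: entry (i, j) is 1.0 if token j appears in remark i."""
    token_sets = [tokenize(r) for r in remarks]
    vocab = sorted(set().union(*token_sets))
    X = [[1.0 if t in toks else 0.0 for t in vocab] for toks in token_sets]
    return vocab, X


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b|: shrinks z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, max_sweeps=1000, tol=1e-9):
    """Cyclic coordinate descent for
    (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1 with an unpenalized intercept b0.
    Columns and y are centred so the intercept drops out of the updates."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual while all slopes are zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:  # token appears in every remark (or none)
                continue
            # Correlation of column j with the partial residual that excludes j
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return b0, beta


# Hypothetical toy data: log price rises by 0.5 whenever "renovated" is mentioned
remarks = [
    "Renovated kitchen near park",
    "Renovated bath, quiet street",
    "Kitchen near park",
    "Bath, quiet street",
    "Renovated kitchen, quiet street",
    "Bath near park",
]
log_price = [12.5, 12.5, 12.0, 12.0, 12.5, 12.0]

vocab, X = bag_of_words(remarks)
b0, beta = lasso(X, log_price, lam=0.01)
implicit = dict(zip(vocab, beta))
# Only "renovated" keeps a non-zero implicit price, shrunk slightly below 0.5
print({t: round(b, 3) for t, b in implicit.items() if b != 0.0})
```

Each surviving coefficient is the implicit price of a word in log points. Raising `lam` drops more tokens; in practice the penalty would typically be chosen by cross-validation on out-of-sample pricing error, in line with the out-of-sample comparison against least squares described above.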
