Improving Real Estate Rental Estimations with Visual Data

Deep learning
Multi-modal learning

Ilia Azizi

Iegor Rudnytskyi


September 9, 2022

Multi-modal data is widely available for online real estate listings. Listings can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether the performance of a pricing model can be improved by using additional unstructured data, namely images of the property and satellite images. We compare four models based on the type of input data they use: (1) tabular data only, (2) tabular data and property images, (3) tabular data and satellite images, and (4) tabular data and a combination of property and satellite images. In a supervised context, branches of dedicated neural networks for each data type are fused (concatenated) to predict log rental prices. The novel dataset devised for the study (SRED) consists of 11,105 flat rentals advertised over the internet in Switzerland. The results reveal that models using all three sources of data generally outperform machine learning models built on the tabular information alone. The findings pave the way for further research on integrating other unstructured inputs, for instance, the textual descriptions of properties.
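
To make the fusion step concrete, the following is a minimal sketch of concatenation-based fusion for model (4), not the architecture used in the study. It assumes that each image has already been reduced to a fixed-length feature vector and that each branch is a single dense layer with a ReLU activation. The names `FusionSketch`, `dense`, and `init_layer`, the layer sizes, and the random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer followed by ReLU: max(0, W x + b), on plain lists."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class FusionSketch:
    """One branch per data type; branch outputs are concatenated."""

    def __init__(self, n_tabular, n_property, n_satellite, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.tabular = init_layer(n_tabular, n_hidden, rng)
        # Assumption: image inputs are pre-extracted feature vectors.
        self.property = init_layer(n_property, n_hidden, rng)
        self.satellite = init_layer(n_satellite, n_hidden, rng)
        # Linear regression head over the concatenated embedding.
        self.head_w = [rng.gauss(0.0, 0.1) for _ in range(3 * n_hidden)]
        self.head_b = 0.0

    def fuse(self, tabular, property_feats, satellite_feats):
        """Run each branch, then concatenate the three embeddings."""
        return (dense(tabular, *self.tabular)
                + dense(property_feats, *self.property)
                + dense(satellite_feats, *self.satellite))

    def predict_log_rent(self, tabular, property_feats, satellite_feats):
        """Predict the log rental price from the fused embedding."""
        fused = self.fuse(tabular, property_feats, satellite_feats)
        return sum(w * f for w, f in zip(self.head_w, fused)) + self.head_b


# Usage with made-up inputs: 3 tabular features, 5 features per image type.
model = FusionSketch(n_tabular=3, n_property=5, n_satellite=5)
log_rent = model.predict_log_rent(
    [0.5, 1.0, 2.0],
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.9, 0.8, 0.7, 0.6, 0.5],
)
rent = math.exp(log_rent)  # map the log-price prediction back to a price
```

Dropping one of the image branches from `fuse` would give the analogue of models (2) and (3), and keeping only the tabular branch gives model (1). In practice the weights would be learned by minimizing a regression loss on log rental prices.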