ILIA
AZIZI
SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals
This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic framework that generates prediction intervals for datasets with complete or missing data. SEMF extends the Expectation-Maximization (EM) algorithm, traditionally used in unsupervised learning, to a supervised context, enabling it to extract latent representations for uncertainty estimation. The framework demonstrates robustness through extensive empirical evaluation across 11 tabular datasets, achieving—in some cases—narrower normalized prediction intervals and higher coverage than traditional quantile regression methods. Furthermore, SEMF integrates seamlessly with existing machine learning algorithms, such as gradient-boosted trees and neural networks, exemplifying its usefulness for real-world applications. The experimental results highlight SEMF’s potential to advance state-of-the-art techniques in uncertainty quantification.
May 28, 2024
Ilia Azizi, Marc-Olivier Boldi, Valérie Chavez-Demoulin
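The two quantities the SEMF abstract compares, coverage and normalized interval width, can be illustrated with a short sketch. The definitions used here (the fraction of targets falling inside their interval, and the mean interval width divided by the observed target range) are common conventions and are assumptions, not necessarily the exact metrics used in the paper. The function names and the toy numbers are hypothetical.

```python
def coverage(y, lower, upper):
    """Fraction of targets that fall inside their prediction interval."""
    hits = sum(lo <= t <= hi for t, lo, hi in zip(y, lower, upper))
    return hits / len(y)


def normalized_mean_width(y, lower, upper):
    """Mean interval width divided by the range of the observed targets.

    Assumed normalization: other choices (e.g. dividing by the standard
    deviation of y) are equally plausible.
    """
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mean_width / (max(y) - min(y))


# Toy data, invented for illustration only.
y = [1.0, 2.0, 3.0, 5.0]
lower = [0.5, 2.5, 2.0, 4.0]
upper = [1.5, 3.5, 4.0, 6.0]

print(coverage(y, lower, upper))               # 3 of the 4 targets are covered
print(normalized_mean_width(y, lower, upper))  # mean width 1.5 over a range of 4
```

A method is preferable on these two measures when it reaches the target coverage with a smaller normalized width, which is the trade-off the abstract refers to.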
Improving Real Estate Rental Estimations with Visual Data
Multi-modal data is widely available for online real estate listings. Listings can contain various forms of data, including visual data and unstructured textual descriptions. Nonetheless, many traditional real estate pricing models rely solely on well-structured tabular features. This work investigates whether it is possible to improve the performance of the pricing model using additional unstructured data, namely images of the property and satellite images. We compare four models based on the type of input data they use: (1) tabular data only, (2) tabular data and property images, (3) tabular data and satellite images, and (4) tabular data and a combination of property and satellite images. In a supervised context, branches of dedicated neural networks for each data type are fused (concatenated) to predict log rental prices. The novel dataset devised for the study (SRED) consists of 11,105 flat rentals advertised over the internet in Switzerland. The results reveal that models using all three sources of data generally outperform machine learning models built on tabular information alone. The findings pave the way for further research on integrating other unstructured inputs, for instance, the textual descriptions of properties.
Sep 9, 2022
Ilia Azizi, Iegor Rudnytskyi
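The fusion step described in the rental-estimation abstract, where one branch per data type is concatenated before predicting log rent, can be sketched as a single forward pass. This is a minimal illustration under stated assumptions and not the architecture from the paper: the layer sizes are hypothetical, the weights are random and untrained, and the image inputs are stand-in feature vectors rather than outputs of a real image network.

```python
import numpy as np

rng = np.random.default_rng(0)


def init(d_in, d_out):
    """Random (untrained) weights and zero bias for one dense layer."""
    return rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out)


def dense_relu(x, w, b):
    """One fully connected layer followed by a ReLU."""
    return np.maximum(x @ w + b, 0.0)


# Hypothetical sizes: 4 listings, 6 tabular features, 16-dim image features.
n, d_tab, d_img, d_sat, d_hidden = 4, 6, 16, 16, 8

x_tab = rng.normal(size=(n, d_tab))  # tabular features
x_img = rng.normal(size=(n, d_img))  # stand-in for property-image features
x_sat = rng.normal(size=(n, d_sat))  # stand-in for satellite-image features

# One dedicated branch per data type.
h_tab = dense_relu(x_tab, *init(d_tab, d_hidden))
h_img = dense_relu(x_img, *init(d_img, d_hidden))
h_sat = dense_relu(x_sat, *init(d_sat, d_hidden))

# Fusion by concatenation along the feature axis.
fused = np.concatenate([h_tab, h_img, h_sat], axis=1)

# Linear head that outputs one predicted log rental price per listing.
w_out, b_out = init(fused.shape[1], 1)
log_rent = (fused @ w_out + b_out).ravel()

print(fused.shape, log_rent.shape)
```

Dropping a branch from the list passed to `np.concatenate` gives the reduced variants the abstract compares, such as tabular data with property images only.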