Improving prediction accuracy in small area models by means of clusterwise regression

Shonosuke Sugasawa Co-Author
 
Raffaele Mattera Co-Author
 
Paolo Maranzano Speaker
University of Milano-Bicocca
 
Tuesday, Aug 5: 2:25 PM - 2:45 PM
Topic-Contributed Paper Session 
Music City Center 
In small-area estimation (SAE) from sample survey data, a typical approach is to use auxiliary information to improve the precision of direct estimators of totals or means. Direct estimates and auxiliary information are combined through regression models that can include both fixed and random effects, which leads to mixed-effects models with potentially spatially and temporally structured components (Morales et al., 2021). These models typically assume that the regression coefficients are constant over time and space. However, as observed by Wang et al. (2023), this assumption may be inadequate because the relationships between variables can vary over space, giving rise to spatial heterogeneity (Zhu & Turner, 2022). The objective of this paper is to present an innovative approach that simultaneously addresses heterogeneity and spatial dependence in SAE models. The aim is to improve the predictive capabilities of SAE models and to provide more accurate estimates of the variables of interest. The proposed methodology integrates three existing proposals: (1) the spatially-clustered regression models of Sugasawa and Murakami (2021), in which regression coefficients vary according to a spatial cluster structure determined endogenously through penalized likelihood; (2) the approach of Wang et al. (2023), in which location-specific weights are employed within spatially penalized least squares to estimate local regression coefficients and cluster membership; (3) the extension by Cerqueti et al. (2024) of the spatially-clustered linear regression model to the leading spatial econometric models (e.g., SAR and Durbin models).
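As an illustration of the kind of spatially penalized clustering objective described in item (1), the following is a generic sketch. The notation is assumed here for exposition and is not taken from the cited papers: g_i is the cluster label of area i, \theta_g collects the regression parameters of cluster g, w_{ij} \ge 0 is a spatial weight that is larger for nearby areas, and \phi \ge 0 controls the strength of the penalty.

```latex
\[
Q(\Theta, g) \;=\; \sum_{i=1}^{n} \log f\!\left(y_i \mid x_i;\, \theta_{g_i}\right)
\;+\; \phi \sum_{i<j} w_{ij}\, \mathbb{1}\{g_i = g_j\}
\]
```

Maximizing an objective of this form over both the parameters and the labels rewards assigning neighbouring areas to the same cluster; with \phi = 0 it reduces to ordinary (non-spatial) clusterwise regression.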
Specifically, the proposal entails the estimation of linear mixed-effects models belonging to the Fay-Herriot family with clusterwise spatially-varying coefficients, in which areas are merged into clusters through a spatially-penalized likelihood. The proposed methodology is applied to data on Italian farms provided by the Farm Accountancy Data Network (FADN) survey of the European Union (Baldoni et al., 2017). The dataset consists of a sample of thousands of farms across the country, for which economic, production, technological, energy, and environmental impact information is collected annually. In particular, the application involves estimating the carbon footprint of farms in the Po Valley in recent years (Carillo et al., 2024), supported by auxiliary information from the 2020 national agricultural census.
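To make the idea of a Fay-Herriot model with clusterwise coefficients concrete, a minimal Python sketch follows. It is not the authors' estimator: it assumes the cluster labels are already given and the random-effect variance A is known, whereas the proposal determines clusters endogenously through a spatially-penalized likelihood. The function name `fh_clusterwise_eblup` and all argument names are hypothetical.

```python
import numpy as np


def fh_clusterwise_eblup(y, X, D, labels, A):
    """Fay-Herriot-type predictions with cluster-specific coefficients.

    Assumed model (illustrative): y_i = x_i' beta_{g_i} + u_i + e_i,
    with u_i ~ N(0, A) and e_i ~ N(0, D_i), D_i known.

    y      : (n,) direct estimates
    X      : (n, p) auxiliary covariates (include a column of ones for an intercept)
    D      : (n,) known sampling variances of the direct estimates
    labels : (n,) cluster labels, ASSUMED GIVEN (not estimated here)
    A      : random-effect variance, ASSUMED KNOWN (not estimated here)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)

    # Shrinkage factor: weight given to the direct estimate in each area.
    gamma = A / (A + D)

    theta = np.empty_like(y)
    betas = {}
    for g in np.unique(labels):
        idx = labels == g
        Xg, yg = X[idx], y[idx]
        # Weighted least squares within the cluster, weights 1 / (A + D_i).
        w = 1.0 / (A + D[idx])
        XtWX = Xg.T @ (w[:, None] * Xg)
        XtWy = Xg.T @ (w * yg)
        beta = np.linalg.solve(XtWX, XtWy)
        betas[g.item()] = beta
        # Composite predictor: direct estimate shrunk towards the cluster regression fit.
        theta[idx] = gamma[idx] * yg + (1.0 - gamma[idx]) * (Xg @ beta)
    return theta, betas
```

Calling `fh_clusterwise_eblup(y, X, D, labels, A)` returns the area-level predictions and a dictionary of per-cluster coefficient vectors. Each cluster needs at least as many areas as covariates for the within-cluster system to be solvable.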

References
Baldoni, E., Coderoni, S., & Esposti, R. (2017). The productivity and environment nexus with farm-level data. The Case of Carbon Footprint in Lombardy FADN farms. Bio-based and Applied Economics, 6(2), 119-137.
Carillo, F., Maranzano, P., Marcis, L., Pagliarella, M. C., & Salvatore, R. (2024). The spatio-temporal Fay-Herriot model using the state-space method: an application to Italian Lombard agrarian sub-regions. In Book of Short Papers - 2nd Italian Conference on Economic Statistics (ICES 2024) - Statistical Analysis of Complex Economic Data: Recent Developments and Applications (pp. 66-69). Casa Editrice Bonechi.
Cerqueti, R., Maranzano, P., & Mattera, R. (2024). Spatially-Clustered Spatial Autoregressive Models with Application to Agricultural Market Concentration in Europe. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-024-00672-4
Morales, D., Esteban, M. D., Pérez, A., & Hobza, T. (2021). A course on small area estimation and mixed models. Methods, theory and applications in R.
Sugasawa, S., & Murakami, D. (2021). Spatially clustered regression. Spatial Statistics, 44, 100525. https://doi.org/10.1016/j.spasta.2021.100525
Wang, X., Zhu, Z., & Zhang, H. H. (2023). Spatial heterogeneity automatic detection and estimation. Computational Statistics & Data Analysis, 180, 107667. https://doi.org/10.1016/j.csda.2022.107667
Zhu, A. X., & Turner, M. (2022). How is the Third Law of Geography different? Annals of GIS, 28(1), 57-67. https://doi.org/10.1080/19475683.2022.2026467