Neural networks for spatially correlated data

Presented During: Innovations at the Interface of Statistics, AI, and Real-World Evidence

Abhirup Datta, Speaker
Johns Hopkins University

Monday, Aug 4: 2:05 PM - 2:30 PM
Invited Paper Session

Music City Center

Traditionally, geospatial analysis has relied on statistical models that explicitly model spatial
correlations in the data. Recently, machine learning algorithms such as neural networks and random
forests have been increasingly used in geospatial analysis. However, most machine learning algorithms
cannot directly encode spatial correlations. The consequences of ignoring spatial correlations when
applying machine learning algorithms to geospatial data are poorly understood, even though the practice
is becoming increasingly common. We show empirically and theoretically that
ignoring spatial correlations reduces the accuracy of machine learning algorithms for geospatial data.
We then propose well-principled machine learning algorithms for geospatial data that explicitly model
the spatial correlation, as in traditional geostatistics. The basic principle is guided by how ordinary least
squares (OLS) extends to generalized least squares (GLS) in linear models to explicitly account for data
covariance. We demonstrate how the same extension can be made for random forests and neural
networks, presenting the RF-GLS and NN-GLS algorithms. We provide extensive theoretical and
empirical support for the methods and show how they fare better than naïve or brute-force
approaches to using machine learning algorithms for spatially correlated data. We present the software
packages RandomForestsGLS and geospaNN, which implement these methods.
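
The OLS-to-GLS step mentioned in the abstract can be illustrated for the linear case only. The sketch below is not the RF-GLS or NN-GLS algorithm and does not use the RandomForestsGLS or geospaNN packages. It assumes a known exponential covariance with a nugget, and every function name, parameter value, and the simulated data are hypothetical choices made for illustration. It shows that GLS amounts to running OLS on data "whitened" by the Cholesky factor of the covariance matrix.

```python
import numpy as np


def exponential_covariance(coords, sigma2=1.0, phi=3.0, nugget=0.1):
    """Assumed covariance: sigma2 * exp(-phi * distance), plus a nugget on the diagonal."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return sigma2 * np.exp(-phi * dists) + nugget * np.eye(len(coords))


def ols_fit(X, y):
    """Ordinary least squares: minimizes ||y - X beta||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def gls_fit(X, y, Sigma):
    """Generalized least squares: minimizes (y - X beta)' Sigma^{-1} (y - X beta).

    With Sigma = L L', multiplying X and y by L^{-1} decorrelates the errors,
    so GLS reduces to OLS on the whitened data.
    """
    L = np.linalg.cholesky(Sigma)
    X_white = np.linalg.solve(L, X)
    y_white = np.linalg.solve(L, y)
    return ols_fit(X_white, y_white)


# Simulated example (hypothetical data, not from the talk).
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 1, size=(n, 2))          # random locations in the unit square
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

Sigma = exponential_covariance(coords)
errors = np.linalg.cholesky(Sigma) @ rng.normal(size=n)  # spatially correlated errors
y = X @ beta_true + errors

beta_ols = ols_fit(X, y)          # ignores spatial correlation
beta_gls = gls_fit(X, y, Sigma)   # accounts for it through Sigma

print("OLS estimate:", beta_ols)
print("GLS estimate:", beta_gls)
```

In this sketch the covariance matrix is treated as known. In practice its parameters would have to be estimated, and the abstract does not say how the presented methods do so.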

Keywords

Neural networks

Geospatial data

Machine learning

Random forests

Gaussian processes

Spatial statistics