Gaussian Process Spatial Clustering

Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/24/2023: 4:10 PM - 4:15 PM CDT
Lightning 

Description

Spatial clustering is a common unsupervised learning problem with applications in areas such as public health, urban planning, and transportation. The goal is to identify clusters of similar locations based on both regionalization and patterns in the characteristics observed at those locations. Standard clustering is a well-studied area with a rich literature, including methods such as K-Means clustering, spectral clustering, and hierarchical clustering. Spatial clustering, by contrast, has received comparatively little study, owing to inherent differences between the spatial domain of the data and its corresponding covariates. For example, in the American Community Survey dataset, spatial differences between tracts cannot be directly compared to differences in participant survey responses on indicators such as employment status or income. In this paper, we develop a spatial clustering algorithm, Gaussian Process Spatial Clustering (GPSC), which leverages the flexibility of Gaussian processes to cluster functions between data, and we extend it to the clustering of geospatial data. We provide theoretical guarantees, demonstrate the algorithm's ability to recover true clusters in several simulation studies, and apply it to a real-world dataset to identify clusters of tracts in North Carolina based on socioeconomic and environmental indicators associated with health and cancer risk.

Keywords

North Carolina Breast Cancer Study

Functional Similarity 

Presenting Author

Hongqian Niu

First Author

Hongqian Niu

CoAuthor(s)

Melissa Troester, University of North Carolina, Chapel Hill
Didong Li, University of North Carolina, Chapel Hill

Target Audience

Mid-Level

Tracks

Machine Learning
Symposium on Data Science and Statistics (SDSS) 2023