Gaussian Process Spatial Clustering
Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/24/2023: 4:10 PM - 4:15 PM CDT
Lightning
Spatial clustering is a common unsupervised learning problem with applications in areas such as public health, urban planning, and transportation. The goal is to identify clusters of similar locations based both on regionalization and on patterns in the characteristics observed at those locations. Standard clustering is a well-studied area with a rich literature, including methods such as K-Means, spectral clustering, and hierarchical clustering. Spatial clustering, by contrast, has received relatively little study because of inherent differences between the spatial domain of the data and its corresponding covariates. For example, in the American Community Survey dataset, spatial differences between tracts cannot be directly compared to differences in participant survey responses on indicators such as employment status or income. In this paper, we develop a spatial clustering algorithm, Gaussian Process Spatial Clustering (GPSC), which leverages the flexibility of Gaussian processes to cluster functions between data, and we extend it to the case of clustering geospatial data. We provide theoretical guarantees and demonstrate its ability to recover true clusters in several simulation studies. We also apply it to a real-world dataset to identify clusters of tracts in North Carolina based on socioeconomic and environmental indicators associated with health and cancer risk.
North Carolina Breast Cancer Study
Functional Similarity
Presenting Author
Hongqian Niu
First Author
Hongqian Niu
CoAuthor(s)
Melissa Troester, University of North Carolina, Chapel Hill
Didong Li, University of North Carolina, Chapel Hill
Target Audience
Mid-Level
Tracks
Machine Learning