Identifying Classification Thresholds in Shuffled Stochastic Block Models

Vera Andersson (Speaker)
University of Maryland, College Park
 
Vince Lyzinski (Co-Author)
University of Maryland
 
Thursday, Aug 6: 10:30 AM - 12:20 PM
3535 
Contributed Papers 
Traditional classification methods such as k-nearest neighbors (kNN) are widely used in practice and are effective when networks are fully observed with known labels. In practice, however, networks are often only partially observed due to anonymization, data-collection errors, or missing information, leaving node labels estimated or entirely unknown. This loss of information can compromise statistical inference when methods rely heavily on label-specific attributes. We investigate the impact of node shuffling on classification performance within a Stochastic Block Model framework. Specifically, we use kNN combined with Procrustes alignment of latent positions to classify graphs drawn from two groups that differ by a perturbation. Our empirical and theoretical results show that in the homogeneous case, the classification rate declines as the number of shuffled vertices increases. For a sufficiently large perturbation, however, a change point occurs at which the classification rate resurges. Notably, a reflection appears in the Procrustes alignment at this change point and becomes more pronounced as the perturbation grows.
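The pipeline described above can be sketched in a minimal simulation: sample graphs from two SBMs differing by a perturbation, shuffle some vertex labels, embed each graph via adjacency spectral embedding (ASE), Procrustes-align the embeddings to a reference, and classify with kNN. All parameters below (block matrix, perturbation size, number of shuffled vertices, neighbor count) are illustrative assumptions, not the values used in the work.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)

def sbm(B, sizes):
    """Sample an undirected SBM adjacency matrix with block matrix B."""
    z = np.repeat(np.arange(len(sizes)), sizes)
    P = B[np.ix_(z, z)]
    A = np.triu((rng.random(P.shape) < P).astype(float), 1)
    return A + A.T

def ase(A, d):
    """Adjacency spectral embedding: top-d eigenvectors scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def shuffle_vertices(A, k):
    """Randomly permute k vertex labels of A (the 'shuffled' vertices)."""
    n = A.shape[0]
    perm = np.arange(n)
    sel = rng.choice(n, size=k, replace=False)
    perm[sel] = rng.permutation(sel)
    return A[np.ix_(perm, perm)]

# Illustrative settings: two 30-vertex blocks, perturbation eps, 10 shuffled vertices.
sizes, d, eps, n_shuffle, k_nn = [30, 30], 2, 0.10, 10, 5
B0 = np.array([[0.5, 0.2], [0.2, 0.5]])
B1 = B0 + eps  # perturbed class

X_ref = ase(sbm(B0, sizes), d)  # reference embedding for Procrustes alignment

def sample_feature(B):
    """One graph -> shuffled adjacency -> ASE -> align to reference -> flat vector."""
    X = ase(shuffle_vertices(sbm(B, sizes), n_shuffle), d)
    R, _ = orthogonal_procrustes(X, X_ref)
    return (X @ R).ravel()

X_train = np.array([sample_feature(B0) for _ in range(40)]
                   + [sample_feature(B1) for _ in range(40)])
y_train = np.array([0] * 40 + [1] * 40)
X_test = np.array([sample_feature(B0) for _ in range(20)]
                  + [sample_feature(B1) for _ in range(20)])
y_test = np.array([0] * 20 + [1] * 20)

def knn_predict(Xtr, ytr, Xte, k):
    """Plain majority-vote kNN on Euclidean distance."""
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)

acc = (knn_predict(X_train, y_train, X_test, k_nn) == y_test).mean()
print(f"classification rate with {n_shuffle} shuffled vertices: {acc:.2f}")
```

Sweeping `n_shuffle` from 0 to n and `eps` over a grid in this sketch is one way to probe the decline-then-resurgence behavior the abstract describes, though the theoretical change point itself is established in the paper, not by this simulation.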

Keywords

Stochastic Block Model (SBM)

Node shuffling

k-Nearest Neighbors (kNN)

Graph Classification

Procrustes Alignment

Adjacency Spectral Embedding (ASE) 

Main Sponsor

Section on Statistical Learning and Data Science