Statistical Learning with Graphs and Networks

Keith Levin Chair
University of Wisconsin
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
4233 
Contributed Papers 
Music City Center 
Room: CC-106B 

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

A Theoretical Framework to Signed Graph Learning

In many real-world applications such as biological and social networks, the interactions between the different entities in a complex system are better captured by signed graphs. Learning graph topology from observed signals has been addressed in both statistical theory and graph signal processing. However, most of the approaches focus on unsigned graphs. In this paper, we introduce a signed graph learning method based on the smoothness assumption. In particular, we assume that graph signals admit low-frequency representation over positive edges, while admitting high-frequency representation over negative edges. These two goals are simultaneously achieved and formulated in an optimization problem. A bound on the estimation error of the graph Laplacian is derived as a function of the number of observations and nodes. Finally, the proposed method is compared with both signed and unsigned graph learning approaches with respect to the increasing number of nodes, noise levels, and balance of the graphs.The scalability of the proposed approach compared to the state-of-the-art methods is also illustrated. 

Keywords

Graph Signal Processing

Signed Graph Learning

Estimation Error

Graph Laplacian 

Co-Author(s)

Abdullah Karaaslanli, Department of Electrical and Computer Engineering, Michigan State University
Tapabrata Maiti, Michigan State University
Selin Aviyente, Department of Electrical and Computer Engineering, Michigan State University

First Author

Bisakh Banerjee, Michigan State University

Presenting Author

Bisakh Banerjee, Michigan State University

An Ensemble Approach for Graph-Based Classification

Class Cover Catch Digraphs (CCCDs) constitute graph theoretic solutions to the class cover problem and have been employed in classification. Two main variants are Pure-CCCDs (P-CCCDs) that construct a pure, proper cover and Random Walk-CCCDs (RW-CCCDs) that construct an (potentially) impure, improper cover. Previous results showed CCCD classifiers were robust to class imbalance, but not class overlap. Classification performance in settings of class imbalance and class overlap of CCCDs was worse than other tested strong classifiers. In this work, we developed a novel CCCD framework, called FlexiBalls. We provide theoretical justification for using FlexiBall as opposed to other CCCD variants through both VC dimension and computational cost as the basis for boosting. We detail the boosted FlexiBall classifier algorithm which we call the AdaCover Digraph Classifier (ACDC). We close by presenting results of ACDC versus other strong models in both synthetic Monte Carlo simulations and real data settings under AUC, F1-Score, and G-Mean metrics. ACDC is competitive to other strong models selected in all settings, and typically performs better for each metric when classes are heavily overlapped and imbalanced.  

Keywords

classification

graph-based classifiers

ensemble learning

class imbalance

boosting 

Co-Author

Elvan Ceyhan, Auburn University

First Author

Jordan Eckert, Auburn University

Presenting Author

Jordan Eckert, Auburn University

Identifiability of Boolean graphical models

Boolean graphical models-including prominent subfamilies such as cognitive diagnosis models and Boolean matrix decompositions-find broad applications from social sciences to engineering. Despite their flexibility, a key challenge lies in establishing the identifiability of their graphical structures, which determine how latent variables affect observed data. Existing work often relies on the strong assumption of pure nodes-observed variables that depend directly on only one latent variable. While mathematically convenient, these assumptions may be unrealistic in many real-world settings. To address this, we develop a novel graphical approach using Hasse diagrams, which transforms the identifiability problem into a graph isomorphism challenge. Building on this perspective, we propose sufficient and necessary conditions for identifiability that do not require pure nodes; rather, the graphical structure is identifiable precisely when the corresponding graphical representation is unique. 

Keywords

Identifiability

Boolean graphical models

Graphical structures

Boolean matrix decomposition

Cognitive diagnosis models

Hasse diagrams 

Co-Author

Gongjun Xu, University of Michigan

First Author

Mengqi Lin

Presenting Author

Mengqi Lin

Novel CCD-Based Algorithms for High-Accuracy Outlier Detection

We propose a novel family of outlier detection algorithms built upon Cluster Catch Digraphs (CCDs)
and their extensions, designed to overcome the limitations of existing methods in handling
high-dimensional, heterogeneous, and irregularly shaped data.
Our approach introduces Mutual Catch Graphs (MCGs) to enhance the discrimination between inliers
and outliers by incorporating local density and geometric structure.
Building on CCDs derived from Kolmogorov-Smirnov-type statistics, Ripley's K function,
and nearest neighbor distances, we develop a suite of algorithms---U-MCCD, UN-MCCD, and their
shape-adaptive variants (SU-MCCD and SUN-MCCD)---which adaptively refine cluster boundaries
and suppress false detections.
These methods are largely parameter-free or require minimal tuning,
making them scalable and user-friendly.
We provide theoretical guarantees for computational complexity,
demonstrate robustness through extensive Monte Carlo experiments,
and evaluate performance across a wide range of dimensions, cluster configurations,
and contamination levels.
Our results show that shape-adaptive variants significantly improve detection accuracy,
particularly in high-dimensional or non-uniform settings,
where traditional graph- and density-based methods often fail. 

Keywords

Outlier detection

Outlyingness score

Graph-based clustering

Cluster catch digraphs

High-dimensional data 

Co-Author(s)

Nedret Billor, Auburn University
Elvan Ceyhan, Auburn University

First Author

Rui Shi, Auburn University

Presenting Author

Rui Shi, Auburn University

Optimal Graph Joining and Graph Isomorphism

In this talk, we will introduce a new notion of optimal joinings for undirected graphs, which is based on constrained optimal transport between the simple random walks on the graphs. After giving the definition of optimal joinings for undirected graphs, we will state some of the basic mathematical properties and then present results indicating that these optimal joinings can be used to detect and identify graph isomorphisms. We also present some numerical experiments with random graphs such as Erdos-Renyi models, stochastic block models. This is a joint work with Yang Xiang, Bongsoo Yi, Kevin McGoff, and Andrew B. Nobel. 

Keywords

joinings

optimal transport

graph isomorphism

random walks 

Co-Author(s)

Yang Xiang, UNC Chapel Hill
Bongsoo Yi, UNC Chapel Hill
Kevin McGoff
Andrew Nobel, University of North Carolina

First Author

Phuong Hoang, University of North Carolina at Charlotte

Presenting Author

Phuong Hoang, University of North Carolina at Charlotte

Scalable Signed Exponential Random Graph Models under Local Dependence

Traditional network analysis focuses on binary edges, while real-world relationships are more nuanced, encompassing cooperation, neutrality, and conflict. The rise of negative ties in social media discussions spurred interest in analyzing signed interactions, especially in polarized debates. However, the vast data generated by digital networks presents challenges for traditional methods like Stochastic Block Models (SBM) and Exponential Family Random Graph Models (ERGM), particularly due to the homogeneity assumption and global dependence, which become increasingly unrealistic as network size grows. To address this, we propose a novel method that combines the strengths of SBM and ERGM while mitigating their weaknesses by incorporating local dependence based on non-overlapping neighborhoods. Our approach involves a two-step process: first, decomposing the network into sub-networks using SBM approximation, and then estimating parameters using ERGM methods. We demonstrate the computational efficiency of our approach by applying it to a large signed Wikipedia network and validating our method on synthetic networks with up to 5,000 nodes. 

Keywords

Network Analysis

Exponential Random Graph Models

Signed Networks

Local Dependence

Large-Scale Networks 

Co-Author

Cornelius Fritz, Trinity College Dublin

First Author

Marc Schalberger

Presenting Author

Marc Schalberger

Statistical inference for noisy dynamic networks

In this work we develop techniques to estimate and perform inference on subgraph densities using time-indexed network sequences. These estimates explicitly adjust for observation errors for the network edges, and have good theoretical properties as the size of the network grows. By specifying a stochastically evolving hidden Markov network model, we are able to address two areas left for further investigation by Chang et al. (2022): robustness to non-identical network replicates, and efficient aggregation of all available data. Adaptation to these new settings vastly expands the applicability of their methods to real data settings, where network replicates are commonly observed dynamically. The methodology is also extended to consider joint inference for subgraph densities at multiple time points, to facilitate formal statistical comparison of dynamic network snapshots. 

Keywords

Dynamic network

Network inference

Hidden Markov model

Observation error 

Co-Author

Eric Kolaczyk, McGill University

First Author

Peter MacDonald, University of Waterloo

Presenting Author

Peter MacDonald, University of Waterloo