Wednesday, Aug 6: 8:30 AM - 10:20 AM
4233
Contributed Papers
Music City Center
Room: CC-106B
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
In many real-world applications such as biological and social networks, the interactions between the different entities in a complex system are better captured by signed graphs. Learning graph topology from observed signals has been addressed in both statistical theory and graph signal processing. However, most of the approaches focus on unsigned graphs. In this paper, we introduce a signed graph learning method based on the smoothness assumption. In particular, we assume that graph signals admit low-frequency representation over positive edges, while admitting high-frequency representation over negative edges. These two goals are simultaneously achieved and formulated in an optimization problem. A bound on the estimation error of the graph Laplacian is derived as a function of the number of observations and nodes. Finally, the proposed method is compared with both signed and unsigned graph learning approaches with respect to the increasing number of nodes, noise levels, and balance of the graphs.The scalability of the proposed approach compared to the state-of-the-art methods is also illustrated.
Keywords
Graph Signal Processing
Signed Graph Learning
Estimation Error
Graph Laplacian
Class Cover Catch Digraphs (CCCDs) constitute graph theoretic solutions to the class cover problem and have been employed in classification. Two main variants are Pure-CCCDs (P-CCCDs) that construct a pure, proper cover and Random Walk-CCCDs (RW-CCCDs) that construct an (potentially) impure, improper cover. Previous results showed CCCD classifiers were robust to class imbalance, but not class overlap. Classification performance in settings of class imbalance and class overlap of CCCDs was worse than other tested strong classifiers. In this work, we developed a novel CCCD framework, called FlexiBalls. We provide theoretical justification for using FlexiBall as opposed to other CCCD variants through both VC dimension and computational cost as the basis for boosting. We detail the boosted FlexiBall classifier algorithm which we call the AdaCover Digraph Classifier (ACDC). We close by presenting results of ACDC versus other strong models in both synthetic Monte Carlo simulations and real data settings under AUC, F1-Score, and G-Mean metrics. ACDC is competitive to other strong models selected in all settings, and typically performs better for each metric when classes are heavily overlapped and imbalanced.
Keywords
classification
graph-based classifiers
ensemble learning
class imbalance
boosting
Boolean graphical models-including prominent subfamilies such as cognitive diagnosis models and Boolean matrix decompositions-find broad applications from social sciences to engineering. Despite their flexibility, a key challenge lies in establishing the identifiability of their graphical structures, which determine how latent variables affect observed data. Existing work often relies on the strong assumption of pure nodes-observed variables that depend directly on only one latent variable. While mathematically convenient, these assumptions may be unrealistic in many real-world settings. To address this, we develop a novel graphical approach using Hasse diagrams, which transforms the identifiability problem into a graph isomorphism challenge. Building on this perspective, we propose sufficient and necessary conditions for identifiability that do not require pure nodes; rather, the graphical structure is identifiable precisely when the corresponding graphical representation is unique.
Keywords
Identifiability
Boolean graphical models
Graphical structures
Boolean matrix decomposition
Cognitive diagnosis models
Hasse diagrams
We propose a novel family of outlier detection algorithms built upon Cluster Catch Digraphs (CCDs)
and their extensions, designed to overcome the limitations of existing methods in handling
high-dimensional, heterogeneous, and irregularly shaped data.
Our approach introduces Mutual Catch Graphs (MCGs) to enhance the discrimination between inliers
and outliers by incorporating local density and geometric structure.
Building on CCDs derived from Kolmogorov-Smirnov-type statistics, Ripley's K function,
and nearest neighbor distances, we develop a suite of algorithms---U-MCCD, UN-MCCD, and their
shape-adaptive variants (SU-MCCD and SUN-MCCD)---which adaptively refine cluster boundaries
and suppress false detections.
These methods are largely parameter-free or require minimal tuning,
making them scalable and user-friendly.
We provide theoretical guarantees for computational complexity,
demonstrate robustness through extensive Monte Carlo experiments,
and evaluate performance across a wide range of dimensions, cluster configurations,
and contamination levels.
Our results show that shape-adaptive variants significantly improve detection accuracy,
particularly in high-dimensional or non-uniform settings,
where traditional graph- and density-based methods often fail.
Keywords
Outlier detection
Outlyingness score
Graph-based clustering
Cluster catch digraphs
High-dimensional data
In this talk, we will introduce a new notion of optimal joinings for undirected graphs, which is based on constrained optimal transport between the simple random walks on the graphs. After giving the definition of optimal joinings for undirected graphs, we will state some of the basic mathematical properties and then present results indicating that these optimal joinings can be used to detect and identify graph isomorphisms. We also present some numerical experiments with random graphs such as Erdos-Renyi models, stochastic block models. This is a joint work with Yang Xiang, Bongsoo Yi, Kevin McGoff, and Andrew B. Nobel.
Keywords
joinings
optimal transport
graph isomorphism
random walks
Traditional network analysis focuses on binary edges, while real-world relationships are more nuanced, encompassing cooperation, neutrality, and conflict. The rise of negative ties in social media discussions spurred interest in analyzing signed interactions, especially in polarized debates. However, the vast data generated by digital networks presents challenges for traditional methods like Stochastic Block Models (SBM) and Exponential Family Random Graph Models (ERGM), particularly due to the homogeneity assumption and global dependence, which become increasingly unrealistic as network size grows. To address this, we propose a novel method that combines the strengths of SBM and ERGM while mitigating their weaknesses by incorporating local dependence based on non-overlapping neighborhoods. Our approach involves a two-step process: first, decomposing the network into sub-networks using SBM approximation, and then estimating parameters using ERGM methods. We demonstrate the computational efficiency of our approach by applying it to a large signed Wikipedia network and validating our method on synthetic networks with up to 5,000 nodes.
Keywords
Network Analysis
Exponential Random Graph Models
Signed Networks
Local Dependence
Large-Scale Networks
In this work we develop techniques to estimate and perform inference on subgraph densities using time-indexed network sequences. These estimates explicitly adjust for observation errors for the network edges, and have good theoretical properties as the size of the network grows. By specifying a stochastically evolving hidden Markov network model, we are able to address two areas left for further investigation by Chang et al. (2022): robustness to non-identical network replicates, and efficient aggregation of all available data. Adaptation to these new settings vastly expands the applicability of their methods to real data settings, where network replicates are commonly observed dynamically. The methodology is also extended to consider joint inference for subgraph densities at multiple time points, to facilitate formal statistical comparison of dynamic network snapshots.
Keywords
Dynamic network
Network inference
Hidden Markov model
Observation error