Recent Advances in Statistical Integration for Network and Complex Data

Chair: Jesus Arroyo, Texas A&M University
Organizer: Jesus Arroyo, Texas A&M University

Thursday, Aug 7: 10:30 AM - 12:20 PM
0856 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-201B 

Applied: No

Main Sponsor

Section on Nonparametric Statistics

Co-Sponsors

Section on Statistical Computing
Section on Statistical Learning and Data Science

Presentations

Bayes in Multi-Layer Networks

We present an approach for analyzing multilayer networks to address complex inference challenges in fields like security and neuroscience. We introduce a supervised learning framework that leverages inter- and intra-layer dependencies to predict continuous outcomes. Using low-rank models, this method captures intricate relationships, identifies key nodes and edges, and improves computation speed. Its effectiveness is demonstrated on network security data from a national laboratory, significantly enhancing prediction accuracy. 
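
As a rough sketch of the low-rank idea (not the authors' exact model), each layer can be given a rank-one coefficient matrix, and the continuous outcome predicted by summing the corresponding bilinear forms over layers. All dimensions and the simulated data below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, N = 20, 3, 200                      # nodes, layers, samples (illustrative)

# Illustrative rank-one coefficient per layer: B_l = u_l u_l^T.
U = rng.normal(size=(L, n))

def predict(A_layers, U):
    """Linear predictor sum_l <u_l u_l^T, A_l> for one multilayer network (L x n x n)."""
    return sum(U[l] @ A_layers[l] @ U[l] for l in range(L))

# Simulate symmetric layer adjacencies and continuous outcomes.
A = rng.normal(size=(N, L, n, n))
A = (A + A.transpose(0, 1, 3, 2)) / 2
y = np.array([predict(A[i], U) for i in range(N)]) + rng.normal(scale=0.1, size=N)
```

In a fitting procedure, the layer-wise vectors would be estimated (for example, by least squares or gradient descent on the squared prediction error), and their entries point to the nodes and edges that drive the outcome.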

Speaker

Sharmistha Guha, Texas A&M University

Distribution-invariant Node Differential Privacy for Network Data

Differential privacy is a well-established framework for safeguarding sensitive information in data. While extensively applied across various domains, its application to network data—particularly at the node level—remains underexplored. Existing methods for node-level privacy either focus exclusively on query-based approaches, which restrict output to pre-specified network statistics, or fail to preserve key structural properties of the network. In this work, we present what is, to the best of our knowledge, the first mechanism capable of releasing an entire network structure while satisfying node-level differential privacy. Within the broad class of latent space models, we demonstrate that the released network asymptotically follows the same distribution as the original network and preserves global network moments. Additionally, our method supports individualized privacy budgets for each node, maintaining linkage between the released network and the original network under the privacy constraints. The effectiveness of the approach is evaluated through extensive experiments on both synthetic and real-world datasets.
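
The paper's mechanism is not reproduced here. As a toy illustration of the latent space idea the abstract builds on, the sketch below embeds the network, perturbs each node's latent position with noise, and resamples edges from the perturbed positions; the noise scale and embedding are placeholders and are not calibrated to any privacy budget.

```python
import numpy as np

rng = np.random.default_rng(1)

def release_network(A, d=2, scale=1.0):
    """Toy illustration (not the paper's mechanism): embed the adjacency
    matrix, add Laplace noise to each node's latent position, and resample
    a synthetic network from the perturbed positions."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[-d:]                    # top-d spectral embedding
    X = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
    X_noisy = X + rng.laplace(scale=scale, size=X.shape)   # node-level perturbation
    P = np.clip(X_noisy @ X_noisy.T, 0, 1)                 # inner-product edge probabilities
    upper = np.triu(rng.uniform(size=P.shape) < P, k=1)
    return (upper | upper.T).astype(int)

# Example: release a synthetic copy of a sparse random graph.
A = np.triu(rng.uniform(size=(50, 50)) < 0.1, k=1)
A = (A | A.T).astype(float)
A_released = release_network(A)
```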
 

Speaker

Tianxi Li, University of Minnesota

Efficient Analysis of Latent Spaces in Heterogeneous Networks

This work proposes a unified framework for efficient estimation under latent space modeling of heterogeneous networks. We consider a class of latent space models that decompose latent vectors into shared and network-specific components across networks. We develop a novel procedure that first identifies the shared latent vectors and further refines estimates through efficient score equations to achieve statistical efficiency. Oracle error rates for estimating the shared and heterogeneous latent vectors are established simultaneously. The analysis framework offers remarkable flexibility, accommodating various types of edge weights under exponential family distributions. 
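
One common way to write a shared-plus-specific latent space decomposition (not necessarily the paper's exact parameterization) is sketched below: every node has a latent vector shared across networks plus a network-specific one, and edges in network k are Bernoulli with logit equal to the sum of the two inner products. Dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, d_shared, d_spec = 30, 4, 2, 1      # nodes, networks, latent dimensions (illustrative)

V = rng.normal(size=(n, d_shared))        # shared latent vectors
U = rng.normal(size=(K, n, d_spec))       # network-specific latent vectors

def edge_probs(k):
    """Edge probabilities for network k under the illustrative
    shared-plus-specific inner-product model with a logit link."""
    logits = V @ V.T + U[k] @ U[k].T
    return 1 / (1 + np.exp(-logits))

# Sample one adjacency matrix per network from the model.
A = [(rng.uniform(size=(n, n)) < edge_probs(k)).astype(int) for k in range(K)]
```

Under an exponential family formulation, the Bernoulli sampling above would be replaced by the relevant edge-weight distribution with the same natural parameter.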

Keywords

Heterogeneous, Network, Latent space model, Efficient score

Speaker

Yinqiu He, University of Wisconsin-Madison

Optimizing the Induced Correlation in Omnibus Joint Graph Embeddings

Theoretical and empirical evidence suggests that joint graph embedding algorithms induce correlation across the networks in the embedding space. In the Omnibus joint graph embedding framework, previous results explicitly delineated the dual effects of the algorithm-induced and model-inherent correlations on the correlation across the embedded networks. Accounting for and mitigating the algorithm-induced correlation is key to subsequent inference, as sub-optimal Omnibus matrix constructions have been demonstrated to lead to loss in inference fidelity. This work presents the first efforts to automate the Omnibus construction in order to address two key questions in this joint embedding framework: the correlation-to-OMNI problem and the flat correlation problem. In the flat correlation problem, we seek to understand the minimum algorithm-induced flat correlation (i.e., the same across all graph pairs) produced by a generalized Omnibus embedding. Working in a subspace of the fully general Omnibus matrices, we prove both a lower bound for this flat correlation and that the classical Omnibus construction induces the maximal flat correlation. In the correlation-to-OMNI problem, we present an algorithm -- named corr2Omni -- that, from a given matrix of estimated pairwise graph correlations, estimates the matrix of generalized Omnibus weights that induces optimal correlation in the embedding space. Moreover, in both simulated and real data settings, we demonstrate the increased effectiveness of our corr2Omni algorithm versus the classical Omnibus construction. 
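
For context, the classical Omnibus construction that corr2Omni generalizes can be sketched in a few lines: the m adjacency matrices are stacked into an mn x mn block matrix whose (k, l) block is (A_k + A_l)/2, and all graphs are embedded jointly through a rank-d spectral decomposition of that matrix. The generalized construction described in the abstract replaces the fixed averaging weights with estimated ones; the sketch below covers only the classical case.

```python
import numpy as np

def omnibus_embedding(graphs, d):
    """Classical Omnibus embedding: stack m graphs into an (mn x mn) block
    matrix with (k, l) block (A_k + A_l) / 2, then take a rank-d adjacency
    spectral embedding so all graphs live in one common space."""
    m, n = len(graphs), graphs[0].shape[0]
    M = np.zeros((m * n, m * n))
    for k in range(m):
        for l in range(m):
            M[k*n:(k+1)*n, l*n:(l+1)*n] = (graphs[k] + graphs[l]) / 2
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[-d:]                             # top-d eigenpairs
    X = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
    return X.reshape(m, n, d)                               # block k = embedding of graph k
```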

Speaker

Vince Lyzinski, University of Maryland

West Nile virus forecasting with densely connected graph neural networks

West Nile virus (WNV) is a significant and growing public health issue in the United States. With no human vaccine, mosquito control programs rely on accurate forecasting to determine when and where WNV will emerge. Recently, spatial graph neural networks (GNNs) were shown to be a powerful tool for WNV forecasting, significantly improving over traditional methods. Building on this work, we introduce a new GNN variant that linearly connects graph attention layers, allowing us to train much larger models than previously used for WNV forecasting. This architecture specializes general densely connected GNNs so that the model focuses more heavily on local information to prevent over-smoothing. To support training large GNNs, we compiled a massive new dataset of weather data, land use information, and mosquito trap results across Illinois. Experiments show that our approach significantly outperforms both GNN and classical baselines in both out-of-sample and out-of-graph WNV prediction skill across a variety of scenarios and over all prediction horizons.
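
The abstract does not spell out the architecture; the sketch below shows one plausible way to densely connect graph attention layers (DenseNet-style, using PyTorch Geometric's GATConv), where each layer reads the concatenation of all earlier feature maps. Layer counts, attention heads, and the node-level readout are placeholders, and the paper's exact linear connection scheme may differ.

```python
import torch
from torch import nn
from torch_geometric.nn import GATConv   # requires PyTorch Geometric

class DenseGAT(nn.Module):
    """Illustrative densely connected GAT: layer i attends over the
    concatenation of the input features and all earlier layer outputs."""
    def __init__(self, in_dim, hidden, num_layers, out_dim, heads=4):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            self.layers.append(GATConv(dim, hidden, heads=heads, concat=False))
            dim += hidden                       # inputs grow as earlier outputs accumulate
        self.readout = nn.Linear(dim, out_dim)  # e.g. one WNV risk score per node

    def forward(self, x, edge_index):
        feats = [x]
        for conv in self.layers:
            h = torch.relu(conv(torch.cat(feats, dim=-1), edge_index))
            feats.append(h)
        return self.readout(torch.cat(feats, dim=-1))
```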

Speaker

Trevor Harris