New Methods for Networks, Graphical Models and Covariances

Emily Hector, Chair
North Carolina State University
 
Wednesday, Aug 7: 10:30 AM - 12:20 PM
5149 
Contributed Papers 
Oregon Convention Center 
Room: CC-C120 

Main Sponsor

IMS

Presentations

Implementation of a sequential TreeScan algorithm for disease surveillance

TreeScan is a popular algorithm for hierarchical testing of hypotheses. The algorithm is used in scenarios where the hypotheses under consideration naturally form a hierarchical tree structure, such as classifications of pharmaceutical drugs or occupations, thereby allowing one to detect unsuspected relationships. Its tree-based scan statistic requires only minimal assumptions about the input, and it adjusts for the multiple testing that is inherent in tree-based testing scenarios. However, the tree structure of the hypotheses is assumed fixed in TreeScan, which impedes its use in application areas that require dynamic updates, such as time-varying patient enrollment during trials. For this reason, we extend TreeScan to incorporate a sequential testing design that is capable of controlling either the family-wise error rate (FWER) or the false discovery rate (FDR) by means of appropriate alpha spending. We apply our improved algorithm to electronic health record (EHR) and claims databases to study the relationship between health events and various potential risk factors. 
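The abstract does not say which alpha-spending function is used. As a rough illustration of the general idea only, the sketch below implements two standard Lan-DeMets spending functions (Pocock-type and O'Brien-Fleming-type) and the alpha made available at each interim look. The function names and the choice of looks are assumptions for illustration. How the spent alpha enters the tree-based scan statistic is not shown.

```python
from math import e, log, sqrt
from statistics import NormalDist


def pocock_type_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent at information fraction t (Lan-DeMets, Pocock-type)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction must lie in [0, 1]")
    return alpha * log(1.0 + (e - 1.0) * t)


def obrien_fleming_type_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative alpha spent at information fraction t (Lan-DeMets, O'Brien-Fleming-type)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction must lie in [0, 1]")
    if t == 0.0:
        return 0.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z / sqrt(t)))


def incremental_alpha(fractions, spend, alpha: float = 0.05):
    """Alpha available at each look: the increase in cumulative spending since the last look."""
    increments, previous = [], 0.0
    for t in fractions:
        current = spend(t, alpha)
        increments.append(current - previous)
        previous = current
    return increments


# Hypothetical design: four equally spaced looks.
looks = [0.25, 0.5, 0.75, 1.0]
pocock_increments = incremental_alpha(looks, pocock_type_spending)
obf_increments = incremental_alpha(looks, obrien_fleming_type_spending)
```

Both functions spend the full alpha by the final look. The O'Brien-Fleming-type function spends very little early on, which keeps most of the error budget for later analyses.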

Keywords

TreeScan

sequential

disease surveillance

hierarchical testing

hypotheses 

View Abstract 3270

First Author

Georg Hahn

Presenting Author

Georg Hahn

Lower Ricci Curvature for Efficient Community Detection

This study introduces the Lower Ricci Curvature (LRC), a novel, scalable, and scale-free discrete curvature designed to enhance community detection in networks. Addressing the computational challenges posed by existing curvature-based methods, LRC offers a streamlined approach with linear computational complexity, making it well-suited for large-scale network analysis. We further develop an LRC-based preprocessing method that effectively augments popular community detection algorithms. Through comprehensive simulations and applications on real-world datasets, including the NCAA football league network, the DBLP collaboration network, the Amazon product co-purchasing network, and the YouTube social network, we demonstrate the efficacy of our method in significantly improving the performance of various community detection algorithms. 
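The abstract does not give the curvature formula. The sketch below assumes an edge curvature built only from the two endpoint degrees and the number of shared neighbors, in the spirit of degree-and-triangle lower bounds on Ollivier-Ricci curvature. The exact expression, the function names, and the pruning threshold are all assumptions for illustration, not the authors' definition. The sketch shows how such a curvature could drive a pruning step before community detection.

```python
def lower_ricci_curvature(adj, u, v):
    """Assumed edge curvature using only degrees and the shared-neighbor count."""
    n_u, n_v = len(adj[u]), len(adj[v])
    shared = len(adj[u] & adj[v])  # triangles through edge (u, v)
    return (2.0 / n_u + 2.0 / n_v - 2.0
            + 2.0 * shared / max(n_u, n_v)
            + shared / min(n_u, n_v))


def prune_low_curvature_edges(adj, threshold):
    """Return a copy of the graph without edges whose curvature is below threshold."""
    pruned = {node: set(nbrs) for node, nbrs in adj.items()}
    for u in adj:
        for v in adj[u]:
            # Visit each undirected edge once; curvature uses the original graph.
            if u < v and lower_ricci_curvature(adj, u, v) < threshold:
                pruned[u].discard(v)
                pruned[v].discard(u)
    return pruned


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
pruned_graph = prune_low_curvature_edges(graph, threshold=0.0)
```

In this toy graph the bridge has negative curvature under the assumed formula and is removed, while the edges inside each triangle are kept, so the two communities become separate components.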

Keywords

Network curvature

Network pruning

Large-scale network 

View Abstract 3802

Co-Author

Didong Li

First Author

Yun Jin Park, University of North Carolina at Chapel Hill

Presenting Author

Didong Li

A truncated pairwise likelihood approach for high-dimensional covariance estimation

Pairwise likelihood allows inference for distributions with high-dimensional dependencies by combining marginal pairwise likelihood functions. In certain models, including the multivariate normal distribution, pairwise and full likelihoods are maximized by the same parameter values, thus retaining the same statistical efficiency when the number of variables is fixed. We propose to estimate sparse high-dimensional covariance matrices by maximizing a truncated pairwise likelihood function that includes only terms corresponding to nonzero covariance elements. Pairwise likelihood truncation is obtained by minimizing the distance between pairwise and full likelihood scores plus an L1-penalty that discourages the inclusion of relatively noisy terms. Unlike other regularization approaches, our penalty focuses on whole pairwise likelihood objects rather than on individual parameters, thus retaining unbiased estimating equations. Our asymptotic analysis shows that the resulting estimator has the same efficiency as the oracle maximum likelihood estimator based on the knowledge of the nonzero covariance entries. The properties of the new method are confirmed by numerical examples. 
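To make the objective concrete, the sketch below evaluates a truncated pairwise log-likelihood for multivariate normal data: a sum of bivariate normal log-densities over the included pairs only. The `include` indicator matrix is taken as given here. The selection step described in the abstract (score distance plus L1-penalty) is not implemented, and all function names are assumptions for illustration.

```python
from itertools import combinations
from math import log, pi


def bivariate_normal_logpdf(x, y, mx, my, sxx, syy, sxy):
    """Log-density of a bivariate normal with means (mx, my) and covariance [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance of the pair must be positive definite")
    dx, dy = x - mx, y - my
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -log(2.0 * pi) - 0.5 * log(det) - 0.5 * quad


def truncated_pairwise_loglik(data, mean, cov, include):
    """Sum bivariate log-likelihoods over pairs (j, k), j < k, with include[j][k] true."""
    total = 0.0
    for j, k in combinations(range(len(mean)), 2):
        if not include[j][k]:
            continue  # truncated pair: contributes nothing to the objective
        for row in data:
            total += bivariate_normal_logpdf(
                row[j], row[k], mean[j], mean[k],
                cov[j][j], cov[k][k], cov[j][k],
            )
    return total
```

With every pair included this reduces to the ordinary pairwise log-likelihood. Setting an entry of `include` to false drops the whole pairwise term rather than shrinking an individual parameter.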

Keywords

Composite likelihood

High-dimensional covariance

L1-penalty

Pairwise likelihood

Sparse covariance 

View Abstract 1942

Co-Author(s)

Zhendong Huang, University of Melbourne
Alessandro Casa, University of Bolzano

First Author

Davide Ferrari

Presenting Author

Davide Ferrari

Covariance Matrix Completion via Auxiliary Information

Covariance matrix estimation is an important task in the analysis of multivariate data in disparate scientific fields, including neuroscience, genomics, and astronomy. However, modern scientific data are often incomplete due to factors beyond the control of researchers, and data missingness may prohibit the use of traditional covariance estimation methods. Some existing methods address this problem by completing the data matrix, or by filling the missing entries of an incomplete sample covariance matrix by assuming a low-rank structure. We propose a novel approach that exploits auxiliary variables to complete covariance matrix estimates. An example of an auxiliary variable is the distance between neurons, which is usually inversely related to the strength of neuronal covariation. Our method extracts auxiliary information via regression, and involves a single tuning parameter that can be selected empirically. We compare our method with other matrix completion approaches theoretically, via simulations, and in graphical model estimation from large-scale neuroscience data. 
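As a minimal illustration of the idea of borrowing strength from an auxiliary variable, the sketch below regresses the observed off-diagonal covariances on an auxiliary quantity (for example, pairwise distance) and fills the missing entries with the fitted values. It uses ordinary least squares with a single predictor, which is an assumption: the abstract does not specify the regression model, and the tuning parameter it mentions is not represented here. The result is also not guaranteed to be positive definite.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("auxiliary variable has no variation")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def complete_covariance(cov, aux):
    """Fill missing (None) off-diagonal covariances using a regression on aux[j][k]."""
    p = len(cov)
    observed = [
        (aux[j][k], cov[j][k])
        for j in range(p) for k in range(j + 1, p)
        if cov[j][k] is not None
    ]
    intercept, slope = fit_line([a for a, _ in observed], [c for _, c in observed])
    filled = [row[:] for row in cov]
    for j in range(p):
        for k in range(j + 1, p):
            if filled[j][k] is None:
                # Predicted covariance, written symmetrically.
                filled[j][k] = filled[k][j] = intercept + slope * aux[j][k]
    return filled
```

A pair of variables that was never observed jointly still has a known auxiliary value, so its covariance can be predicted from pairs that were observed.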

Keywords

graphical models

missing data

regression

prediction

regularization

neuroscience 

View Abstract 2923

Co-Author

Giuseppe Vinci, University of Notre Dame

First Author

Joseph Steneman

Presenting Author

Joseph Steneman

Recent advances on some tensor shrinkage estimators and recent identities

In this talk, we present a class of tensor shrinkage estimators as well as some challenges related to the risk analysis of such estimators. We also present some recent identities which are useful in establishing the risk dominance of tensor shrinkage estimators. 

Keywords

Tensor parameter

Tensor estimators

Shrinkage estimators

Risk function

Tensor Stein-rules 

View Abstract 2950

Co-Author

Mai Ghannam

First Author

Severien Nkurunziza, University of Windsor

Presenting Author

Severien Nkurunziza, University of Windsor

Advancing Forensic Evidence Evaluation through Statistics: Likelihood Ratio and Graphical Models

The persistent issue of innocent individuals being wrongly convicted emphasizes the need for scrutiny and improvement in the US criminal justice system. Statistical methods for evaluating forensic evidence, including glass, fingerprints, and DNA, have helped solve complex crime investigations. Yet, national-level standards that could enforce the rigorous implementation of statistical analyses of forensic evidence have not been established. We investigate the use and misuse of statistical methods in crime investigations, such as the likelihood ratio approach for hypothesis testing. We further consider graphical models, where hypotheses and evidence can be represented as nodes connected by arrows describing association or causality. We emphasize the advantages of special graph structures, such as object-oriented Bayesian networks and chain event graphs, which allow for the concurrent examination of evidence of various kinds. Finally, we discuss strategies to make the interpretation of statistical analyses of forensic evidence more accessible to non-statisticians, especially in the courtroom where decisions about the fate of potentially innocent individuals are made every day. 
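For readers unfamiliar with the likelihood ratio approach, the sketch below shows the basic arithmetic: the ratio of the probability of the evidence under the prosecution hypothesis to its probability under the defense hypothesis, and how it updates prior odds. The numbers and function names are hypothetical and chosen only for illustration.

```python
def likelihood_ratio(p_evidence_given_hp: float, p_evidence_given_hd: float) -> float:
    """LR = P(E | Hp) / P(E | Hd)."""
    if p_evidence_given_hd <= 0:
        raise ValueError("P(E | Hd) must be positive")
    return p_evidence_given_hp / p_evidence_given_hd


def posterior_odds(prior_odds: float, lr: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    return prior_odds * lr


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of Hp to a probability."""
    return odds / (1.0 + odds)


# Hypothetical case: a certain match under Hp, a 1-in-1000 random match under Hd,
# and prior odds of 1 to 100 in favor of Hp.
lr = likelihood_ratio(1.0, 0.001)
post = posterior_odds(1.0 / 100.0, lr)
prob = odds_to_probability(post)
```

The example separates the two quantities that are often confused in court: the likelihood ratio describes the strength of the evidence, while the posterior probability also depends on the prior odds.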

Keywords

Forensic statistics

DNA typing

Hypothesis testing

Likelihood ratio

Graphical models

Bayesian networks 

View Abstract 3047

Co-Author

Giuseppe Vinci, University of Notre Dame

First Author

Xiangyu Xu, University of Notre Dame

Presenting Author

Xiangyu Xu, University of Notre Dame

Geometric Graph Matching with Message Passing Neural Networks

We study matching random graphs with geometric structure using graph neural networks. To this end, we consider a special family of random geometric graphs where two vertices are connected if the overlap in their binary features surpasses a fixed threshold. For two such graphs, we have access to a random subset of edges together with noisy observations of their underlying vertex features. Our goal is to recover an unknown vertex alignment from the noisy and incomplete information.
We show that solving a linear assignment problem with only noisy vertex features fails in certain parameter regimes. In contrast, if the features are passed through a specially designed message passing neural network, we can achieve perfect recovery with high probability. We also show that the bound for perfect recovery is tight up to logarithmic factors.
Finally, we apply the algorithm to aligning medical concepts from different coding systems (e.g., codified and NLP) with their genetic associations and demonstrate that better alignment accuracy can be achieved with the help of the medical knowledge graph. 
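The abstract does not describe the network architecture. The toy sketch below only illustrates the general recipe it names: enrich each vertex's features with information from its neighbors through one round of message passing, then solve an assignment problem on the enriched features. The mean aggregation, the brute-force search over permutations (feasible only for very small graphs), and all names are assumptions for illustration.

```python
from itertools import permutations


def message_pass(features, adj):
    """One round: concatenate each vertex's features with the mean of its neighbors' features."""
    embedded = []
    for v, own in enumerate(features):
        dim = len(own)
        nbrs = adj[v]
        if nbrs:
            agg = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            agg = [0.0] * dim
        embedded.append(list(own) + agg)
    return embedded


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_assignment(emb_a, emb_b):
    """Brute-force assignment: the permutation minimizing total squared distance."""
    n = len(emb_a)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(squared_distance(emb_a[i], emb_b[perm[i]]) for i in range(n)),
    )


# Toy example: graph B is graph A (a path 0-1-2) with vertices relabeled by sigma.
features_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj_a = [{1}, {0, 2}, {1}]
sigma = [2, 0, 1]
features_b = [None] * 3
adj_b = [set() for _ in range(3)]
for i in range(3):
    features_b[sigma[i]] = features_a[i]
    adj_b[sigma[i]] = {sigma[u] for u in adj_a[i]}

recovered = best_assignment(message_pass(features_a, adj_a),
                            message_pass(features_b, adj_b))
```

For graphs of realistic size the permutation search would be replaced by a polynomial-time linear assignment solver such as the Hungarian algorithm.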

Keywords

random geometric graph

graph neural network

entity alignment

linear assignment problem

graph matching

random intersection graph 

View Abstract 3292

Co-Author(s)

Tianxi Cai, Harvard University
Morgane Austern, Harvard University

First Author

Suqi Liu, Harvard University

Presenting Author

Suqi Liu, Harvard University