Wednesday, Aug 7: 10:30 AM - 12:20 PM
5149
Contributed Papers
Oregon Convention Center
Room: CC-C120
Main Sponsor
IMS
Presentations
TreeScan is a popular algorithm for hierarchical testing of hypotheses. It is used in scenarios where the hypotheses under consideration naturally form a hierarchical tree structure, such as hierarchies of pharmaceutical drugs or occupations, thereby allowing one to detect unsuspected relationships. Its tree-based scan statistic makes only minimal assumptions about the input, and it adjusts for the multiple testing that is inherent in tree-based testing scenarios. However, TreeScan assumes the tree structure of the hypotheses to be fixed, which impedes its use in application areas that require dynamic updates, such as time-varying patient enrollment during trials. For this reason, we extend TreeScan with a sequential testing design capable of controlling either the FWER or the FDR criterion by means of appropriate alpha spending. We apply the improved algorithm to EHR and claims databases to study the relationship between health events and various potential risk factors.
Keywords
TreeScan
sequential
disease surveillance
hierarchical testing
hypotheses
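As a toy illustration of the sequential design described in the abstract, the sketch below combines an O'Brien–Fleming-type alpha-spending function with a crude Bonferroni correction over the tree's leaves. The function names, the choice of spending function, and the leaf-level correction are our assumptions for illustration, not the authors' TreeScan extension.

```python
import math
from statistics import NormalDist

def obrien_fleming_spend(t, alpha=0.05):
    """O'Brien-Fleming-type alpha-spending function:
    alpha(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t)) at information fraction t in (0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def sequential_tree_test(pvals_by_look, n_leaves, alpha=0.05):
    """At each interim look, test every tree node against the alpha newly spent
    at that look, Bonferroni-corrected over the tree's leaves (a simple FWER
    device standing in for the talk's alpha-spending scheme).
    pvals_by_look: one {node: p-value} dict per interim look."""
    spent, rejected = 0.0, set()
    n_looks = len(pvals_by_look)
    for k, pvals in enumerate(pvals_by_look, start=1):
        total = obrien_fleming_spend(k / n_looks, alpha)     # alpha spent so far
        increment, spent = total - spent, total              # alpha spent at this look
        threshold = increment / n_leaves                     # crude multiplicity correction
        rejected |= {node for node, p in pvals.items() if p < threshold}
    return rejected
```

With a single look, the full alpha budget of 0.05 is spent at once, so a node with p = 0.001 is rejected at the Bonferroni threshold 0.05/2 = 0.025.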
This study introduces the Lower Ricci Curvature (LRC), a novel, scalable, and scale-free discrete curvature designed to enhance community detection in networks. Addressing the computational challenges posed by existing curvature-based methods, LRC offers a streamlined approach with linear computational complexity, making it well-suited for large-scale network analysis. We further develop an LRC-based preprocessing method that effectively augments popular community detection algorithms. Through comprehensive simulations and applications on real-world datasets, including the NCAA football league network, the DBLP collaboration network, the Amazon product co-purchasing network, and the YouTube social network, we demonstrate the efficacy of our method in significantly improving the performance of various community detection algorithms.
Keywords
Network curvature
Network pruning
Large-scale network
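The abstract does not give the formula for the Lower Ricci Curvature, so the sketch below substitutes the classical Forman curvature of an unweighted edge, 4 - deg(u) - deg(v), which shares the relevant qualitative property: bridge-like inter-community edges receive low curvature. The pruning step illustrates the general shape of a curvature-based preprocessing pass; it is our hypothetical stand-in, not the talk's LRC method (and the sort makes it O(m log m), whereas LRC is described as linear).

```python
from collections import defaultdict

def forman_curvature(adj, u, v):
    """Forman curvature of an unweighted edge (u, v): 4 - deg(u) - deg(v).
    A stand-in for the Lower Ricci Curvature, whose formula is not given here."""
    return 4 - len(adj[u]) - len(adj[v])

def prune_by_curvature(edges, frac=0.1):
    """Curvature-based preprocessing: drop the fraction `frac` of edges with
    the lowest curvature (likely inter-community bridges) before handing the
    graph to any community detection algorithm."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ranked = sorted(edges, key=lambda e: forman_curvature(adj, *e))
    k = int(len(edges) * frac)
    return ranked[k:]  # keep the higher-curvature (intra-community) edges
```

On two triangles joined by a single bridge edge, the bridge has the lowest curvature and is the first edge removed, leaving the two communities disconnected and trivially detectable.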
Pairwise likelihood allows inference for distributions with high-dimensional dependencies by combining marginal pairwise likelihood functions. In certain models, including the multivariate normal distribution, pairwise and full likelihoods are maximized by the same parameter values, thus retaining the same statistical efficiency when the number of variables is fixed. We propose to estimate sparse high-dimensional covariance matrices by maximizing a truncated pairwise likelihood function that includes only terms corresponding to nonzero covariance elements. Pairwise likelihood truncation is obtained by minimizing the distance between pairwise and full likelihood scores plus an L1-penalty discouraging the inclusion of relatively noisy terms. Unlike other regularization approaches, our penalty acts on whole pairwise likelihood objects rather than on individual parameters, thus retaining unbiased estimating equations. Our asymptotic analysis shows that the resulting estimator has the same efficiency as the oracle maximum likelihood estimator based on knowledge of the nonzero covariance entries. The properties of the new method are confirmed by numerical examples.
Keywords
Composite likelihood
High-dimensional covariance
L1-penalty
Pairwise likelihood
Sparse covariance
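The construction in the abstract can be written schematically as follows; the notation (weights $w_{jk}$, scores $u$) is ours and the exact objective may differ from the authors' formulation.

```latex
% Pairwise log-likelihood: sum of bivariate marginal log-likelihoods
\ell_P(\theta) = \sum_{j<k} \ell_{jk}(\theta;\, y_j, y_k)

% Truncated version: keep only pairs in an estimated support set S
\ell_{P,S}(\theta) = \sum_{(j,k)\in S} \ell_{jk}(\theta;\, y_j, y_k)

% Truncation via weights on whole pairwise terms: minimize the distance
% between the weighted pairwise score and the full likelihood score u_F,
% with an L1-penalty that zeroes out noisy pairwise objects
\hat{w} = \arg\min_{w} \;
  \Bigl\| u_F(\theta) - \sum_{j<k} w_{jk}\, u_{jk}(\theta) \Bigr\|^2
  + \lambda \sum_{j<k} |w_{jk}|,
\qquad S = \{(j,k) : \hat{w}_{jk} \neq 0\}
```

Because the penalty acts on entire pairwise likelihood terms rather than on individual covariance parameters, each retained term still contributes an unbiased estimating equation, as the abstract notes.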
Covariance matrix estimation is an important task in the analysis of multivariate data in disparate scientific fields, including neuroscience, genomics, and astronomy. However, modern scientific data are often incomplete due to factors beyond the control of researchers, and data missingness may prohibit the use of traditional covariance estimation methods. Some existing methods address this problem by completing the data matrix, or by filling the missing entries of an incomplete sample covariance matrix by assuming a low-rank structure. We propose a novel approach that exploits auxiliary variables to complete covariance matrix estimates. An example of an auxiliary variable is the distance between neurons, which is typically inversely related to the strength of neuronal covariation. Our method extracts auxiliary information via regression, and involves a single tuning parameter that can be selected empirically. We compare our method with other matrix completion approaches theoretically, via simulations, and in graphical model estimation from large-scale neuroscience data.
Keywords
graphical models
missing data
regression
prediction
regularization
neuroscience
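The neuron-distance example in the abstract can be illustrated with a minimal sketch: regress the observed covariance entries on the auxiliary variable, then predict the missing entries from the fitted line. This is a deliberately simplified stand-in for the authors' estimator (which involves a tuning parameter and theoretical guarantees not reproduced here); all names are ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y ~ a + b*x (stdlib only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def complete_covariance(cov, dist):
    """Fill missing covariance entries (None) by regressing the observed
    entries on an auxiliary variable -- e.g. distance between neurons,
    assumed inversely related to covariation strength.
    `cov` and `dist` are dicts keyed by index pairs (i, j)."""
    pairs = list(cov)
    xs = [dist[p] for p in pairs if cov[p] is not None]
    ys = [cov[p] for p in pairs if cov[p] is not None]
    a, b = fit_line(xs, ys)
    # Observed entries are kept; missing entries are predicted from distance.
    return {p: (cov[p] if cov[p] is not None else a + b * dist[p])
            for p in pairs}
```

A real implementation would also need to re-project the completed matrix onto the positive semidefinite cone; that step is omitted here.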
In this talk, we present a class of tensor shrinkage estimators as well as some challenges related to the risk analysis of such estimators. We also present some recent identities which are useful in establishing the risk dominance of tensor shrinkage estimators.
Keywords
Tensor parameter
Tensor estimators
Shrinkage estimators
Risk function
Tensor Stein-rules
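A classical special case of the shrinkage estimators discussed in this talk is the positive-part James–Stein rule applied to a flattened tensor observation X ~ N(theta, sigma^2 I). The sketch below is our illustration of that baseline, not the speakers' construction or their risk identities.

```python
def _flatten(t):
    """Yield the scalar entries of a nested-list tensor."""
    if isinstance(t, list):
        for x in t:
            yield from _flatten(x)
    else:
        yield t

def james_stein_shrink(tensor, sigma2=1.0):
    """Positive-part James-Stein estimator applied entrywise to a tensor:
    theta_hat = max(0, 1 - (p - 2) * sigma2 / ||X||^2) * X,
    where p is the total number of entries. Dominates the MLE X in total
    squared-error risk when p > 2."""
    flat = list(_flatten(tensor))
    p = len(flat)
    s = sum(x * x for x in flat)                     # squared norm ||X||^2
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / s)    # positive-part shrinkage
    def _apply(t):
        return [_apply(x) for x in t] if isinstance(t, list) else factor * t
    return _apply(tensor)
```

For a 2x2x2 tensor with all entries equal to 2 and sigma2 = 1, we have p = 8 and ||X||^2 = 32, so every entry is shrunk by the factor 1 - 6/32 = 0.8125.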
The persistent issue of innocent individuals being wrongly convicted emphasizes the need for scrutiny and improvement in the US criminal justice system. Statistical methods for evaluating forensic evidence such as glass, fingerprints, and DNA have helped solve complex crime investigations. Yet, national-level standards that could enforce the rigorous implementation of statistical analyses of forensic evidence have not been established. We investigate the use and misuse of statistical methods in crime investigations, such as the likelihood ratio approach for hypothesis testing. We further consider graphical models, where hypotheses and evidence can be represented as nodes connected by arrows describing association or causality. We emphasize the advantages of special graph structures, such as object-oriented Bayesian networks and chain event graphs, which allow for the concurrent examination of evidence of different types. Finally, we discuss strategies to make the interpretation of statistical analyses of forensic evidence more accessible to non-statisticians, especially in the courtroom where decisions about the fate of potentially innocent individuals are made every day.
Keywords
Forensic statistics
DNA typing
Hypothesis testing
Likelihood ratio
Graphical models
Bayesian networks
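The likelihood ratio approach mentioned in the abstract reduces to two lines of arithmetic: the evidence is weighed by comparing its probability under the prosecution and defense hypotheses, and Bayes' rule in odds form converts prior odds into posterior odds. The numeric values below are illustrative, not drawn from any real case.

```python
def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = P(evidence | prosecution hypothesis) / P(evidence | defense hypothesis).
    LR > 1 means the evidence supports the prosecution hypothesis."""
    return p_e_given_hp / p_e_given_hd

def posterior_odds(prior_odds, lr):
    """Bayes' rule in odds form: posterior odds = LR * prior odds.
    The statistician reports LR; the prior odds are for the trier of fact."""
    return lr * prior_odds

# Illustrative DNA example: a match is certain if the suspect is the source
# (Hp), and has random-match probability 1e-6 otherwise (Hd).
lr = likelihood_ratio(1.0, 1e-6)  # about one million
```

A common courtroom misuse, the "prosecutor's fallacy", is to read the random-match probability 1e-6 as the probability of innocence; the odds formulation above makes clear that the LR must still be combined with prior odds.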
We study matching random graphs with geometric structure using graph neural networks. To this end, we consider a special family of random geometric graphs where two vertices are connected if the overlap in their binary features surpasses a fixed threshold. For two such graphs, we have access to a random subset of edges together with noisy observations of their underlying vertex features. Our goal is to recover an unknown vertex alignment from the noisy and incomplete information.
We show that solving a linear assignment problem with only noisy vertex features fails in certain parameter regimes. In contrast, if the features are passed through a specially designed message passing neural network, we can achieve perfect recovery with high probability. We also show that the bound for perfect recovery is tight up to logarithmic factors.
Finally, we apply the algorithm to aligning medical concepts from different coding systems (e.g., codified and NLP) with their genetic associations and demonstrate that better alignment accuracy can be achieved with the help of the medical knowledge graph.
Keywords
random geometric graph
graph neural network
entity alignment
linear assignment problem
graph matching
random intersection graph
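The linear assignment baseline from the abstract can be sketched concretely: match vertices of the two graphs by minimizing the total disagreement between their (noisy) binary feature vectors. For small graphs a brute-force search over permutations suffices; the talk's point is that this feature-only matching fails in certain regimes, whereas features refined by a message passing neural network (not shown here) achieve perfect recovery.

```python
from itertools import permutations

def hamming(a, b):
    """Hamming distance between two binary feature vectors."""
    return sum(x != y for x, y in zip(a, b))

def linear_assignment_match(feats1, feats2):
    """Linear assignment on vertex features alone: find the permutation of
    graph-2 vertices minimizing total Hamming distance to graph-1 vertices.
    Brute force over permutations -- fine for tiny graphs; a real solver
    (e.g. the Hungarian algorithm) handles larger instances."""
    n = len(feats1)
    best = min(permutations(range(n)),
               key=lambda perm: sum(hamming(feats1[i], feats2[perm[i]])
                                    for i in range(n)))
    return list(best)  # best[i] = graph-2 vertex matched to graph-1 vertex i
```

With noiseless, distinct features the permutation that generated the second graph is recovered exactly; the interesting regimes in the talk are those where feature noise makes this baseline fail.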