Advances in Statistical Modeling

Hasthika Rupasinghe, Chair
Appalachian State University
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
4152 
Contributed Papers 
Music City Center 
Room: CC-104A 

Main Sponsor

IMS

Presentations

Bayesian Generalized Linear Model for Difference of Over or Under Dispersed Counts

Modelling the difference of two counts has many practical uses in statistics. The Skellam distribution can be used for such a model; however, because it is constructed as the difference of two independent Poisson random variables, it is potentially unsuitable for data that exhibit under- or overdispersion. We take a first look at constructing a Bayesian generalized linear model for the difference of counts that can handle both under- and overdispersion, based on the difference of two Conway-Maxwell-Poisson random variables (that is, a Conway-Maxwell Skellam distribution). The focus of this paper is on providing an explicit demonstration using the Metropolis-Hastings algorithm.
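As a rough illustration of the ingredients named in this abstract (not the authors' implementation), the Python sketch below evaluates a Conway-Maxwell Skellam probability by truncated convolution and runs a random-walk Metropolis-Hastings sampler. Several choices are assumptions made only for the example: a common dispersion parameter `nu` for both counts, an intercept-only model with no covariates, independent normal priors on the log parameters, the truncation point `MAX_COUNT`, and all function names.

```python
import math
import random

MAX_COUNT = 60  # truncation point for the infinite sums (an assumption)

def cmp_pmf(lam, nu, max_count=MAX_COUNT):
    """Conway-Maxwell-Poisson probabilities on 0..max_count, renormalised."""
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1)
             for j in range(max_count + 1)]
    peak = max(log_w)  # subtract the peak before exponentiating, for stability
    w = [math.exp(v - peak) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def difference_pmf(d, p1, p2):
    """P(X - Y = d) for independent X ~ p1 and Y ~ p2 on 0..len(p1)-1."""
    top = len(p1) - 1
    return sum(p1[j + d] * p2[j]
               for j in range(max(0, -d), top + 1 - max(0, d)))

def cm_skellam_pmf(d, lam1, lam2, nu):
    """Difference of two CMP counts sharing one dispersion nu (assumed)."""
    return difference_pmf(d, cmp_pmf(lam1, nu), cmp_pmf(lam2, nu))

def log_posterior(data, theta, prior_sd=2.0):
    """theta = (log lam1, log lam2, log nu); N(0, prior_sd^2) priors (assumed)."""
    lam1, lam2, nu = (math.exp(t) for t in theta)
    p1, p2 = cmp_pmf(lam1, nu), cmp_pmf(lam2, nu)
    value = -sum(t * t for t in theta) / (2 * prior_sd ** 2)
    for d in data:
        prob = difference_pmf(d, p1, p2)
        if prob <= 0.0:
            return -math.inf
        value += math.log(prob)
    return value

def metropolis_hastings(data, n_iter=2000, step=0.15, seed=1):
    """Random-walk Metropolis-Hastings on the log scale; returns draws of
    (lam1, lam2, nu), one per iteration."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    current = log_posterior(data, theta)
    draws = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_posterior(data, proposal)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(tuple(math.exp(t) for t in theta))
    return draws

# Made-up differences of counts, purely for illustration.
example_data = [2, 0, 1, -1, 3, 1, 0, 2, 1, -2, 1, 0]
example_draws = metropolis_hastings(example_data, n_iter=500)
```

With `nu = 1` the two counts are Poisson and `cm_skellam_pmf` reduces to the ordinary Skellam probability, which gives a convenient sanity check. A regression version would replace the constant `log lam1` and `log lam2` with linear predictors in the covariates.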

Keywords

Count Data


Overdispersion

Underdispersion

Conway-Maxwell Skellam


Bayesian

Metropolis-Hastings 

Co-Author

Kimberly Sellers, North Carolina State University

First Author

Andrew Swift, University of Nebraska At Omaha

Presenting Author

Andrew Swift, University of Nebraska At Omaha

Comparison of generalized negative binomial models

The negative binomial distribution is a well-studied distribution used to model overdispersed count data. Other distributions, meanwhile, offer added flexibility relative to the negative binomial model. This work introduces various generalizations of the negative binomial model inspired by the Conway-Maxwell-Poisson distribution, and compares their statistical properties and related qualities.
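For reference, the two baseline distributions named in this abstract can be written as follows. These are the standard textbook forms, in a mean-and-size parameterization chosen here for convenience; they are not the generalizations presented in the talk, which the abstract does not specify.

```latex
% Negative binomial with mean \mu > 0 and size r > 0
P(Y = y) = \frac{\Gamma(y + r)}{\Gamma(r)\, y!}
           \left(\frac{r}{r + \mu}\right)^{r}
           \left(\frac{\mu}{r + \mu}\right)^{y},
\qquad \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{r}.

% Conway-Maxwell-Poisson with rate \lambda > 0 and dispersion \nu > 0
P(Y = y) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)},
\qquad Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}}.
```

The negative binomial variance always exceeds its mean, so it captures only overdispersion. In the Conway-Maxwell-Poisson family, \nu = 1 recovers the Poisson distribution, \nu < 1 gives overdispersion, and \nu > 1 gives underdispersion.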

Keywords

count data

overdispersion

negative binomial 

First Author

Kimberly Sellers, North Carolina State University

Presenting Author

Kimberly Sellers, North Carolina State University

Graphical model estimation from incomplete data and auxiliary variables

Probabilistic graphical models are essential for analyzing large multivariate data. However, data missingness can hinder graphical model estimation when a complete sample covariance matrix is unavailable. We propose new methods to estimate a complete graphical model from incomplete data with the help of auxiliary variables that may be informative about the variables' dependence structure. For instance, the dependence between two neurons typically weakens as the distance between them increases. We investigate two approaches: (a) estimating a sparse graphical model based on a sample covariance matrix completed via the previously proposed AuxCov method, which assumes a relationship between correlations and auxiliary variables; and (b) directly estimating a complete precision matrix by applying a penalty that reflects the relationship between partial correlations and auxiliary variables. We assess our methods theoretically, through simulations, and by analyzing large-scale neuroscience data. 
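To make the missingness problem concrete, the Python sketch below computes a pairwise-complete sample covariance matrix and leaves an entry empty when two variables are never observed together. It only illustrates why a complete sample covariance matrix can be unavailable; it does not implement AuxCov or the penalized estimator described above, and the function name and toy data are hypothetical.

```python
def pairwise_covariance(rows):
    """Sample covariance from pairwise-complete observations.

    rows: sequences of equal length, with None marking a missing value.
    Entry (j, k) stays None when fewer than two rows observe both variables.
    """
    p = len(rows[0])
    cov = [[None] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            pairs = [(r[j], r[k]) for r in rows
                     if r[j] is not None and r[k] is not None]
            n = len(pairs)
            if n < 2:
                continue  # this covariance cannot be estimated from the data
            mean_j = sum(a for a, _ in pairs) / n
            mean_k = sum(b for _, b in pairs) / n
            value = sum((a - mean_j) * (b - mean_k) for a, b in pairs) / (n - 1)
            cov[j][k] = cov[k][j] = value
    return cov

# Toy data: variables 0 and 2 are never recorded in the same row.
toy_rows = [
    (1.0, 2.0, None),
    (2.0, 4.1, None),
    (3.0, 5.9, None),
    (None, 1.0, 0.5),
    (None, 2.0, 1.4),
    (None, 3.0, 2.6),
]
toy_cov = pairwise_covariance(toy_rows)
```

Here `toy_cov[0][2]` is `None`, which is the kind of gap that a completion step or an auxiliary-variable penalty would need to fill before or during graphical model estimation.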

Keywords

conditional dependence

cross validation

graphical lasso

matrix completion

multivariate analysis

sparsity 

Co-Author

Giuseppe Vinci, University of Notre Dame

First Author

Joseph Steneman, University of Notre Dame

Presenting Author

Joseph Steneman, University of Notre Dame

Model Selection from Incomplete Data in Supervised and Unsupervised Learning

Scientific datasets are often undermined by missing data, which can occur either randomly or structurally. Applying traditional supervised and unsupervised learning techniques to these incomplete datasets poses significant challenges. Model selection, in particular, becomes highly complex because partially observed random vectors affect both resampling methods and theoretical guarantees. By leveraging resampling techniques, information theory, and stability measures, we propose novel approaches to model selection in supervised and unsupervised learning, with a particular focus on factor analysis and graphical modeling. We provide theoretical foundations and simulation results to demonstrate the effectiveness of these methods, along with applications to neuroscience and genomics.

Keywords

Bayesian information criterion

cross-validation

missing data

sparsity

tuning parameter

variable selection 

First Author

Giuseppe Vinci, University of Notre Dame

Presenting Author

Giuseppe Vinci, University of Notre Dame

On Transformation Discriminant Analysis

Multigroup discriminant analysis is a key classification technique with diverse applications. Linear and quadratic discriminant analysis (LDA and QDA) are popular methods that assume normal class distributions, which limits their applicability to non-normal data. We propose a transformation-based extension of LDA and QDA to handle asymmetry and skewness. Simulations and real-world applications show that our method outperforms existing approaches in various scenarios.

Keywords

Transformation

discriminant analysis

classification 

Co-Author

Jing Li

First Author

Yana Melnykov, University of Alabama

Presenting Author

Yana Melnykov, University of Alabama

Preferential Latent Space Models for Networks with Textual Edges

Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. This textual information is discarded in most network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we represent each text document as a generalized multi-layer network, and introduce a new and flexible preferential latent space network model that can capture how node-layer preferences directly modulate edge probabilities. We establish identifiability conditions for the proposed model and tackle model estimation with a computationally efficient projected gradient descent algorithm. We further derive a non-asymptotic error bound for the estimator at each step of the algorithm. The efficacy of our proposed method is demonstrated through simulations and an analysis of the Enron email network.

Keywords

latent space model

multi-layer network

non-convex optimization 

Co-Author(s)

Biao Cai
Dong Li, Tsinghua University
Xiaoyue Niu, Penn State University
Emma Jingfei Zhang, Emory University

First Author

Maoyu Zhang

Presenting Author

Biao Cai

Spillover mechanism selection via exposure mapping

Consider an experimental study on a population of units connected by a network. A common tool in the literature for defining and estimating spillover effects is the exposure mapping. These mappings reduce the complexity of the estimand to a lower dimension, facilitating identifiability. The mapping is assumed to be correctly specified, and its choice is left to the analyst. This makes estimators of the spillover effect, such as the Horvitz-Thompson estimator, vulnerable to bias. While these estimators have been shown to be robust to some controlled forms of misspecification, there has been little methodological work on empirically investigating appropriate exposure mappings. We propose a model selection procedure that helps choose the most suitable exposure mapping from a set of candidate exposure mappings in an experimental setting.
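The role of the exposure mapping in a Horvitz-Thompson estimator can be sketched in a few lines of Python. The example assumes one particular candidate mapping (own treatment plus an indicator for any treated neighbor) and an independent Bernoulli(p) treatment assignment, so that exposure probabilities have a closed form. The mapping, the design, the function names, and the toy network are all assumptions for illustration; the selection procedure proposed in the abstract is not shown.

```python
def exposure(unit, treatment, neighbors):
    """Hypothetical exposure mapping: (own treatment, any treated neighbor)."""
    any_treated = any(treatment[j] for j in neighbors[unit])
    return (treatment[unit], any_treated)

def exposure_probability(unit, level, neighbors, p):
    """Probability of an exposure level under independent Bernoulli(p) assignment."""
    own, any_treated = level
    p_own = p if own else 1 - p
    p_no_treated_neighbor = (1 - p) ** len(neighbors[unit])
    if any_treated:
        return p_own * (1 - p_no_treated_neighbor)
    return p_own * p_no_treated_neighbor

def horvitz_thompson_mean(level, outcomes, treatment, neighbors, p):
    """Inverse-probability-weighted mean outcome at one exposure level."""
    total = 0.0
    for i in range(len(outcomes)):
        if exposure(i, treatment, neighbors) == level:
            total += outcomes[i] / exposure_probability(i, level, neighbors, p)
    return total / len(outcomes)

def spillover_estimate(outcomes, treatment, neighbors, p):
    """Contrast between untreated units with and without a treated neighbor."""
    exposed = horvitz_thompson_mean((0, True), outcomes, treatment, neighbors, p)
    unexposed = horvitz_thompson_mean((0, False), outcomes, treatment, neighbors, p)
    return exposed - unexposed

# Toy path network 0 - 1 - 2 - 3 with only unit 0 treated (made-up outcomes).
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_treatment = [1, 0, 0, 0]
toy_outcomes = [3.0, 2.0, 1.0, 1.0]
toy_estimate = spillover_estimate(toy_outcomes, toy_treatment, toy_neighbors, p=0.5)
```

Swapping in a different `exposure` function (for example, one based on the fraction of treated neighbors) changes both the grouping of units and the weights, which is why a misspecified mapping can bias the estimate.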

Keywords

Causal Inference

Network Interference

Model Selection

Peer Effects

Randomized Trials 

Co-Author

Pallavi Basu, Indian School of Business

First Author

Supriya Tiwari

Presenting Author

Supriya Tiwari