Wednesday, Aug 6: 10:30 AM - 12:20 PM
4152
Contributed Papers
Music City Center
Room: CC-104A
Main Sponsor
IMS
Presentations
Modelling the difference of two counts has many practical uses in statistics. The Skellam distribution can be used for such a model; however, because it is constructed as the difference of two Poisson random variables, it is potentially unsuitable for data that exhibit under- or overdispersion. We take a first look at constructing a Bayesian generalized linear model for the difference of counts that can handle both under- and overdispersion, based on the difference of two Conway-Maxwell Poisson distributions (that is, a Conway-Maxwell Skellam distribution). The focus of this paper is on providing an explicit demonstration using the Metropolis-Hastings algorithm.
Keywords
Count Data
Overdispersion
Underdispersion
Conway-Maxwell Skellam
Bayesian
Metropolis-Hastings
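The following is a minimal sketch of the kind of sampler this abstract describes, not the authors' implementation. It assumes an intercept-only model (no covariates), a shared dispersion parameter nu for both Conway-Maxwell Poisson components, truncation of each component at a hypothetical `max_count`, independent N(0, 1) priors on the log parameters, and a Gaussian random-walk proposal. All function names are illustrative.

```python
import numpy as np
from math import lgamma

def cmp_log_pmf(lam, nu, max_count=200):
    """Log-pmf of CMP(lam, nu) on 0..max_count (truncated approximation)."""
    y = np.arange(max_count + 1)
    log_fact = np.array([lgamma(k + 1.0) for k in y])
    log_w = y * np.log(lam) - nu * log_fact      # log of lam^y / (y!)^nu
    return log_w - np.logaddexp.reduce(log_w)    # normalise on the truncated support

def cms_pmf(lam1, lam2, nu, max_count=200):
    """pmf of D = Y1 - Y2 with independent Y1 ~ CMP(lam1, nu), Y2 ~ CMP(lam2, nu).
    Entry i of the result corresponds to d = i - max_count."""
    p1 = np.exp(cmp_log_pmf(lam1, nu, max_count))
    p2 = np.exp(cmp_log_pmf(lam2, nu, max_count))
    # P(D = d) = sum_k p1[k + d] * p2[k], i.e. p1 convolved with reversed p2
    return np.convolve(p1, p2[::-1])

def log_post(theta, d, max_count=200):
    """Unnormalised log posterior; theta = (log lam1, log lam2, log nu)."""
    lam1, lam2, nu = np.exp(theta)
    pmf = cms_pmf(lam1, lam2, nu, max_count)
    log_lik = np.sum(np.log(pmf[d + max_count] + 1e-300))
    log_prior = -0.5 * np.sum(theta ** 2)        # assumed N(0, 1) priors on the logs
    return log_lik + log_prior

def metropolis_hastings(d, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings; d is an integer array of count differences."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    cur = log_post(theta, d)
    draws = np.empty((n_iter, 3))
    accepted = 0
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(3)   # symmetric proposal
        new = log_post(prop, d)
        if np.log(rng.uniform()) < new - cur:          # accept with prob min(1, ratio)
            theta, cur = prop, new
            accepted += 1
        draws[t] = theta
    return np.exp(draws), accepted / n_iter            # draws of (lam1, lam2, nu)
```

With nu = 1 the components are Poisson and `cms_pmf` reduces to the Skellam pmf; nu > 1 gives underdispersed components and nu < 1 overdispersed ones. The truncation becomes inaccurate when nu is small and lam is large, so `max_count` would need to grow in that regime.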
The negative binomial distribution is a well-studied distribution used to model overdispersed count data. Other distributions, meanwhile, offer added flexibility relative to the negative binomial model. This work introduces various generalizations of the negative binomial model inspired by the Conway-Maxwell-Poisson distribution, and compares their statistical properties and related qualities.
Keywords
count data
overdispersion
negative binomial
Probabilistic graphical models are essential for analyzing large multivariate data. However, data missingness can hinder graphical model estimation when a complete sample covariance matrix is unavailable. We propose new methods to estimate a complete graphical model from incomplete data with the help of auxiliary variables that may be informative about the variables' dependence structure. For instance, the dependence between two neurons typically weakens as the distance between them increases. We investigate two approaches: (a) estimating a sparse graphical model based on a sample covariance matrix completed via the previously proposed AuxCov method, which assumes a relationship between correlations and auxiliary variables; and (b) directly estimating a complete precision matrix by applying a penalty that reflects the relationship between partial correlations and auxiliary variables. We assess our methods theoretically, through simulations, and by analyzing large-scale neuroscience data.
Keywords
conditional dependence
cross validation
graphical lasso
matrix completion
multivariate analysis
sparsity
Scientific datasets are often undermined by missing data, which can occur either randomly or structurally. Applying traditional supervised and unsupervised learning techniques to these incomplete datasets poses significant challenges. Model selection, in particular, becomes highly complex due to the impact on resampling methods and theoretical guarantees when dealing with partially observed random vectors. By leveraging resampling techniques, information theory, and stability measures, we propose novel approaches to model selection in supervised and unsupervised learning, with a particular focus on factor analysis and graphical modeling. We provide theoretical foundations and simulation results to demonstrate the effectiveness of these methods, along with applications to neuroscience and genomics.
Keywords
Bayesian information criterion
cross-validation
missing data
sparsity
tuning parameter
variable selection
Multigroup discriminant analysis is a key classification technique with diverse applications. Linear and quadratic discriminant analysis (LDA and QDA) are popular methods that assume normal class distributions. This limits their application to non-normal data. We propose a transformation-based extension of LDA and QDA to handle asymmetry and skewness. Simulations and real-world applications show our method outperforms existing approaches in various scenarios.
Keywords
Transformation
discriminant analysis
classification
Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. The useful textual information carried in the edges is often discarded in network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we represent each text document as a generalized multi-layer network, and introduce a new and flexible preferential latent space network model that can capture how node-layer preferences directly modulate edge probabilities. We establish identifiability conditions for the proposed model and tackle model estimation with a computationally efficient projected gradient descent algorithm. We further derive a non-asymptotic error bound for the estimator at each step of the algorithm. The efficacy of our proposed method is demonstrated through simulations and an analysis of the Enron email network.
Keywords
latent space model
multi-layer network
non-convex optimization
Consider an experimental study on a population of units connected by a network. A common tool in the literature for defining and estimating spillover effects is the exposure mapping. These mappings reduce the complexity of the estimand to a lower dimension, facilitating identifiability. The mapping is assumed to be correctly specified, which leaves the choice of the exposure mapping to the analyst. This makes estimators of the spillover effect, such as the Horvitz-Thompson estimator, vulnerable to bias. While it has been shown that these estimators are robust to some controlled forms of misspecification, there is a dearth of methodological work on empirically investigating appropriate exposure mappings. We propose a model selection procedure that helps choose the most suitable exposure mapping from a set of candidate exposure mappings in an experimental setting.
Keywords
Causal Inference
Network Interference
Model Selection
Peer Effects
Randomized Trials
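To make the role of the exposure mapping concrete, here is a minimal sketch of a Horvitz-Thompson estimator under one candidate mapping. It is an illustration only and does not reproduce the proposed model selection procedure. It assumes independent Bernoulli(p) treatment assignment, a symmetric 0/1 adjacency matrix with zero diagonal, and a hypothetical four-level mapping (own treatment, any treated neighbour). All function names are illustrative.

```python
import numpy as np

def exposure(adj, z):
    """Hypothetical exposure mapping: label = 2 * (own treatment) + (any treated neighbour)."""
    treated_nbr = (adj @ z) > 0
    return 2 * z + treated_nbr.astype(int)       # labels 0, 1, 2, 3

def exposure_probs(adj, p):
    """Exact probability of each exposure label per unit under Bernoulli(p) assignment."""
    deg = adj.sum(axis=1)
    q_any = 1.0 - (1.0 - p) ** deg               # P(at least one neighbour treated)
    own = np.array([1 - p, 1 - p, p, p])         # own-treatment factor per label
    nbr = np.stack([1 - q_any, q_any, 1 - q_any, q_any], axis=1)
    return own * nbr                             # shape (n, 4); rows sum to 1

def horvitz_thompson(y, label, probs, k):
    """HT estimate of the mean potential outcome under exposure label k.
    Unbiased only if every unit has positive probability of label k
    and the mapping is correctly specified."""
    hit = label == k
    return np.sum(y[hit] / probs[hit, k]) / len(y)

def spillover_estimate(y, adj, z, p):
    """Contrast: untreated with a treated neighbour vs. untreated with none."""
    label = exposure(adj, z)
    probs = exposure_probs(adj, p)
    return horvitz_thompson(y, label, probs, 1) - horvitz_thompson(y, label, probs, 0)
```

A different candidate mapping (for example, the number or fraction of treated neighbours) would change both `exposure` and `exposure_probs`, and so would change the estimand. That dependence on the analyst's choice is why selecting among candidate mappings matters.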