New Methods for Complex Outcomes

Julia Wrobel Chair
 
Monday, Aug 5: 2:00 PM - 3:50 PM
5064 
Contributed Papers 
Oregon Convention Center 
Room: CC-F149 

Main Sponsor

IMS

Presentations

Ancestral Inference for Branching Process in Random Environments

Ancestral inference for branching processes in random environments (BPRE) concerns inference on the parameters of the ancestor distribution that generates the process. In this presentation, we describe a new generalized method of moments methodology for inference using replicated BPRE data. Even though the evolution of the process depends strongly on the offspring means of the various generations, we establish that the estimators of the ancestor and offspring means, under appropriate centering and scaling, decouple in the limit and converge jointly to independent normal random variables when the ratio of the number of generations to the logarithm of the number of replicates converges to zero. We also provide estimators for the limiting variance and illustrate our results using numerical experiments and data from polymerase chain reaction (PCR) experiments.
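
As a reading aid, the dependence on the offspring means can be written out using the standard conditional-mean identity for a BPRE. The notation is an assumption and does not come from the abstract: Z_n is the generation-n population size, \zeta_j is the environment of generation j, m(\zeta_j) is the offspring mean in that environment, and the ancestor count Z_0 is taken to be independent of the environments.

```latex
\mathbb{E}\left[ Z_n \mid \zeta_0, \dots, \zeta_{n-1} \right]
  = \mathbb{E}[Z_0] \prod_{j=0}^{n-1} m(\zeta_j)
```

The ancestor mean enters only as a multiplicative factor in front of the product of offspring means, which is why separating the two sets of estimators is not automatic.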

Keywords

BPRE

Ancestral Inference

Joint CLT

PCR 

View Abstract 2504

Co-Author

Anand Vidyashankar, George Mason University

First Author

Xiaoran Jiang

Presenting Author

Xiaoran Jiang

Asymptotic Properties of the Square Root Transformation of the Gamma Distribution

Power transformations of the gamma distribution to approximate normality have been a topic of research for the past 100 years. Fisher (1925) proposed the square-root transformation of the chi-square distribution, while Wilson & Hilferty (1931) and Hernandez & Johnson (1980) proved the asymptotic optimality of the cube-root transformation. We employ the Kullback-Leibler information number criterion of Hernandez & Johnson (1980) to prove that the square-root transformation of the gamma distribution is asymptotically optimal when the normal distribution with a fixed variance is set as the target distribution. In particular, a mode of convergence stronger than convergence in distribution is achieved in the normal case, implying that the square-root transformation is an asymptotically optimal variance-stabilizing power transformation. Additionally, by utilizing the asymptotic expansion of the normalized upper incomplete gamma function at the transition point, we show that the Kullback-Leibler information number is also minimized with the square-root transformation when the target distribution is set to be the Laplace distribution with a fixed scale parameter.
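
The variance-stabilizing property can be checked numerically. The sketch below is an illustration only and does not reproduce the Kullback-Leibler argument of the abstract. It evaluates the exact variance of sqrt(X) for X ~ Gamma(shape, scale), which the delta method predicts to approach scale / 4 as the shape grows. The function name `sqrt_gamma_variance` is hypothetical.

```python
import math


def sqrt_gamma_variance(shape, scale=1.0):
    """Exact variance of sqrt(X) for X ~ Gamma(shape, scale)."""
    # E[sqrt(X)] = sqrt(scale) * Gamma(shape + 1/2) / Gamma(shape),
    # computed on the log scale to avoid overflow for large shapes.
    log_ratio = math.lgamma(shape + 0.5) - math.lgamma(shape)
    mean_sqrt = math.sqrt(scale) * math.exp(log_ratio)
    # E[X] = shape * scale, so Var[sqrt(X)] = E[X] - E[sqrt(X)]^2.
    return shape * scale - mean_sqrt ** 2


if __name__ == "__main__":
    # The variance should settle near scale / 4 as the shape increases.
    for shape in (1, 10, 100, 1000):
        print(shape, sqrt_gamma_variance(shape))
```

The variance of X itself is shape * scale^2 and grows without bound, while the variance of sqrt(X) levels off at a value that does not depend on the shape.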

Keywords

Box-Cox transformation

Incomplete gamma function

Kullback-Leibler divergence

Laplace distribution

Normality

Variance-stabilizing transformation 

View Abstract 3774

Co-Author

Kimihiro Noguchi

First Author

Mayla Ward

Presenting Author

Mayla Ward

Bayesian Generalized Linear Model for Difference of Over or Under Dispersed Counts

Modelling the difference of two counts has many practical uses in statistics. The Skellam distribution can be used for such a model; however, because it is constructed as the difference of two Poisson distributions, it is potentially unsuitable for modelling data that exhibit under- or overdispersion. We take a first look at constructing a Bayesian generalized linear model for the difference of counts that can handle both under- and overdispersion, based on the difference of two Conway-Maxwell Poisson distributions (that is, a Conway-Maxwell Skellam distribution). The focus of this paper is on providing an explicit demonstration using the Metropolis-Hastings algorithm.
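
A Metropolis-Hastings sampler for such a model would need to evaluate the probability mass function of the count difference. The following is a minimal sketch of that building block, not the authors' implementation. It assumes two independent Conway-Maxwell Poisson counts that share one dispersion parameter `nu`, and it truncates the infinite sums at a hypothetical `max_count`. The function names are illustrative.

```python
import math


def cmp_pmf(lam, nu, max_count=200):
    """Conway-Maxwell Poisson pmf on 0..max_count (truncated, renormalized)."""
    # Unnormalized log weights: x * log(lam) - nu * log(x!).
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1)
             for x in range(max_count + 1)]
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def cmp_difference_pmf(d, lam1, lam2, nu, max_count=200):
    """P(X1 - X2 = d) for independent X1 ~ CMP(lam1, nu), X2 ~ CMP(lam2, nu)."""
    p1 = cmp_pmf(lam1, nu, max_count)
    p2 = cmp_pmf(lam2, nu, max_count)
    # Sum over x = X2 with X1 = x + d; both indices must stay in 0..max_count.
    lo = max(0, -d)
    hi = min(max_count, max_count - d)
    return sum(p1[x + d] * p2[x] for x in range(lo, hi + 1))


if __name__ == "__main__":
    # nu = 1 recovers the ordinary Skellam distribution.
    print(cmp_difference_pmf(0, 3.0, 1.5, 1.0))
    # nu < 1 gives overdispersed components, nu > 1 underdispersed ones.
    print(cmp_difference_pmf(0, 3.0, 1.5, 0.5))
```

Setting `nu = 1` gives a convenient sanity check, because the difference then has mean `lam1 - lam2` and variance `lam1 + lam2`.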

Keywords

Count Data

Overdispersion

Underdispersion

Conway-Maxwell Skellam

Bayesian

Metropolis-Hastings 

View Abstract 3609

Co-Author

Kimberly Sellers, North Carolina State University

First Author

Andrew Swift, University of Nebraska At Omaha

Presenting Author

Andrew Swift, University of Nebraska At Omaha

Financial forensic statistics: novel methods and a case study.

This talk will consider a novel approach to detecting anomalous transactions linked with fraud in food stamp purchases. The methods detect clusters in the order statistics of the transaction amounts that merit further scrutiny. The techniques then use scan statistics to determine when an excessive number of transactions cluster about some price point, a pattern shown to be historically linked to fraud. A scoring paradigm is constructed that ranks the degree to which detected clusters and individual transactions are anomalous among approximately 250 million total transactions.
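
The core of a scan statistic over sorted amounts can be sketched as a sliding-window count. This is a simplified illustration, not the scoring procedure described in the talk: the window width, the use of integer cents, and the function name `max_window_count` are assumptions, and no significance calculation is included.

```python
def max_window_count(amounts, width):
    """Return (count, start) for the densest window [start, start + width].

    Windows are anchored at observed amounts, so `start` is one of the
    order statistics. An empty input returns (0, None).
    """
    ordered = sorted(amounts)  # order statistics of the transaction amounts
    best_count, best_start = 0, None
    right = 0
    for left, start in enumerate(ordered):
        # Advance the right pointer past every amount inside the window.
        while right < len(ordered) and ordered[right] <= start + width:
            right += 1
        count = right - left
        if count > best_count:
            best_count, best_start = count, start
    return best_count, best_start


if __name__ == "__main__":
    # Amounts in cents; several transactions pile up near 10000 (100 dollars).
    cents = [1250, 9999, 10000, 10001, 10002, 4300, 10050, 777]
    print(max_window_count(cents, width=5))
```

Because both pointers only move forward, the scan is linear after sorting, which matters at the scale of hundreds of millions of transactions.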

Keywords

Fraud

Data Science

Scan Statistic

Order Statistic

Markov chain

Forensic Science 

View Abstract 3700

Co-Author(s)

Robert Lund, University of California, Santa Cruz
Tung-Lung Wu, Mississippi State University
Zhicong Zhao, Mississippi State University

First Author

Jonathan Woody, Mississippi State University

Presenting Author

Jonathan Woody, Mississippi State University

Maximum Likelihood and Moment Matching for High-Noise Group Orbit Estimation

Motivated by applications to single-particle cryo-electron microscopy (cryo-EM), we study a problem of group orbit estimation where samples of an unknown signal are observed under uniform random rotations from a rotational group. In the high-noise regime, we describe a stratification of the Fisher information eigenvalues according to transcendence degrees in the algebra of group invariants. We relate the critical points of the log-likelihood optimization landscape to those of a sequence of moment matching problems. Some examples, including a simplified model of cryo-EM, will be discussed.
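
Moment matching can only identify the signal up to its orbit, because low-order moments of the observations depend on the signal through group invariants. The sketch below illustrates this with cyclic shifts of a short vector as a stand-in for the rotational group of the abstract. That substitution, the noiseless setting, and the function names are assumptions made for illustration.

```python
def cyclic_shift(signal, s):
    """Rotate a non-empty list to the left by s positions."""
    s %= len(signal)
    return signal[s:] + signal[:s]


def orbit_invariants(signal):
    """First- and second-order moments averaged over all cyclic shifts."""
    n = len(signal)
    # First moment: every coordinate of the group average equals the mean.
    first = sum(signal) / n
    # Second moment: the circular autocorrelation at each lag k.
    second = [
        sum(signal[i] * signal[(i + k) % n] for i in range(n)) / n
        for k in range(n)
    ]
    return first, second


if __name__ == "__main__":
    theta = [1.0, -2.0, 0.5, 3.0, 0.0]
    # Every element of the orbit yields the same invariants.
    print(orbit_invariants(theta))
    print(orbit_invariants(cyclic_shift(theta, 2)))
```

Any two signals in the same orbit give identical values, so matching these moments constrains the orbit rather than a particular representative.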

Keywords

Maximum Likelihood Estimation

Group Orbit Estimation

Fisher Information

Optimization Landscape

Moment Matching

Cryo-Electron Microscopy 

Co-Author(s)

Zhou Fan
Tianhao Wang, Yale University
Roy Lederman, Yale University
Yi Sun, The University of Chicago

First Author

Sheng Xu, Princeton University

Presenting Author

Sheng Xu, Princeton University

Robust estimation and inference in categorical data

While many methods exist for robust estimation of models for continuous variables, the literature on robust estimation in categorical data is scarce, although many relevant variables are categorical, such as questionnaire responses, self-reported health, or counting processes. I propose a general framework for robustly estimating statistical functionals or parameters in models for possibly multivariate categorical data. The proposed estimator generalizes maximum likelihood estimation, is strongly consistent and asymptotically Gaussian, and has the same time complexity as maximum likelihood. In addition, I develop a novel test of whether a given observation can be fitted well by the assumed model, thereby conceptualizing the notion of an outlier in categorical data. I verify the attractive statistical properties of the proposed methodology in simulation studies, and demonstrate its practical usefulness in an empirical application on structural equation modeling of questionnaire responses, where I find compelling evidence for the presence of inattentive respondents whose adverse effects the proposed estimator can withstand, unlike maximum likelihood.

Keywords

Robust statistics

Discrete variables

Multivariate statistics

Asymptotic normality

Outlier detection 

First Author

Max Welz, Erasmus University Rotterdam

Presenting Author

Max Welz, Erasmus University Rotterdam

Univariate and Bivariate Compound Geometric Gaussian Distributions

Compound geometric random variables have been widely applied in the financial and actuarial disciplines. In these applications, the random variables subject to random sums are almost always non-negative. However, we propose a family of compound geometric distributions that deviates from this common paradigm. Consider a compound geometric Gaussian distribution, that is, a geometric sum of zero-mean Gaussian random variables. With an added location parameter and a particular normalization factor based on the geometric parameter, this forms a continuum of distributions between the Laplace and Gaussian families with convenient properties and only three parameters. These properties include easy interpretation of the parameters. In this work, we explore the characteristics, density functions, parameter interpretations and estimation techniques, and possible applications for this family. In addition, we investigate two 5-parameter bivariate versions of the family that combine the flexibilities of both the corresponding bivariate Laplace and Gaussian families. We exhibit some of the properties, examples, and parameter interpretation and estimation techniques for both families. 
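
The interpolation between the Gaussian and Laplace families can be sketched by simulation. The abstract does not state the normalization factor, so the code below assumes scaling the geometric sum by sqrt(p), with the geometric count supported on 1, 2, 3, and so on. Under that assumption the variance equals sigma squared for every p and the kurtosis is 3 * (2 - p), which runs from 3 (Gaussian, p = 1) toward 6 (Laplace, p near 0). All function names are hypothetical.

```python
import math
import random


def sample_cgg(p, mu=0.0, sigma=1.0, rng=random):
    """Draw mu + sqrt(p) * (Z_1 + ... + Z_N), Z_i ~ N(0, sigma^2), N ~ Geometric(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        n = 1
    else:
        # Inversion sampling for a geometric count on {1, 2, ...}.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        n = int(math.log(u) / math.log(1.0 - p)) + 1
    # Given N = n, the sum of n Gaussians is N(0, n * sigma^2).
    return mu + math.sqrt(p * n) * sigma * rng.gauss(0.0, 1.0)


def cgg_kurtosis(p):
    """Kurtosis under the assumed sqrt(p) normalization: 3 * (2 - p)."""
    return 3.0 * (2.0 - p)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_cgg(0.2, mu=1.0, sigma=1.0, rng=rng) for _ in range(50000)]
    mean = sum(draws) / len(draws)
    var = sum((x - mean) ** 2 for x in draws) / len(draws)
    print(mean, var, cgg_kurtosis(0.2))
```

Sampling the count first and then a single Gaussian with the matching variance avoids looping over the individual summands.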

Keywords

Compound geometric

Gaussian

multivariate

random sums of random variables 

View Abstract 2578

Co-Author

Barry Arnold

First Author

Matthew Arvanitis, USDA Forest Products Laboratory

Presenting Author

Matthew Arvanitis, USDA Forest Products Laboratory