Tuesday, Aug 5: 8:30 AM - 10:20 AM
4088
Contributed Papers
Music City Center
Room: CC-104E
Main Sponsor
Section on Statistical Computing
Presentations
Verifying that inference using a parametric regression model is reliable is a crucial step in statistical model building. It helps avoid invalid modeling conclusions based on false assumptions. For example, the p-value associated with a coefficient in a linear model is unreliable if the mean function being used is incorrect. Until now, there has been no easy and reliable way in R to test whether or not the mean function is correct.
In my presentation, I shall introduce my new R package, "distfreereg", that I have written to implement the distribution-free testing procedure for parametric regression models introduced by Estate Khmaladze in 2021. I shall outline Khmaladze's algorithm, discuss the main features of the package, and illustrate its use with an example.
Keywords
goodness-of-fit testing
regression
distribution-free testing
R package
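As a hypothetical illustration of the intended workflow (the function name distfreereg() and the argument names below are assumptions made for illustration, not the package's documented interface), the test might be invoked along these lines:

```r
# Hypothetical usage sketch: the call signature below is assumed, not taken
# from the package documentation.
library(distfreereg)

set.seed(1)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)                                 # data with a linear mean

mean_fun <- function(theta, x) theta[1] + theta[2] * x    # proposed mean function

test <- distfreereg(test_mean = mean_fun,
                    Y = y, X = matrix(x, ncol = 1),
                    theta_init = c(0, 1))
test   # reports a distribution-free p-value for the adequacy of mean_fun
```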
We examine two-sample hypothesis testing in random networks within the Random Dot Product Graph (RDPG) framework and develop a time-efficient algorithm. We distinguish between semiparametric and nonparametric testing, emphasizing the latter for its flexibility and independence from network size. The nonparametric approach assumes that vertex interactions are governed by exchangeable latent distances, and the central question is whether the latent distance distributions differ between two networks. To address this, we use a U-statistic-based test statistic that approximates the maximum mean discrepancy but is computationally expensive for large networks. To overcome this challenge, we introduce a subsampling-based method that partitions large networks, analyzes the smaller subgraphs, and aggregates the results. Our contributions include designing a subsampling-based latent position estimator and validating a bootstrap-based testing procedure, as well as developing several faster divide-and-conquer testing methods. This work advances efficient and consistent network analysis, with broad applicability across diverse domains.
Keywords
Two-sample hypothesis testing
Network model
Nonparametric testing
Subsampling
Time efficient algorithm
Random Dot Product Graph
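A schematic sketch of the subsample-and-aggregate idea described above, not the authors' implementation: adjacency spectral embedding is assumed as the latent position estimator, a (biased) Gaussian-kernel MMD estimate stands in for the U-statistic, and subgraph-level statistics are simply averaged.

```r
# Schematic sketch (assumed details, not the authors' code).
ase <- function(A, d) {
  # adjacency spectral embedding: top-d scaled eigenvectors
  e <- eigen(A, symmetric = TRUE)
  e$vectors[, 1:d] %*% diag(sqrt(abs(e$values[1:d])), d)
}

mmd2 <- function(X, Y, sigma = 1) {
  # biased Gaussian-kernel MMD^2 between two sets of estimated latent positions
  K <- exp(-as.matrix(dist(rbind(X, Y)))^2 / (2 * sigma^2))
  n <- nrow(X); m <- nrow(Y)
  mean(K[1:n, 1:n]) + mean(K[(n + 1):(n + m), (n + 1):(n + m)]) -
    2 * mean(K[1:n, (n + 1):(n + m)])
}

subsampled_stat <- function(A1, A2, d = 2, size = 200, reps = 10) {
  # divide-and-conquer shortcut: analyze random vertex subsamples and aggregate
  stats <- replicate(reps, {
    v1 <- sample(nrow(A1), size)
    v2 <- sample(nrow(A2), size)
    mmd2(ase(A1[v1, v1], d), ase(A2[v2, v2], d))
  })
  mean(stats)
}
```

In practice the null distribution of such a statistic would still need to be calibrated, for example by the bootstrap procedure mentioned in the abstract; that step is omitted here.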
Multiple testing procedures that control the false discovery rate (FDR) have been widely adopted for testing a large number of hypotheses. The Benjamini and Hochberg multiple testing procedure (the BH procedure) was the first procedure introduced to control the FDR. However, as the total number of false null hypotheses increases, the BH procedure becomes overly conservative and thus lacks power. In this paper, we present a Two-Stage BH procedure with a tuning parameter. In stage I, the procedure estimates the total number of true null hypotheses m0, which is then used to adjust the level of significance when applying the BH procedure in stage II. The tuning parameter provides tighter control of the FDR and enhances statistical power. Theoretical properties of the proposed procedure and its power performance will be presented.
Keywords
False discovery rate
Multiple testing procedures
Benjamini-Hochberg
Tuning parameter
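For illustration only, a generic two-stage adaptive sketch in the spirit described above, where lambda plays the role of the tuning parameter and a Storey-type estimator of m0 is assumed (the paper's stage-I estimator may differ):

```r
# Generic two-stage BH sketch; the m0 estimator and the role of the tuning
# parameter are assumptions, not the paper's exact procedure.
two_stage_bh <- function(p, alpha = 0.05, lambda = 0.5) {
  m <- length(p)
  # Stage I: estimate the number of true nulls m0 using the tuning parameter
  m0_hat <- min(m, (1 + sum(p > lambda)) / (1 - lambda))
  # Stage II: apply the BH procedure at the adjusted level alpha * m / m0_hat
  adj_level <- alpha * m / m0_hat
  rejected <- which(p.adjust(p, method = "BH") <= adj_level)
  list(m0_hat = m0_hat, adj_level = adj_level, rejected = rejected)
}
```

When m0_hat is close to m this reduces to the ordinary BH procedure; when many null hypotheses are false, the adjusted level is larger and power increases.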
The problem of testing/estimating a common explanatory variable based on combined information from independent calibration models is addressed. The response variables are measured using different instruments, different methods, or at different laboratories. It is assumed that the calibration model at each source is a simple linear regression model and that the model parameters differ across sources. In this scenario, the problem of constructing a confidence interval (CI) for the common unknown value of the explanatory variable is considered. Confidence intervals for the unknown explanatory variable, obtained by inverting some popular combined tests, are proposed. These CIs are exact and more precise than an existing CI in the literature. All CIs are compared with respect to precision, and some recommendations are made. The interval estimation methods are illustrated using two examples.
Keywords
Combined tests
Controlled calibration
Fisher's test
Maximum likelihood estimates
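A schematic grid-inversion sketch of the idea, with Fisher's combination assumed as the combined test (the per-source calibration statistics and the paper's exact combining rules may differ): a candidate value x0 belongs to the CI when the combined test does not reject H0: x = x0.

```r
# Sketch: invert Fisher's combined test over a grid of candidate x0 values.
# p_funs is a list of functions, one per laboratory/instrument, each mapping
# a candidate x0 to that source's p-value for H0: x = x0 (e.g., from the
# source's calibration t-statistic); these per-source tests are assumed here.
combined_ci <- function(p_funs, grid, alpha = 0.05) {
  k <- length(p_funs)
  accept <- vapply(grid, function(x0) {
    fisher <- -2 * sum(vapply(p_funs, function(f) log(f(x0)), numeric(1)))
    fisher <= qchisq(1 - alpha, df = 2 * k)   # fail to reject at level alpha
  }, logical(1))
  range(grid[accept])   # assumes the set of non-rejected values is an interval
}
```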
A modified likelihood ratio test and confidence intervals for the mean of a two-parameter negative binomial (NB) distribution are proposed and compared with available ones. The problems of testing/estimating the ratio or the difference of the means of two NB distributions are also considered.
Assuming that the dispersion parameters are equal, an improved version of the likelihood ratio test for the ratio of means of two NB distributions is proposed. Methods of variance estimate recovery (MOVER) are used to find confidence intervals for the ratio or the difference of two means when the dispersion parameters are unknown and arbitrary. The tests and interval estimation methods are illustrated using an example with count data on seizures from two groups of patients.
Keywords
over-dispersion
powers
score test
standardized LRT
type I error
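A minimal MOVER sketch for the difference of the two NB means, assuming Wald intervals from MASS::glm.nb as the per-group inputs (how the paper recovers the per-group variance estimates may differ):

```r
# MOVER sketch for mu1 - mu2 with negative binomial counts; the per-group
# intervals below are Wald intervals from glm.nb fits, an assumption made
# only for illustration.
library(MASS)

mover_nb_diff <- function(y1, y2, level = 0.95) {
  ci_mean <- function(y) {
    fit <- glm.nb(y ~ 1)                                # intercept-only NB fit
    ci  <- exp(confint.default(fit, level = level))     # Wald CI, back-transformed
    c(est = unname(exp(coef(fit)[1])), l = ci[1], u = ci[2])
  }
  a <- ci_mean(y1); b <- ci_mean(y2)
  d <- a["est"] - b["est"]
  lower <- d - sqrt((a["est"] - a["l"])^2 + (b["u"] - b["est"])^2)
  upper <- d + sqrt((a["u"] - a["est"])^2 + (b["est"] - b["l"])^2)
  c(diff = unname(d), lower = unname(lower), upper = unname(upper))
}
```

With the seizure data mentioned in the abstract, y1 and y2 would be the counts from the two patient groups.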
In data analysis, unexpected results often prompt researchers to revisit their procedures to identify potential issues. While some researchers may struggle to identify the root causes, experienced researchers can often quickly diagnose problems by checking a few key assumptions. These checked assumptions, or expectations, are typically informal, difficult to trace, and rarely discussed in publications. In this paper, we introduce the term analysis validation checks to formalize and externalize these informal assumptions. We then introduce a procedure to identify a subset of checks that best predict the occurrence of unexpected outcomes, based on simulations of the original data. The checks are evaluated in terms of accuracy, determined by binary classification metrics, and independence, which measures the shared information among checks. We demonstrate this approach with a toy example using step count data and a generalized linear model example examining the effect of particulate matter air pollution on daily mortality.
Keywords
data analysis
data validation
diagnostics
Co-Author
Roger Peng, University of Texas, Austin
First Author
Sherry Zhang, The University of Texas at Austin
Presenting Author
Sherry Zhang, The University of Texas at Austin
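A schematic sketch of how such checks might be scored, not the authors' implementation: each check is treated as a binary function of a simulated data set, and its agreement with an "unexpected outcome" flag is summarized with simple classification metrics (the independence criterion among checks is omitted here).

```r
# Schematic sketch: score candidate analysis validation checks against an
# unexpected-outcome flag across simulated data sets (assumed formalization).
evaluate_checks <- function(sims, checks, unexpected) {
  # sims:       list of simulated data sets
  # checks:     named list of functions, each returning TRUE if the check fails
  # unexpected: function returning TRUE if the analysis result is unexpected
  y <- vapply(sims, unexpected, logical(1))
  sapply(checks, function(chk) {
    x <- vapply(sims, chk, logical(1))
    c(accuracy    = mean(x == y),
      sensitivity = mean(x[y]),     # check fails when the outcome is unexpected
      specificity = mean(!x[!y]))   # check passes when the outcome is expected
  })
}
```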
The accurate and efficient estimation of Bayes factors is critical for Bayesian model comparison, particularly when evaluating competing hypotheses in complex statistical models. Traditional computational approaches often suffer from inefficiency, instability, and poor scalability, especially when dealing with non-conjugate priors. In this work, we propose MCMC-CE, an advanced method that extends the cross-entropy (CE) technique—originally developed for rare-event probability estimation—to improve the computation of marginal likelihoods in Bayesian hypothesis testing and linear regression models. Our approach integrates the CE method within a Markov chain Monte Carlo (MCMC) framework to optimize proposal distributions and efficiently approximate the marginal likelihood. We apply MCMC-CE to both hypothesis testing via Bayes factors and Bayesian model averaging. Extensive simulation studies and real-world data applications demonstrate that MCMC-CE significantly outperforms existing methods in terms of computational speed, numerical stability, and estimation accuracy. These results suggest that MCMC-CE provides a powerful and scalable solution for Bayesian inference in challenging modeling scenarios.
Keywords
Marginal likelihood
Cross-entropy method
Markov chain Monte Carlo
Bayes factor
Bayesian model averaging
Bayesian linear regression
Co-Author(s)
Devin Lundy, Augusta University
Vy Ong, Wayne State University
Yin Wan, Wayne State University
First Author
Yang Shi, Wayne State University
Presenting Author
Yang Shi, Wayne State University
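A minimal sketch of the general cross-entropy-plus-importance-sampling idea, not the authors' MCMC-CE algorithm: posterior MCMC draws are used to fit a multivariate normal proposal (the cross-entropy-optimal member of the Gaussian family), and the log marginal likelihood is then estimated by importance sampling from that proposal.

```r
# Sketch only; the proposal family, fitting step, and stopping rules of the
# actual MCMC-CE method may differ.
library(mvtnorm)

log_marglik_ce <- function(log_post_unnorm, mcmc_draws, n_is = 5000) {
  # Stage 1 (CE-style step): fit a Gaussian proposal to the posterior draws
  mu    <- colMeans(mcmc_draws)
  Sigma <- cov(mcmc_draws)
  # Stage 2: importance sampling from the fitted proposal
  theta <- rmvnorm(n_is, mu, Sigma)
  lw <- apply(theta, 1, log_post_unnorm) - dmvnorm(theta, mu, Sigma, log = TRUE)
  m <- max(lw)
  m + log(mean(exp(lw - m)))   # log marginal likelihood via log-sum-exp
}
```

Here log_post_unnorm(theta) is the log likelihood plus log prior; a Bayes factor is then the exponentiated difference of two models' log marginal likelihoods.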