Monday, Aug 4: 2:00 PM - 3:50 PM
4077
Contributed Papers
Music City Center
Room: CC-202A
Main Sponsor
Section on Nonparametric Statistics
Co Sponsors
Section on Nonparametric Statistics
Presentations
We present a non-parametric approach for detecting potentially sparse change-points in a time series of high-dimensional observations or non-Euclidean data objects. We target a change in distribution that occurs in a smaller (unknown) subset of dimensions, where the dimensions may be correlated. Our work is motivated by a remote sensing application in which changes occur in small, spatially clustered regions over time. We propose an adaptive block-based change-point detection framework that accounts for spatial dependencies across dimensions and leverages these dependencies to boost detection power and estimation accuracy. Through simulation studies, we demonstrate that our approach has superior performance in detecting sparse changes for datasets with spatial or local group structures. An application of the proposed method to detecting activity, such as new construction, in remote sensing imagery of the Natanz nuclear facility in Iran demonstrates the method's efficacy.
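For reference, a minimal Python sketch of a block-wise nonparametric scan in the spirit described above, assuming a simple standardized mean-shift statistic per spatial block of dimensions; the function name and the statistic are illustrative placeholders, not the authors' procedure:

```python
import numpy as np

def block_scan_changepoint(X, blocks, min_seg=5):
    """Illustrative block-wise change-point scan (not the authors' procedure).

    X      : (T, d) array, one d-dimensional observation per time point
    blocks : list of index arrays, each a (spatially contiguous) subset of dimensions
    Returns the candidate change time, block index, and value of a simple
    standardized mean-shift statistic, maximized over times and blocks.
    """
    T = X.shape[0]
    best = (None, None, -np.inf)
    for t in range(min_seg, T - min_seg):
        for b, idx in enumerate(blocks):
            diff = X[:t, idx].mean(axis=0) - X[t:, idx].mean(axis=0)
            scale = X[:, idx].std(axis=0, ddof=1) + 1e-12
            # weight by segment lengths so mid-series splits are not unduly favored
            stat = np.sqrt(t * (T - t) / T) * np.linalg.norm(diff / scale)
            if stat > best[2]:
                best = (t, b, stat)
    return best

# toy example: a mean shift at time 50 confined to the first block of 10 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
X[50:, :10] += 1.0
blocks = [np.arange(i, i + 10) for i in range(0, 50, 10)]
print(block_scan_changepoint(X, blocks))
```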
Keywords
Change-point
Non-parametric
Spatial Dependence
Graph-based Tests
High-dimensional data
Satellite Imagery
Existing prediction methods for clustered data often depend on strong model assumptions, making them vulnerable to model misspecification. We propose a hierarchical conformal prediction framework for predicting outcomes of new subjects at specific time points or trajectories in clustered data with missing responses, without requiring the specification of the prediction model or within-subject correlations. The idea is to establish marginal prediction for clustered data while utilizing subsampling techniques to accommodate dependency and appropriate weighting to address distribution shifts caused by missing data.
To address complex error distributions, including skewed and multimodal cases, we construct the prediction region using the highest conditional density set of the target distribution. Additionally, we propose an enhanced approach, termed localized prediction, to more effectively adapt to heterogeneous or atypical subjects. This method achieves not only marginal coverage but also local and asymptotic conditional coverage for a given subject within a subset or specific profile, while converging to optimal interval lengths under consistent estimation conditions.
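As a point of reference, a minimal sketch of split conformal prediction for clustered data under simplifying assumptions (one observation subsampled per calibration subject, absolute-residual scores, and a hypothetical `ols_fit` helper); the highest-density regions, missing-data weighting, and localized variant of the abstract are not reproduced here:

```python
import numpy as np

def clustered_split_conformal(subjects_X, subjects_y, fit, alpha=0.1, rng=None):
    """Illustrative split-conformal band for clustered data.

    subjects_X, subjects_y : lists with one (n_i, p) array and one (n_i,) array per subject
    fit                    : callable (X, y) -> prediction function
    Subsampling one observation per calibration subject keeps the calibration
    scores (approximately) exchangeable across subjects despite within-subject
    correlation.
    """
    rng = np.random.default_rng(rng)
    m = len(subjects_X)
    train_ids = rng.choice(m, size=m // 2, replace=False)
    calib_ids = np.setdiff1d(np.arange(m), train_ids)

    X_tr = np.vstack([subjects_X[i] for i in train_ids])
    y_tr = np.concatenate([subjects_y[i] for i in train_ids])
    predict = fit(X_tr, y_tr)

    scores = []
    for i in calib_ids:                      # one randomly chosen observation per subject
        j = rng.integers(len(subjects_y[i]))
        scores.append(abs(subjects_y[i][j] - predict(subjects_X[i][j:j + 1])[0]))
    level = min(1.0, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))
    q = np.quantile(scores, level)

    def band(x_new):
        pred = predict(np.atleast_2d(x_new))[0]
        return pred - q, pred + q
    return band

# toy example with a hypothetical least-squares fitter and random-intercept clusters
def ols_fit(X, y):
    beta, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    return lambda Xn: np.c_[np.ones(len(Xn)), Xn] @ beta

rng = np.random.default_rng(0)
subjects_X, subjects_y = [], []
for _ in range(60):
    n_i = rng.integers(3, 8)
    Xi = rng.normal(size=(n_i, 2))
    subjects_X.append(Xi)
    subjects_y.append(Xi @ np.array([1.0, -2.0]) + rng.normal() + rng.normal(size=n_i))
band = clustered_split_conformal(subjects_X, subjects_y, ols_fit, alpha=0.1, rng=1)
print(band(np.array([0.5, -0.5])))
```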
Keywords
Conformal prediction
Conditional coverage
Distribution shift
Marginal prediction
Missing at random
Repeated subsampling
Knockoff variable selection creates synthetic variables that mirror the correlation structure of observed features, enabling principled false discovery rate control. Existing methods often assume homogeneous data (all numeric or all categorical) or rely on known distributions, assumptions that break down with heterogeneous data and unknown distributions. Moreover, standard measures of variable importance often rely on well-specified outcome models (e.g., linear), making them unsuitable for nonlinear relationships.
We introduce a generalizable knockoff generation procedure based on conditional residuals that handles heterogeneous data with unknown distributions. We further propose an interpretable importance measure, the Mean Absolute Local Derivatives (MALD), which quantifies variable influence for arbitrary outcome functions and can be implemented with random forests or neural networks. Simulation studies show that our method outperforms existing ones, controlling the false discovery rate with superior power. We apply these methods to DNA methylation data from mouse tissue samples to select CpG sites related to age. We provide software implementations in R and Python.
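One plausible reading of a mean absolute local derivative, offered here purely as an assumption and not necessarily the authors' exact definition, is a central finite-difference slope of a fitted prediction function averaged over the sample; the sketch below illustrates that reading with a random forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def mald(predict, X, h=None):
    """Mean absolute local derivative per feature via central finite differences.

    Assumed interpretation: perturb each feature by +/- h and average the
    absolute slope of the fitted prediction function over the sample.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = 0.1 * X.std(axis=0) + 1e-12 if h is None else np.broadcast_to(h, p)
    scores = np.empty(p)
    for j in range(p):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[:, j] += h[j]
        X_minus[:, j] -= h[j]
        scores[j] = np.mean(np.abs(predict(X_plus) - predict(X_minus)) / (2 * h[j]))
    return scores

# toy example: nonlinear signal in features 0 and 1 only
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(np.round(mald(rf.predict, X), 3))
```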
Keywords
Variable Selection
Nonparametric
Machine Learning
Wide Data
Knockoffs
Co-Author
Zhe Fei, University of California, Riverside
First Author
Evan Mason, UC Riverside
Presenting Author
Evan Mason, UC Riverside
The correlation coefficient is fundamental in statistical analysis. However, traditional estimators of the correlation coefficient can be biased in the presence of confounding variables, which may act in an additive or multiplicative fashion. For the additive model, previous research has developed residual-based estimation of the correlation coefficient, and the empirical likelihood (EL) has been used to construct confidence intervals for it. With small sample sizes, however, the coverage probability of the EL interval can fall below 90% at a nominal 95% confidence level. We propose new interval estimators for the correlation coefficient based on the jackknife empirical likelihood, the mean jackknife empirical likelihood, and the adjusted jackknife empirical likelihood. For better performance with small sample sizes, we also propose the mean adjusted jackknife empirical likelihood. Simulation results show that the mean adjusted jackknife empirical likelihood performs best when sample sizes are as small as 25. Real data analyses illustrate the proposed approach.
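For illustration, a minimal sketch of the plain jackknife empirical likelihood interval for the Pearson correlation coefficient (pseudo-values treated as approximately independent, chi-squared calibration); the mean, adjusted, and mean adjusted variants proposed in the abstract modify this construction and are not reproduced here:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def jackknife_pseudovalues(x, y):
    """Jackknife pseudo-values of the Pearson correlation coefficient."""
    n = len(x)
    full = np.corrcoef(x, y)[0, 1]
    loo = np.array([np.corrcoef(np.delete(x, i), np.delete(y, i))[0, 1]
                    for i in range(n)])
    return n * full - (n - 1) * loo

def jel_logratio(z, theta):
    """-2 log empirical likelihood ratio for the mean of the pseudo-values at theta."""
    d = z - theta
    if d.min() >= 0 or d.max() <= 0:          # theta outside the convex hull
        return np.inf
    g = lambda lam: np.sum(d / (1 + lam * d))
    lam = brentq(g, (-1 + 1e-10) / d.max(), (-1 + 1e-10) / d.min())
    return 2 * np.sum(np.log1p(lam * d))

def jel_ci(x, y, level=0.95, grid=np.linspace(-0.999, 0.999, 2000)):
    """Confidence interval: all theta whose log-ratio is below the chi-squared cutoff."""
    z = jackknife_pseudovalues(x, y)
    cutoff = chi2.ppf(level, df=1)
    accepted = [t for t in grid if jel_logratio(z, t) <= cutoff]
    return min(accepted), max(accepted)

# toy example with a sample size of 25
rng = np.random.default_rng(2)
x = rng.normal(size=25)
y = 0.6 * x + rng.normal(scale=0.8, size=25)
print(jel_ci(x, y))
```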
Keywords
Correlation coefficient
Distortion errors
Adjusted jackknife empirical likelihood
Mean jackknife empirical likelihood
Mean adjusted jackknife empirical likelihood
Jackknife empirical likelihood
When dealing with time-varying linear processes, their stationary companion processes come in handy for proving various results. However, especially when considering limit distributions, their lack of observability hampers statistical procedures such as hypothesis testing. In this case, the so-called local block bootstrap established by Dowla et al. (2013) provides a sound way out. This bootstrap procedure depends on the choice of several bootstrap parameters, each of which has a distinct impact on the simulation results. We illustrate the influence of different parameter choices in an extended simulation study using alpha-stable distributions in combination with empirical characteristic functions. The former form a wide class of distributions ensuring the transferability of our results, whereas the latter open the way to various procedures, including independence testing. Additionally, we present a bootstrap central limit theorem allowing for the formulation of bootstrap confidence intervals by the pivotal method without relying on the normal distribution.
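For orientation, a minimal sketch of one local block bootstrap replicate as we understand the general mechanism (each block resampled from starting positions within a window of its original location); the block length and window size stand in for the bootstrap parameters whose influence the study examines, and the toy time-varying AR(1) example is purely illustrative:

```python
import numpy as np

def local_block_bootstrap(x, block_len, window, rng=None):
    """One local block bootstrap replicate of a locally stationary series.

    Each block of length block_len is replaced by a block of the same length
    whose starting position is drawn uniformly from a window around the
    original location, so the replicate preserves the slowly varying local
    distribution of the series.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    out = np.empty(n)
    for start in range(0, n, block_len):
        lo = max(0, min(start, n - block_len) - window)
        hi = min(n - block_len, start + window)
        s = rng.integers(lo, hi + 1)
        m = min(block_len, n - start)
        out[start:start + m] = x[s:s + m]
    return out

# toy example: bootstrap spread of an early local mean of a time-varying AR(1)
rng = np.random.default_rng(3)
n = 500
x = np.zeros(n)
for t in range(1, n):
    phi = 0.2 + 0.6 * t / n                  # slowly varying autoregressive coefficient
    x[t] = phi * x[t - 1] + rng.normal()
reps = [local_block_bootstrap(x, block_len=20, window=50, rng=r)[:100].mean()
        for r in range(500)]
print(np.std(reps))
```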
Keywords
Local stationarity
Local block bootstrap
Central limit theorem
Nonparametric statistics
We consider estimation of the receiver operating characteristic curve and the ordinal dominance curve. The nonparametric estimation is based on delta-sequences. We also consider estimation of the partial area under the receiver operating characteristic curve and the ordinal dominance curve. This is obtained by local estimation of the delta-sequences. We characterize feasible statistics induced by central limit theory for the estimation procedure. A numerical simulation corroborates the asymptotic theory.
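As a concrete point of reference, a minimal sketch of a smoothed ROC curve and partial area estimate in which a Gaussian kernel stands in for the delta-sequence; the paper's specific delta-sequence estimator and its local refinement are not reproduced here:

```python
import numpy as np
from scipy.stats import norm

def smooth_cdf(sample, x, h):
    """Kernel-smoothed CDF; the Gaussian kernel here stands in for a delta-sequence."""
    return norm.cdf((np.asarray(x)[..., None] - sample) / h).mean(axis=-1)

def roc_curve_smooth(neg, pos, h_neg, h_pos, grid_size=200):
    """Smoothed ROC curve: true positive rate against false positive rate."""
    thr = np.linspace(min(neg.min(), pos.min()) - 3 * h_neg,
                      max(neg.max(), pos.max()) + 3 * h_pos, grid_size)
    fpr = 1 - smooth_cdf(neg, thr, h_neg)    # decreasing in the threshold
    tpr = 1 - smooth_cdf(pos, thr, h_pos)
    return fpr[::-1], tpr[::-1]              # reorder so the FPR grid is increasing

def partial_auc(fpr, tpr, fpr_max=0.2):
    """Partial area under the ROC curve over false positive rates in [0, fpr_max]."""
    mask = fpr <= fpr_max
    return np.trapz(tpr[mask], fpr[mask])

# toy example: scores from two Gaussian populations
rng = np.random.default_rng(4)
neg, pos = rng.normal(0, 1, 300), rng.normal(1, 1, 300)
fpr, tpr = roc_curve_smooth(neg, pos, h_neg=0.3, h_pos=0.3)
print(partial_auc(fpr, tpr, fpr_max=0.2))
```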
Keywords
nonparametric estimation
ROC curve
partial area
ODC curve
delta-sequence
local estimation
This study introduces a novel semiparametric regression model based on the starshaped mean equilibrium life (SMEL) function to describe the mean remaining life of aging systems. The SMEL function, exhibiting a decreasing-then-increasing pattern, provides a flexible framework for modeling non-monotonic aging behaviors. Addressing the challenge of non-identifiability of the survival function, we propose a nonparametric testing procedure to validate the starshaped assumption. An adaptive semiparametric MCMC algorithm is developed to estimate regression parameters and select optimal priors, ensuring robust Bayesian regularization. Validated through simulations and real-world applications, the methodology effectively captures complex aging patterns, offering actionable insights for reliability analysis, survival modeling, and decision-making in healthcare, engineering, and actuarial science. This work bridges semiparametric regression, Bayesian inference, and nonparametric testing, advancing the theoretical and computational foundations of aging modeling.
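For context, a minimal sketch of an empirical estimate of a mean remaining life function together with a crude check for a decreasing-then-increasing pattern; the SMEL function itself, the starshaped test, and the semiparametric MCMC procedure of the abstract are not reproduced here, and the helper names are illustrative:

```python
import numpy as np

def empirical_mean_remaining_life(lifetimes, t_grid):
    """Empirical mean remaining life: average of (X - t) among units still alive at t."""
    x = np.asarray(lifetimes, dtype=float)
    out = np.full(len(t_grid), np.nan)
    for k, t in enumerate(t_grid):
        alive = x[x > t]
        if alive.size:
            out[k] = (alive - t).mean()
    return out

def has_decrease_then_increase(values):
    """Crude check for a decreasing-then-increasing pattern in an estimated curve."""
    i_min = int(np.nanargmin(values))
    left_ok = np.all(np.diff(values[:i_min + 1]) <= 1e-8)
    right_ok = np.all(np.diff(values[i_min:]) >= -1e-8)
    return bool(left_ok and right_ok)

# toy example: lifetimes mixing early failures with later wear-out
rng = np.random.default_rng(5)
lifetimes = np.concatenate([rng.exponential(2.0, 200), 8 + 4 * rng.weibull(3.0, 800)])
grid = np.linspace(0.0, 10.0, 50)
mrl = empirical_mean_remaining_life(lifetimes, grid)
print(np.round(mrl[:5], 2), has_decrease_then_increase(mrl))
```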
Keywords
Semiparametric Regression
Mean Equilibrium Life Function
Bayesian Inference
Nonparametric Testing
Aging Modeling
Starshaped Function