Wednesday, Aug 6: 10:30 AM - 12:20 PM
4180
Contributed Papers
Music City Center
Room: CC-103A
Main Sponsor
Section on Nonparametric Statistics
Presentations
Many estimands can be defined through constrained optimization problems with a stochastic component, for instance principal component analysis, constrained maximum likelihood estimation, and many penalized estimation problems. To obtain asymptotic theory when an estimand lies on the boundary of the constraint set, researchers have drawn heavily on the perturbation analysis of optimization problems, which studies how optimization problems vary under small changes in auxiliary parameters. Despite this asymptotic theory, the literature on the efficiency of such estimators has focused on finite-dimensional settings and convex objective functions. We help fill this gap by showing how to derive efficient influence functions for general estimands defined through constrained optimization problems with potentially infinite-dimensional nuisance parameters. We again lean on perturbation theory, offering general results for practitioners interested in deriving influence functions for their own estimands, and we describe when pathwise differentiability may fail to hold. We provide examples showing how this theory can be applied to calculate influence functions for several specific estimands in both semiparametric and nonparametric settings, allowing for efficient root-n estimation.
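For orientation, here is a schematic of the setup in our own notation (the authors' formulation may differ): the estimand solves a constrained stochastic program, and pathwise differentiability concerns how it varies along smooth submodels.

    \[
    \theta(P) \;\in\; \operatorname*{arg\,min}_{\theta \in \Theta:\; g(\theta) \le 0} \; E_P\!\left[\ell(\theta; Z)\right],
    \qquad
    \dot\theta(h) \;=\; \frac{d}{dt}\,\theta(P_t)\Big|_{t=0},
    \]

where $t \mapsto P_t$ is a smooth submodel through $P$ with score $h$. When $\theta(\cdot)$ is pathwise differentiable, $\dot\theta(h) = E_P[\varphi(Z)\,h(Z)]$ for a mean-zero function $\varphi$, the influence function; perturbation analysis supplies the derivative of the constrained argmin that this representation requires.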
Keywords
Efficient influence function
M-estimation
Constrained optimization
Asymptotic theory
Perturbation analysis of optimization problems
In this talk, I will discuss recent advances in the strong convergence of kernel estimators of the cumulative distribution function (CDF). The first part of the talk focuses on the law of the iterated logarithm (LIL) for L1-norms of kernel estimators of the CDF based on independent and identically distributed (i.i.d.) data. The second part extends the LIL from the i.i.d. case to Lp-norms of residual-based kernel estimators of the error CDF in autoregressive models. Simulation results illustrating the estimators' performance will also be presented.
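For concreteness, a minimal sketch of the integrated-kernel CDF estimator and its L1 distance to the truth (the Gaussian kernel and bandwidth rule below are illustrative assumptions, not the speaker's choices):

    import numpy as np
    from scipy.stats import norm

    def kernel_cdf(grid, sample, h):
        # F_hat(x) = (1/n) * sum_i Phi((x - X_i) / h), Phi the standard normal CDF
        return norm.cdf((grid[:, None] - sample[None, :]) / h).mean(axis=1)

    rng = np.random.default_rng(0)
    X = rng.normal(size=500)
    h = X.std() * X.size ** (-1 / 5)            # illustrative bandwidth choice
    grid = np.linspace(-4.0, 4.0, 801)
    F_hat = kernel_cdf(grid, X, h)
    # L1 distance to the true CDF via a Riemann sum over the grid
    l1_error = np.sum(np.abs(F_hat - norm.cdf(grid))) * (grid[1] - grid[0])
    print(f"L1 error: {l1_error:.4f}")

The LIL results concern the almost-sure fluctuation of exactly this kind of L1 (and, for residuals, Lp) discrepancy as n grows.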
Keywords
Kernel Estimator
Autoregressive Models
Lp-norm
LIL
In this study, we propose a novel regression function estimator for scenarios involving errors-in-variables within a convolution model, particularly when the data are subject to right-censoring. By leveraging the tail behavior of the characteristic function of the error distribution, we establish the optimal local and global convergence rates for the kernel estimators. Our results reveal that the convergence rate depends on the smoothness of the error distribution: it is slower for super-smooth errors and faster for ordinary-smooth errors, both locally and globally. Importantly, we demonstrate that while the choice of kernel K has a negligible impact on the optimality of the mean squared error (MSE), the bandwidth h plays a critical role. Through simulations across varying sample sizes, with 100 replications per setting, we validate the theoretical findings. Finally, we apply the proposed estimator to analyze the relationship between advanced lung cancer cases and Karnofsky performance scores, offering practical insight into this medical context.
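As a point of reference (a textbook formulation, not necessarily the authors' exact conditions), the deconvoluting kernel and the two error-smoothness regimes contrasted above can be written as

    \[
    K_h^{*}(u) \;=\; \frac{1}{2\pi}\int e^{-itu}\,\frac{\phi_K(t)}{\phi_\varepsilon(t/h)}\,dt,
    \qquad
    |\phi_\varepsilon(t)| \asymp |t|^{-\beta} \;\;\text{(ordinary smooth)},
    \qquad
    |\phi_\varepsilon(t)| \asymp e^{-|t|^{\beta}/\gamma} \;\;\text{(super smooth)},
    \]

where $\phi_K$ and $\phi_\varepsilon$ denote the characteristic functions of the kernel $K$ and of the measurement error: polynomially decaying $\phi_\varepsilon$ permits polynomial convergence rates, while exponentially decaying $\phi_\varepsilon$ forces logarithmic rates, matching the slower/faster dichotomy described in the abstract.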
Keywords
Kernel Regression
Deconvolution
Right Censored Data
Additive Measurement Errors
Co-Author(s)
Shan Sun, Univ of Texas At Arlington, Dept. of Mathematics
Dengdeng Yu, University of Texas at San Antonio
Qiang Zheng
First Author
Erol Ozkan, University of Texas at Arlington
Presenting Author
Will Chen, University of Texas at Arlington
Many problems involve data exhibiting both temporal and cross-sectional dependencies. While linear dependencies have been extensively studied, the theoretical analysis of estimators under nonlinear dependencies remains scarce. This work studies a kernel-based estimation procedure for nonlinear dynamics within the reproducing kernel Hilbert space framework, focusing on nonlinear stochastic regression and nonlinear vector autoregressive models. We derive nonasymptotic probabilistic bounds on the deviation between a kernel estimator and the true nonlinear regression function. A key technical contribution is a concentration bound for quadratic forms of stochastic matrices in the presence of dependent data, which may be of independent interest. Additionally, we characterize conditions on multivariate kernels required to achieve optimal convergence rates.
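As a hedged illustration of the framework (the tanh dynamics, RBF kernel, and tuning values below are our own choices, not the authors'), kernel ridge regression is one concrete RKHS estimator of a nonlinear VAR map:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(1)
    T, d = 500, 2
    A = np.array([[0.6, -0.3], [0.2, 0.5]])
    X = np.zeros((T, d))
    for t in range(T - 1):
        # nonlinear VAR(1): X_{t+1} = tanh(A X_t) + noise
        X[t + 1] = np.tanh(A @ X[t]) + 0.1 * rng.normal(size=d)

    # regress X_{t+1} on X_t in an RKHS via kernel ridge regression
    model = KernelRidge(kernel="rbf", gamma=0.5, alpha=1e-2)
    model.fit(X[:-1], X[1:])
    mse = np.mean((model.predict(X[:-1]) - X[1:]) ** 2)
    print(f"in-sample MSE: {mse:.4f}")

The theoretical results quantify how estimators of this type deviate from the true regression function when the lagged pairs (X_t, X_{t+1}) are dependent rather than i.i.d.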
Keywords
nonlinear dynamics
vector autoregressive model
reproducing kernel Hilbert space
concentration inequality
time series
machine learning
We study the problem of detecting changes in conditional distributions over time, where the relationship between inputs and responses shifts at unknown time points, referred to as change points. The conditional distributions are assumed to belong to a structured class of hierarchical models and to remain piecewise constant between change points. Our objective is to estimate the locations of these changes and to analyze the conditions under which they can be reliably detected. To this end, we introduce a novel method, Deep Distributional Change Point Detection, which combines a Dense ReLU network-based estimation algorithm with a Seeded Binary Segmentation procedure to efficiently identify and localize changes in conditional distributions. Our theoretical analysis examines the impact of varying model parameters as the number of observations increases, including the minimum spacing between consecutive change points and the smallest detectable shift in distributions. We establish fundamental limits on localization accuracy and derive the minimum signal strength required for consistent detection. Extensive numerical experiments demonstrate the effectiveness of the proposed method.
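For concreteness, a minimal sketch of the seeded-interval grid in the style of Kovács et al. (the decay factor and rounding details are our assumptions); the segmentation step would then maximize a CUSUM-type statistic within each interval:

    import math

    def seeded_intervals(T, decay=2.0):
        # (start, end) index pairs covering [0, T) at geometrically shrinking scales
        intervals = set()
        for k in range(1, math.ceil(math.log(T, decay)) + 1):
            length = T / decay ** (k - 1)
            n_k = 2 * math.ceil(decay ** (k - 1)) - 1
            shift = 0.0 if n_k == 1 else (T - length) / (n_k - 1)
            for i in range(n_k):
                s, e = round(i * shift), round(i * shift + length)
                if e - s >= 2:                  # need at least two points to split
                    intervals.add((s, e))
        return sorted(intervals)

    print(seeded_intervals(20))

The multiscale grid keeps the number of candidate intervals near-linear in T while ensuring every sufficiently long stationary stretch between change points is covered by some interval.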
Keywords
Change Point
Dense ReLU Network
CUSUM estimator
Seeded Binary Segmentation
This paper considers the problem of testing for latent structure in large symmetric data matrices. The goal here is to develop statistically principled methodology that is flexible in its applicability and insensitive to data variation, thereby overcoming limitations facing existing approaches. To do so, we introduce and systematically study symmetric matrices, called Wilcoxon--Wigner random matrices, whose entries are normalized rank statistics derived from an underlying independent and identically distributed sample of absolutely continuous random variables. These matrices naturally arise as the matricization of one-sample problems in statistics and conceptually lie at the interface of nonparametrics, multivariate analysis, and data reduction. Among our results, we establish that the leading eigenvalue and corresponding eigenvector of Wilcoxon--Wigner random matrices admit asymptotically Gaussian fluctuations with explicit centering and scaling terms. These asymptotic results, which are parameter-free and distribution-free, enable rigorous spectral methodology for addressing two hypothesis testing problems, namely community detection and principal submatrix localization.
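A hedged sketch of the construction as we read it from the abstract (the exact normalization used by the authors may differ): fill the upper triangle with centered, scaled ranks of an i.i.d. sample and symmetrize, after which the rescaled spectrum behaves like that of a Wigner matrix.

    import numpy as np
    from scipy.stats import rankdata

    rng = np.random.default_rng(2)
    n = 300
    m = n * (n + 1) // 2                  # upper-triangular entries, diagonal included
    x = rng.standard_cauchy(m)            # any absolutely continuous distribution works
    # ranks mapped to (0, 1), then centered and scaled to variance ~ 1
    ranks = (rankdata(x) / (m + 1) - 0.5) * np.sqrt(12)

    W = np.zeros((n, n))
    W[np.triu_indices(n)] = ranks
    W = W + np.triu(W, 1).T               # symmetrize
    evals = np.linalg.eigvalsh(W / np.sqrt(n))
    print(f"spectral edge: {evals[-1]:.3f}")   # near 2, consistent with the semicircle law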
Keywords
Rank statistic
Hypothesis testing
Semicircle distribution
Bai--Yin law
Outlier eigenvalue and eigenvector
Spectral method
In a simple linear regression model, one has a numeric response variable Y and a predictor X. The model has an intercept β_0 and slope β_1, which are unknown. We assume X is numeric. We have data (Y_1, X_1), (Y_2, X_2), …, (Y_n, X_n), where the observation Y_i is drawn from the conditional distribution of Y given X = X_i. Using the data, one can estimate the intercept and slope of the model by the method of least squares. The estimators are linear in the Y_i's and are unbiased with minimum variance. Assume all X_i's are distinct. A unique line passes through any two points (Y_i, X_i) and (Y_j, X_j) with i ≠ j, so we obtain a swarm of lines, each of which provides unbiased estimators of β_0 and β_1. The swarm is used to develop a nonparametric regression model of Y on X. We show that a suitably combined small subset of the swarm reproduces the least squares estimators, and we extend this result to the case of two predictors.
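For illustration, a classical identity in the spirit of the abstract (the authors' combination rule may differ): the least squares slope equals the weighted average of the two-point slopes with weights proportional to (X_i - X_j)^2.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(3)
    X = rng.uniform(0.0, 10.0, 30)
    Y = 2.0 + 1.5 * X + rng.normal(size=30)

    # slope of the line through each pair of points, one line per pair
    pairs = list(combinations(range(X.size), 2))
    slopes = np.array([(Y[i] - Y[j]) / (X[i] - X[j]) for i, j in pairs])
    weights = np.array([(X[i] - X[j]) ** 2 for i, j in pairs])

    swarm_slope = np.average(slopes, weights=weights)
    ols_slope = np.polyfit(X, Y, 1)[0]
    print(swarm_slope, ols_slope)         # agree up to floating-point error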
Keywords
simple linear regression
linear regression with two predictors
nonparametric simple linear regression
swarm of regressions