Monday, Aug 4: 8:30 AM - 10:20 AM
4039
Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A2
Main Sponsor
Section on Nonparametric Statistics
Presentations
We present a fast and powerful method for comparing and visualizing high-dimensional datasets. We propose a novel statistic that is inspired by interpoint distances but avoids their computation. The Euclidean distance is not suitable in high-dimensional settings because of the distance concentration phenomenon; instead, we offer statistics based on two high-dimensional dissimilarity indices that take advantage of the concentration phenomenon. We also discuss a simultaneous display of the observations' means and standard deviations that aids visualization, helps detect suspect outliers, and enhances separability among the competing classes in the transformed space. We study the finite-sample convergence of the dissimilarity indices, compare eight statistics under several distributions, and present three applications.
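The distance concentration phenomenon invoked above is easy to demonstrate numerically: as the dimension grows, pairwise Euclidean distances between i.i.d. points become nearly equal, so their relative spread shrinks. A minimal sketch of the phenomenon only; it does not reproduce the proposed statistic or the two dissimilarity indices, which the abstract does not specify:

```python
import numpy as np

def relative_distance_spread(n=50, dim=2, seed=0):
    """Ratio of standard deviation to mean of all pairwise Euclidean
    distances among n i.i.d. standard normal points in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))
    # full pairwise distance matrix via broadcasting
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)          # upper triangle: each pair once
    dists = d[iu]
    return dists.std() / dists.mean()

# In low dimension the distances vary substantially; in high dimension
# they concentrate around a common value.
low = relative_distance_spread(dim=2)
high = relative_distance_spread(dim=2000)
```

With standard normal data the ratio drops from roughly 0.5 in two dimensions to under 0.05 in two thousand, which is the concentration the abstract exploits.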
Keywords
Interpoint distance
Concentration
Dissimilarity indices
We propose to assess the quality of approximation of normalized sums by their Gaussian analogues over a new class of convex sets in a high-dimensional setup. This class allows us to quantify explicitly the effect of sparsity on the convergence rate, generalizing the results of Chernozhukov et al. (2017) for hyper-rectangles and s-sparse convex sets. We also show that several recent tests of high-dimensional means take the form of a supremum of normalized sums over these new classes of convex sets. As an application, we propose a new distribution- and correlation-free K-sample (K > 2) test of high-dimensional means (MANOVA), which non-trivially generalizes the recent two-sample test of Xue and Yao (2020). Finally, we also propose new tests of linear hypotheses in MANOVA. The tests are studied rigorously, both theoretically and in simulation studies, and show very good performance in comparison with existing methods in the literature.
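A supremum of normalized sums in its simplest form, the hyper-rectangle case, can be sketched as a max-type mean test calibrated by a Gaussian multiplier bootstrap in the spirit of Chernozhukov et al. (2017). The paper's new convex-set classes and the K-sample MANOVA tests are not reproduced here; the function name and parameters below are illustrative only:

```python
import numpy as np

def max_mean_test(X, n_boot=500, alpha=0.05, seed=0):
    """One-sample max-type test of H0: E[X] = 0 in high dimensions,
    calibrated by a Gaussian multiplier bootstrap.  This is the
    hyper-rectangle special case only (supremum over coordinates)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    xbar = X.mean(axis=0)
    stat = np.sqrt(n) * np.max(np.abs(xbar))      # sup of normalized sums
    centered = X - xbar
    boot = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)                 # Gaussian multipliers
        boot[b] = np.max(np.abs(e @ centered)) / np.sqrt(n)
    crit = np.quantile(boot, 1 - alpha)
    return stat, crit, bool(stat > crit)

rng = np.random.default_rng(1)
null_X = rng.standard_normal((100, 200))           # H0 true
alt_X = null_X + 2.0 * (np.arange(200) < 5)        # sparse mean shift
stat0, crit0, reject_null = max_mean_test(null_X)
stat1, crit1, reject_alt = max_mean_test(alt_X)
```

The sparse shift drives the max statistic far above the bootstrap critical value, illustrating why sparsity enters the convergence-rate analysis for such suprema.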
Keywords
high-dimensional testing
multiplier bootstrap
MANOVA
We consider a single-index regression model and propose robust and efficient inference for its parameters. Starting from a local linear approximation of the unknown regression function, we estimate that function using the generalized signed-rank approach. Next, combining the estimated function with the estimating equation obtained from the generalized signed-rank objective function, we define a penalized empirical likelihood objective function for the index parameter and establish its asymptotic distribution under mild regularity conditions. The performance of the proposed method is demonstrated via extensive Monte Carlo simulation experiments, and the results are compared with those obtained from a normal approximation alternative and from the least squares and least absolute deviations approaches. Finally, a real data example illustrates the proposed methodology.
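For intuition, a signed-rank fit can be sketched for a plain linear model with Wilcoxon scores. The `signed_rank_dispersion` function below is a simplified stand-in, not the paper's generalized signed-rank objective, its single-index structure, or its penalized empirical likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def signed_rank_dispersion(beta, x, y):
    """Wilcoxon-score signed-rank dispersion of the residuals y - x @ beta:
    each |e_i| is weighted by the rank of |e_i| among all absolute
    residuals (a simplified illustration of a signed-rank objective)."""
    e = y - x @ beta
    ranks = np.argsort(np.argsort(np.abs(e))) + 1   # ranks 1..n of |e_i|
    return np.sum(ranks / (len(e) + 1.0) * np.abs(e))

rng = np.random.default_rng(0)
n = 200
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = x @ np.array([1.0, 2.0]) + rng.standard_t(df=2, size=n)  # heavy tails
beta0 = np.linalg.lstsq(x, y, rcond=None)[0]        # least-squares start
fit = minimize(signed_rank_dispersion, x0=beta0, args=(x, y),
               method="Nelder-Mead")
beta_hat = fit.x
```

Under the heavy-tailed t(2) errors the signed-rank fit still recovers the slope well, which is the kind of robustness the abstract contrasts with least squares.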
Keywords
Signed-rank norm
Chi-square distribution
Oracle property
Variable selection
Multivariate hypothesis testing is fundamental in modern data analysis, particularly in high-dimensional settings. This work revisits the concept of statistically equivalent blocks and demonstrates how it can be used to generalize several classical nonparametric tests beyond the univariate setting. The proposed generalization preserves important testing properties without relying on spatial ranks or data depth. A key application is the reformulation of the precedence test using statistically equivalent blocks, leading to a multivariate nonparametric procedure suitable for lifetime data. This test accommodates certain types of censoring, making it particularly relevant for life-testing applications. In addition to reviewing the literature on statistically equivalent blocks in hypothesis testing, we compare the proposed approach with some existing multivariate nonparametric methods and discuss its advantages.
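The property of statistically equivalent blocks on which such distribution-free tests rest can be illustrated in the univariate case: the blocks cut by the order statistics of one sample have equal expected coverage under any continuous distribution. A minimal sketch, illustrative only; the paper's multivariate construction is more involved:

```python
import numpy as np

def block_counts(train, test):
    """Count how many test points fall into each of the len(train)+1
    statistically equivalent blocks cut by the order statistics of
    `train` (univariate case, for illustration)."""
    edges = np.sort(train)
    idx = np.searchsorted(edges, test)   # block index 0..len(train)
    return np.bincount(idx, minlength=len(train) + 1)

rng = np.random.default_rng(0)
# Distribution-freeness: the expected count per block is m/(n+1)
# for ANY continuous distribution, even a heavily skewed one.
n, m, reps = 9, 1000, 500
avg = np.zeros(n + 1)
for _ in range(reps):
    train = rng.exponential(size=n)
    test = rng.exponential(size=m)
    avg += block_counts(train, test)
avg /= reps
```

All ten blocks average close to 100 test points each, even though the blocks themselves have wildly unequal widths; this is the invariance that lets the precedence test be generalized without spatial ranks or data depth.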
Keywords
Nonparametric Testing
Distribution-Free
Multivariate
Life-testing
Statistically Equivalent Blocks
Two-Sample
The two-sample test is a fundamental problem in statistics with a wide range of applications. In high-dimensional settings, graph-based methods have gained considerable attention for their flexibility and minimal distributional assumptions. However, their performance is highly sensitive to tuning parameters, such as the choice of k and the norm used in k-MST construction. To address this challenge, we propose a novel data-driven approach that adaptively selects both k and the appropriate norm, enabling the construction of similarity graphs that more effectively capture distributional differences. Our method consistently outperforms existing graph-based tests with recommended parameter choices, as well as other adaptive methods, across a broad range of scenarios.
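The underlying graph-based statistic can be sketched with a fixed k and the Euclidean norm: build the k-MST on the pooled sample, count edges joining the two samples, and calibrate by permuting labels on the fixed graph. This Friedman-Rafsky-style sketch deliberately omits the paper's contribution, the adaptive selection of k and the norm:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def k_mst_edges(X, k=3):
    """Edges of the k-MST: the union of k successive edge-disjoint
    minimum spanning trees on the pooled sample (Euclidean distances)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    edges = []
    for _ in range(k):
        T = minimum_spanning_tree(D).tocoo()
        for i, j in zip(T.row, T.col):
            edges.append((i, j))
            D[i, j] = D[j, i] = 0.0   # 0 means "no edge" for csgraph
    return edges

def cross_edge_count(edges, labels):
    """Number of k-MST edges joining points from different samples;
    unusually small counts indicate a distributional difference."""
    return sum(labels[i] != labels[j] for i, j in edges)

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((40, 10)),
               rng.standard_normal((40, 10)) + 2.0])   # mean shift
labels = np.array([0] * 40 + [1] * 40)
edges = k_mst_edges(X, k=3)
obs = cross_edge_count(edges, labels)
# Permutation null: the graph depends only on the pooled points,
# so shuffling labels on the fixed graph is valid.
null = [cross_edge_count(edges, rng.permutation(labels))
        for _ in range(200)]
p_value = np.mean([c <= obs for c in null])
```

Because the shifted sample keeps mostly to itself in the graph, the observed cross-edge count falls far below the permutation null, and the test rejects.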
Keywords
High dimensional statistics
Graph-based method
Adaptive method
Two-sample hypothesis testing for large graphs is popular in cognitive science, probabilistic machine learning, and artificial intelligence. While numerous methods have been proposed in the literature to address this problem, less attention has been devoted to scenarios involving graphs of unequal size, or to situations where only one or a few sample graphs are available. In this article, we propose a Frobenius test statistic tailored for small sample sizes and unequal-sized random graphs to test whether they are generated from the same model. Our approach involves an algorithm for generating bootstrapped adjacency matrices from estimated community-wise edge probability matrices, which form the basis of the Frobenius test statistic. We derive the asymptotic distribution of the proposed test statistic and, through simulations, validate its stability and efficiency in detecting minor differences between underlying models. Furthermore, we apply it to fMRI data, where it distinguishes brain activity patterns between subjects exposed to two different stimuli, sentences and pictures, and the control group.
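Under simplifying assumptions (known communities and a two-block stochastic block model), the bootstrapped Frobenius statistic described above can be sketched as follows. The estimators and bootstrap scheme here are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def estimate_block_probs(A, z, K):
    """Community-wise edge probability matrix of adjacency A, given
    known community labels z (a simplification: in practice the
    communities themselves must be estimated)."""
    B = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            block = A[np.ix_(z == a, z == b)]
            if a == b:
                n_a = block.shape[0]
                B[a, b] = block.sum() / max(n_a * (n_a - 1), 1)
            else:
                B[a, b] = block.mean()
    return B

def sample_sbm(z, B, rng):
    """Bootstrap an undirected adjacency matrix from block probabilities."""
    n = len(z)
    P = B[np.ix_(z, z)]
    U = rng.random((n, n))
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(int)
    return A + A.T                       # symmetric, zero diagonal

rng = np.random.default_rng(0)
K = 2
B1 = np.array([[0.6, 0.1], [0.1, 0.6]])
B2 = np.array([[0.3, 0.3], [0.3, 0.3]])  # different generating model
z1 = np.repeat([0, 1], 60)               # unequal graph sizes allowed
z2 = np.repeat([0, 1], 40)
A1, A2 = sample_sbm(z1, B1, rng), sample_sbm(z2, B2, rng)

obs = np.linalg.norm(estimate_block_probs(A1, z1, K)
                     - estimate_block_probs(A2, z2, K))
# Bootstrap null: regenerate both graphs from the pooled estimate.
B0 = (estimate_block_probs(A1, z1, K) + estimate_block_probs(A2, z2, K)) / 2
null = [np.linalg.norm(estimate_block_probs(sample_sbm(z1, B0, rng), z1, K)
                       - estimate_block_probs(sample_sbm(z2, B0, rng), z2, K))
        for _ in range(200)]
p_value = np.mean([t >= obs for t in null])
```

Note that the two graphs have 120 and 80 nodes, so the statistic compares the fixed-size K x K probability matrices rather than the adjacency matrices themselves, which is what makes unequal sizes unproblematic.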
Keywords
Two-sample hypothesis test
Random graphs
Asymptotic normality
Bootstrap
fMRI data
Co-Author(s)
Kit Chan, Bowling Green State University
Ian Barnett, University of Pennsylvania
Riddhi Ghosh, Bowling Green State University
First Author
Xin Jin, The University of Tampa
Presenting Author
Xin Jin, The University of Tampa
Mixture models are invaluable tools for density estimation and clustering tasks. After obtaining a partition of the responses from a mixture model, assessing how that partition depends on covariates is of great importance. This is particularly relevant in applications where understanding the influence of covariates on clusters or subpopulations is crucial, such as targeted interventions in precision medicine. In this context, we propose the underlap coefficient as a metric for measuring the dependence of estimated partitions on covariates in cluster analysis. Although the underlap coefficient was initially designed to quantify separation between distributions, we posit that it can also serve as an effective complement to posterior predictive checks when mixture models are used for clustering. While a posterior predictive check can identify model inadequacies, the underlap coefficient offers insight into where to make model adjustments, in particular whether to allow the mixture weights to depend on covariates. We further propose Bayesian estimators to estimate the underlap coefficient accurately for this task.
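The abstract does not define the underlap coefficient, so the following is purely a hypothetical stand-in for a separation measure between two distributions: one minus the overlapping coefficient, the integral of min(f, g), estimated with kernel density estimates. The authors' Bayesian estimators are not reproduced:

```python
import numpy as np
from scipy.stats import gaussian_kde

def overlap_complement(x1, x2, grid_size=512):
    """Illustrative separation measure between two univariate samples:
    1 minus the overlapping coefficient, i.e. the integral of the
    pointwise minimum of two kernel density estimates.  This is a
    hypothetical proxy, not the underlap coefficient itself."""
    f, g = gaussian_kde(x1), gaussian_kde(x2)
    lo = min(x1.min(), x2.min()) - 1.0
    hi = max(x1.max(), x2.max()) + 1.0
    t = np.linspace(lo, hi, grid_size)
    overlap = np.minimum(f(t), g(t)).sum() * (t[1] - t[0])
    return 1.0 - overlap

rng = np.random.default_rng(0)
# Nearly coincident components vs. well-separated components.
near = overlap_complement(rng.normal(0, 1, 500), rng.normal(0.2, 1, 500))
far = overlap_complement(rng.normal(0, 1, 500), rng.normal(4, 1, 500))
```

A value near 0 says the two cluster-specific distributions are essentially indistinguishable, while a value near 1 signals strong separation, the kind of signal that could flag where covariate-dependent weights are warranted.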
Keywords
Mixture models
Cluster analysis
Covariate dependence
Partition
Underlap coefficient