Thursday, Aug 7: 8:30 AM - 10:20 AM
4204
Contributed Papers
Music City Center
Room: CC-209B
Main Sponsor
Section on Statistical Computing
Presentations
Traditional mixture of factor analyzers (MFA) models, which rely on Gaussian assumptions, are sensitive to outliers and heavy-tailed distributions, making them less robust in complex real-world scenarios. The Mixture of t-Factor Analyzers (MtFA) model extends this framework by incorporating multivariate t-distributions, offering improved robustness to non-Gaussian data. Despite its advantages, the MtFA model faces computational challenges, particularly in high-dimensional settings, where the estimation of large covariance matrices and the iterative nature of Expectation-Maximization (EM) algorithms lead to scalability issues. In this work, we present a hybrid approach that integrates a matrix-free algorithm into the EM framework to efficiently estimate the parameters of the MtFA model. By leveraging the structure of the t-distribution within a factor analysis framework, our method retains the interpretability of traditional MFA while improving robustness to heavy-tailed noise and localized anomalies. We demonstrate the effectiveness of our approach through simulations and real-world datasets, showing superior computational efficiency and resilience to outliers while preserving clustering accuracy.
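In a factor-analytic model each component covariance has the form Sigma = Lambda Lambda^T + Psi, with a p x q loading matrix Lambda and a diagonal Psi. The EM E-step for t-mixtures needs squared Mahalanobis distances delta_i, which feed the standard t weights u_i = (nu + p) / (nu + delta_i). The sketch below is a minimal illustration, assuming the Woodbury identity as the device that avoids ever forming or inverting the p x p matrix Sigma (only a q x q system is solved). It is not the authors' matrix-free algorithm, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis_factor(X, mu, Lam, psi):
    """Squared Mahalanobis distances under Sigma = Lam @ Lam.T + diag(psi).

    Uses the Woodbury identity
      Sigma^{-1} = Psi^{-1} - Psi^{-1} Lam (I_q + Lam^T Psi^{-1} Lam)^{-1} Lam^T Psi^{-1},
    so only a q x q linear system is solved; Sigma itself is never built.
    """
    D = X - mu                                    # (n, p) centred observations
    Dw = D / psi                                  # rows d_i^T Psi^{-1}
    q = Lam.shape[1]
    M = np.eye(q) + Lam.T @ (Lam / psi[:, None])  # I_q + Lam^T Psi^{-1} Lam
    B = Dw @ Lam                                  # rows (Lam^T Psi^{-1} d_i)^T, shape (n, q)
    quad = np.einsum("ij,ij->i", D, Dw)           # d_i^T Psi^{-1} d_i
    corr = np.einsum("ij,ij->i", B, np.linalg.solve(M, B.T).T)
    return quad - corr

def t_weights(delta, nu, p):
    """E-step weights u_i = (nu + p) / (nu + delta_i) of a multivariate t."""
    return (nu + p) / (nu + delta)

# Usage: compare against the dense computation on a small problem.
rng = np.random.default_rng(0)
n, p, q = 5, 50, 3
X = rng.normal(size=(n, p))
mu = np.zeros(p)
Lam = rng.normal(size=(p, q))
psi = rng.uniform(0.5, 2.0, size=p)

delta = mahalanobis_factor(X, mu, Lam, psi)
Sigma = Lam @ Lam.T + np.diag(psi)                # dense p x p, for checking only
dense = np.einsum("ij,ij->i", X - mu, np.linalg.solve(Sigma, (X - mu).T).T)
u = t_weights(delta, nu=4.0, p=p)
```

Because u_i shrinks as delta_i grows, observations far from a component's centre receive small weights in the M-step, which is the mechanism behind the robustness of t-based mixtures.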
Keywords
Mixture of factor analyzers
data clustering
matrix-free computations
expectation-maximization algorithm
dimensionality reduction
factor analysis
Co-Author
Fan Dai, Michigan Technological University
First Author
Kazeem Kareem, Michigan Technological University
Presenting Author
Kazeem Kareem, Michigan Technological University
Due to its flexibility in handling skewness, the family of gamma distributions is applicable to numerous domains where less flexible distributions prove inadequate. This paper extends gain-probability (G-P) analysis to the family of gamma distributions, providing a comprehensive investigation of its applicability in statistical modeling. G-P analyses are developed for both independent and dependent (matched) data scenarios. Monte Carlo studies demonstrate the stability and robustness of maximum likelihood estimators of parameters in gamma distributions within the G-P framework. Furthermore, applications to real-world streamflow data highlight the comparative advantages of G-P analysis using the gamma distribution family. To facilitate practical implementation, free online calculators are provided for computing gain probabilities under the proposed methodology.
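G-P analysis concerns the probability that an observation from one condition exceeds one from another by at least a given gain. Assuming, for illustration only, that the gain probability at gain g for independent X (control) and Y (treatment) is P(Y - X > g), the following standard-library sketch estimates it by Monte Carlo for gamma-distributed data. The definition and the function name are assumptions; the paper's estimators, the dependent (matched) case, and its online calculators are not reproduced here. Note that Python's random.gammavariate(alpha, beta) takes shape alpha and scale beta.

```python
import random

def gain_probability_mc(shape_x, scale_x, shape_y, scale_y,
                        gain=0.0, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(Y - X > gain) for independent
    X ~ Gamma(shape_x, scale_x) and Y ~ Gamma(shape_y, scale_y).

    Hypothetical helper: assumes this is the gain probability of interest.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = rng.gammavariate(shape_x, scale_x)  # (shape, scale) parameterization
        y = rng.gammavariate(shape_y, scale_y)
        if y - x > gain:
            hits += 1
    return hits / n_draws

# Sanity check against a closed form: with shape 1 both variables are
# exponential, and P(Y > X) = rate_x / (rate_x + rate_y) = 1 / (1 + 0.5) = 2/3
# when X has scale 1 (rate 1) and Y has scale 2 (rate 0.5).
p_hat = gain_probability_mc(1.0, 1.0, 1.0, 2.0)
print(round(p_hat, 3))
```

Evaluating the function over a range of gains g traces out a curve of gain probabilities; a closed-form or numerical-integration version would replace the simulation in practice.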
Keywords
gamma distribution
gain-probability analysis
statistical modeling
maximum likelihood estimator
Monte Carlo studies
streamflow data
Building on previous research on generalized distributions, we propose an extension of both the generalized skew normal distributions introduced by Kumar and Anusree (2011) and the skew flexible normal distributions proposed by Gómez et al. (2011). The properties of this family of distributions are explored, and the parameters are estimated by maximum likelihood. Two simulation studies and two real-data examples illustrate the main results.
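Both families named above generalize the skew-normal idea of perturbing a symmetric density with a distribution function. As a point of reference, the sketch below implements only the standard skew-normal density f(x; lambda) = 2 phi(x) Phi(lambda x), its location-scale log-likelihood, a sampler based on the representation Z = delta |U0| + sqrt(1 - delta^2) U1 with delta = lambda / sqrt(1 + lambda^2), and a crude grid-search maximum likelihood estimate of lambda with location and scale held fixed. This is the baseline distribution, not the proposed extension, whose density is not given in the abstract; the helper names are hypothetical.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal_pdf(x, lam):
    """Standard skew-normal density 2 * phi(x) * Phi(lam * x)."""
    return 2.0 * norm_pdf(x) * norm_cdf(lam * x)

def skew_normal_loglik(data, lam, loc=0.0, scale=1.0):
    """Log-likelihood of a location-scale skew-normal sample."""
    return sum(math.log(skew_normal_pdf((x - loc) / scale, lam) / scale)
               for x in data)

def skew_normal_sample(lam, n, seed=0):
    """Draw n values via Z = delta*|U0| + sqrt(1 - delta^2)*U1, U0, U1 iid N(0, 1)."""
    rng = random.Random(seed)
    delta = lam / math.sqrt(1.0 + lam * lam)
    return [delta * abs(rng.gauss(0.0, 1.0))
            + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
            for _ in range(n)]

# Usage: simulate with lambda = 3, then maximize the log-likelihood over a grid.
sample = skew_normal_sample(3.0, 5000)
lam_grid = [i / 10 for i in range(61)]   # 0.0, 0.1, ..., 6.0
lam_hat = max(lam_grid, key=lambda l: skew_normal_loglik(sample, l))
```

A real fit would optimize location, scale, and shape jointly with a numerical optimizer; the grid search only keeps the example dependency-free.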
Keywords
flexibility
bimodal
skew normal
asymmetric
Fréchet regression has emerged as a promising approach for modeling non-Euclidean response variables associated with Euclidean covariates. In this talk, we propose an estimation method with low-rank regularization for global Fréchet regression models. Specifically focusing on distribution function responses, we demonstrate how this framework employs low-rank regularization to enhance the efficiency and accuracy of the model fit. The proposed method enables more robust modeling and estimation, particularly in high-dimensional settings. We present a detailed theoretical analysis of the large-sample properties of the proposed estimator. Numerical experiments further validate these theoretical results.
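For one-dimensional distributions under the 2-Wasserstein metric, each response can be represented by its quantile function, and the unregularized global Fréchet regression estimate at a covariate value x is the weighted average (1/n) sum_i s_i(x) Q_i with weights s_i(x) = 1 + (X_i - Xbar)^T Sigmahat^{-1} (x - Xbar), projected back onto nondecreasing functions because negative weights can break monotonicity. The sketch below implements only this unregularized baseline on a common quantile grid; the low-rank regularization proposed in the talk is not shown, and the function names are hypothetical.

```python
import numpy as np

def pava(y):
    """L2 projection of a sequence onto nondecreasing sequences
    (pool-adjacent-violators algorithm with equal weights)."""
    vals, counts = [], []
    for v in y:
        vals.append(float(v))
        counts.append(1)
        # Merge blocks while the monotonicity constraint is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, c2 = vals.pop(), counts.pop()
            v1, c1 = vals.pop(), counts.pop()
            vals.append((c1 * v1 + c2 * v2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(vals, counts)

def global_frechet_quantile(X, Q, x_new):
    """Unregularized global Frechet regression for 1-D distributional
    responses in Wasserstein space.
    X: (n, d) covariates; Q: (n, m) quantile functions on a common grid."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    Xc = X - xbar
    Sigma = Xc.T @ Xc / n                               # covariance with 1/n scaling
    s = 1.0 + Xc @ np.linalg.solve(Sigma, np.asarray(x_new, dtype=float) - xbar)
    avg = s @ Q / n                                     # weighted mean of quantile functions
    return pava(avg)                                    # back to a valid quantile function

# Usage: responses are location shifts of a logistic quantile function.
rng = np.random.default_rng(1)
u = np.linspace(0.01, 0.99, 50)
z = np.log(u / (1.0 - u))            # logistic quantile function (base shape)
X = rng.uniform(0.0, 1.0, size=(30, 1))
Q = 2.0 * X + z                      # Q_i(u) = 2 X_i + z(u), shape (30, 50)
pred = global_frechet_quantile(X, Q, [0.4])
```

In this location-shift example the weights reproduce the linear trend exactly, so the prediction is z(u) + 0.8.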
Keywords
Fréchet regression
Low-rank regularization
Distribution function responses
Quantile function responses
Wasserstein space
Optimal transport
This study proposes new families of generalized inverse Pareto distributions using the T-R{Y} framework. Several choices for the distributions of the random variables T and Y lead to generalized families built on the random variable R, which, in this study, follows the inverse Pareto distribution. The resulting generalized family of distributions is thus named the T-inverse Pareto{Y} family. We consider the exponential, Weibull, log-logistic, logistic, Cauchy, and extreme value distributions as potential choices for the distribution of the random variable Y. Specific members of the T-inverse Pareto{Y} family exhibit symmetric, right-skewed, left-skewed, unimodal, or bimodal density functions. Some statistical properties of the T-inverse Pareto{Y} family are investigated. The method of maximum likelihood is proposed for estimating the distribution parameters, and its performance is assessed in a simulation study. Four real-world datasets from different disciplines are analyzed to demonstrate the flexibility of the proposed T-inverse Pareto{Y} family of distributions.
Keywords
T-R{Y} framework
Inverse Pareto distribution
Quantile function
Maximum likelihood estimation
Censoring
Today, data mining and gene expression analysis are at the forefront of modern data analysis. In this paper, we present a revised and corrected version of the spherical-Dirichlet distribution, originally introduced by the author of this work. This updated formulation addresses key issues in the original development while maintaining the core structure and motivation behind the distribution. The spherical-Dirichlet distribution is designed to model vectors constrained to the positive orthant of the hypersphere, thereby avoiding probability mass on regions where such vectors cannot lie. We provide a thorough analysis of the distribution's fundamental properties, including updated normalizing constants and moments, and further explore its relationships with other distributions. Estimators based on classical inference, such as the method of moments and maximum likelihood estimation, are derived. To illustrate the impact of these corrections, we apply the revised distribution to two examples, one with simulated data and another using a real text mining dataset, mirroring the approach in the original work. The results highlight the improvements and practical implications of the proposed modifications.
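To make the support concrete: if W follows a Dirichlet distribution, the elementwise square root X_i = sqrt(W_i) satisfies sum_i X_i^2 = 1 with nonnegative coordinates, so X lies on the positive orthant of the unit hypersphere. Likewise, a term-frequency vector from text mining scaled to unit Euclidean norm lands in the same region. The sketch below uses the square-root construction purely to illustrate that region; it does not reproduce the revised spherical-Dirichlet density or its normalizing constants, and the helper names are hypothetical.

```python
import math
import random

def dirichlet_sample(alpha, rng):
    """Dirichlet draw via normalized independent Gamma(alpha_i, 1) variables."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]

def positive_orthant_sample(alpha, seed=0):
    """Map a Dirichlet vector W to X = sqrt(W): sum of squares is 1 and all
    coordinates are nonnegative (illustrative construction only)."""
    rng = random.Random(seed)
    return [math.sqrt(w) for w in dirichlet_sample(alpha, rng)]

def l2_normalize_counts(counts):
    """Scale a term-frequency vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts]

# Usage: one simulated vector and one normalized document vector.
x = positive_orthant_sample([2.0, 0.5, 1.0, 3.0])
doc = l2_normalize_counts([3, 0, 1, 2])
```

A density defined directly on this region, as the abstract describes, places no mass on the rest of the sphere, unlike a distribution that must cover every orthant.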
Keywords
Dirichlet Distribution
Probability Distributions
Hypersphere
Positive Quadrant
Data Mining
Spherical Dirichlet