Evaluating Dimension Reduction Techniques for Linear and Nonlinear Data Structures

Maryam Skafyan Co-Author
East Tennessee State University
 
Mostafa Zahed First Author
East Tennessee State University
 
Mostafa Zahed Presenting Author
East Tennessee State University
 
Tuesday, Aug 5: 8:50 AM - 9:05 AM
1496 
Contributed Papers 
Music City Center 
Dimension reduction techniques play a significant role in analyzing high-dimensional data, especially in fields like radiomics, where extracting meaningful patterns from complex datasets is essential. This study evaluates how well Principal Component Analysis (PCA), Isomap, and t-Distributed Stochastic Neighbor Embedding (t-SNE) preserve data structure, using the average silhouette score as the performance measure. Through extensive simulations, we compare these methods across datasets with varying sample sizes (n = 100, 200, 300, 400, 500), noise levels (σ² = 0.25, 0.5, 0.75, 1, 1.5, 2), and feature counts (p = 20, 50, 100, 200, 300, 400). Our findings indicate that for datasets with an underlying linear structure, PCA is the most accurate at maintaining cluster integrity, as measured by the average silhouette score. Conversely, for nonlinear data structures, Isomap and t-SNE outperform PCA in preserving meaningful relationships.
One important application of these findings is in radiomics, where high-dimensional imaging data is used to extract quantitative biomarkers for cancer diagnosis and prognosis.
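
For readers unfamiliar with the evaluation criterion, the following is a minimal sketch of the average silhouette score, not the authors' code. It assumes Euclidean distance and known cluster labels for the embedded points; the function name average_silhouette and the toy coordinates are hypothetical and chosen only for illustration. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the point's silhouette is (b - a) / max(a, b).

```python
import math
from collections import defaultdict


def average_silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:  # all-identical points contribute 0
            total += (b - a) / denom
    return total / len(labels)


# Hypothetical 2-D embedding: two tight, well-separated groups.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = average_silhouette(embedding, [0, 0, 1, 1])  # labels match the groups
bad_score = average_silhouette(embedding, [0, 1, 0, 1])   # labels cut across the groups
```

With labels that match the two groups the score is close to 1, while labels that cut across the groups give a negative score. In a comparison like the one described in the abstract, the same score would be computed on the low-dimensional output of each method, so a higher value indicates that the embedding keeps the clusters compact and well separated.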

Keywords

Dimension Reduction Techniques

Linear and Nonlinear Data Structures

Radiomics

Principal Component Analysis (PCA)

Isomap

t-Distributed Stochastic Neighbor Embedding (t-SNE) 

Main Sponsor

Section on Statistical Computing