Tuesday, Aug 5: 10:30 AM - 12:20 PM
4106
Contributed Papers
Music City Center
Room: CC-214
Main Sponsor
Section on Statistical Graphics
Presentations
Cluster analysis is a statistical procedure for grouping observations; it takes an observation-centered approach, in contrast to variable-centered approaches (e.g., PCA, factor analysis). Whether cluster analysis serves as a preprocessing step for predictive modeling or as the primary analysis, validation is critical for determining whether a solution generalizes across datasets. Theodoridis and Koutroumbas (2008) identified three broad types of validation for cluster analysis: 1) internal cluster validation, 2) relative cluster validation, and 3) external cluster validation. Strategies for types 1 and 2 are well established. External validation is harder, because cluster analysis is typically an unsupervised learning method with no observed outcome to validate against. Ullman et al. (2021) proposed validating a cluster solution by visually inspecting the cluster solutions across a training and a validation dataset. This talk introduces the clav R package, which implements and expands this approach by generating multiple random samples (using either simple random splits or bootstrap samples). Visualizations of both the cluster profiles and the distributions of the cluster means are provided, along with a Shiny application to assist the researcher.
Keywords
cluster analysis
validation
R package
Shiny application
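As an illustration of the split-based validation idea in the cluster-validation abstract above, here is a minimal Python sketch. It is not the clav package and does not reproduce its API. The function names (`kmeans`, `split_profile_gaps`), the use of k-means as the clustering method, the nearest-center assignment of validation rows, and the synthetic two-group data are all assumptions made for illustration. For each random split, it fits clusters on the training half, assigns validation rows to the nearest training center, and records the largest gap between training and validation cluster means.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) for each row."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def cluster_means(X, labels, k):
    """Per-cluster mean profile; an empty cluster yields a row of NaN."""
    return np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j)
        else np.full(X.shape[1], np.nan)
        for j in range(k)
    ])

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, used here only as a stand-in clustering method."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        means = cluster_means(X, nearest_center(X, centers), k)
        # Keep the old center if a cluster lost all of its points.
        new_centers = np.where(np.isnan(means), centers, means)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def split_profile_gaps(X, k, n_splits=20, train_frac=0.5, seed=0):
    """Largest |training mean - validation mean| per random split (hypothetical summary)."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * len(X))
    gaps = []
    for s in range(n_splits):
        idx = rng.permutation(len(X))
        train, valid = X[idx[:n_train]], X[idx[n_train:]]
        centers = kmeans(train, k, seed=s)
        # Validation rows are assigned to the nearest training center (an assumption).
        valid_means = cluster_means(valid, nearest_center(valid, centers), k)
        gaps.append(np.nanmax(np.abs(centers - valid_means)))
    return np.array(gaps)

# Synthetic data for illustration only: two well-separated groups in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))])
gaps = split_profile_gaps(X, k=2)
print(gaps.round(3))
```

Small gaps across splits suggest the cluster profiles replicate in held-out data. A visual version would plot the training and validation profiles side by side for each split instead of reducing them to a single number.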
This paper investigates color preferences in statistical graphics across different cultural and regional contexts. Participants from three countries (Saudi Arabia, the USA, and India) were surveyed to explore how cultural background influences their choice of warm or cool colors in data visualizations. Each participant was presented with a series of statistical graphics and asked to indicate their preferred color schemes. Preliminary findings reveal significant associations between color preference and nationality for certain types of graphics, suggesting that cultural factors may play a crucial role in shaping visual perception and interpretation. These insights have important implications for designing culturally adaptive and inclusive data visualizations.
Keywords
Graphics Color
Color Preferences
Statistical Graphics
Cultural Influences
Data Visualization
Roll call votes taken in democratic legislative bodies offer a data-rich resource for examining political behavior quantitatively. This work explores the performance of different dimensionality reduction techniques when used to assess the relationships among legislators. In particular, expressions of ideological positioning, partisan polarization, and intraparty clusters are compared, as are the stability and robustness of the different approaches. Features affecting the embedding, including missed votes and partisan control of the chamber, are also discussed. Data sets are drawn from the Americas and Europe.
Keywords
dimension reduction
voting behavior
data visualization
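The roll call abstract above does not name the dimensionality reduction techniques it compares. As one common example, the sketch below applies PCA (via the singular value decomposition) to a legislator-by-vote matrix. The function name `pca_scores`, the coding of votes (+1 yea, -1 nay, NaN missed), the treatment of missed votes (set to the column mean), and the simulated chamber are all assumptions made for illustration, not the author's method or data.

```python
import numpy as np

def pca_scores(votes, n_components=2):
    """PCA of a legislators x votes matrix; NaN marks a missed vote."""
    votes = np.asarray(votes, dtype=float)
    col_mean = np.nanmean(votes, axis=0)
    centered = votes - col_mean
    # A missed vote is replaced by the column mean, i.e. zero after centering.
    centered[np.isnan(centered)] = 0.0
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return scores, explained

# Simulated chamber for illustration only: two parties of 15 legislators, 40 votes.
rng = np.random.default_rng(0)
ideal = np.concatenate([rng.normal(-1.0, 0.3, 15), rng.normal(1.0, 0.3, 15)])
cut = rng.uniform(-0.5, 0.5, 40)
noise = rng.normal(0.0, 0.3, (30, 40))
votes = np.where(ideal[:, None] + noise > cut[None, :], 1.0, -1.0)
votes[rng.random(votes.shape) < 0.1] = np.nan  # about 10% missed votes

scores, explained = pca_scores(votes)
print(explained.round(3))
```

Mean imputation is only one way to handle missed votes, and it pulls frequently absent legislators toward the center of the embedding. That is the kind of sensitivity the abstract lists among the features affecting the embedding.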
In causal inference, the interference effect (whether an individual's outcome is affected by the treatment of its neighbors) is gaining increasing attention. The majority of existing work assumes that the observed networks represent the true underlying interference networks. In practice, this assumption does not hold, which leads to bias in the estimation of causal effects. In this work, we address the problem of whether true interference effects exist given the observed networks. In particular, our proposed framework leverages the community structures in the networks and assumes the interference effects are identically distributed for individuals in the same community. We demonstrate, both in theory and in simulations, that the proposed model is able to identify the interference effects. We apply the framework to stroke encounter data and evaluate the potential effect of performing EVT procedures in one hospital on its neighbors.
Keywords
Interference Effect
Community Structure
Causal Inference
Traditional root economics spectrum (RES) analysis reduces morphological traits (e.g., diameter, specific root length [SRL]) via PCA but excludes critical chemical dimensions of root exudates due to analytical challenges. We address two gaps: (1) integrating high-dimensional metabolomic data (>1000 compounds per sample) into PCA when compounds must first be characterized by chemical features (e.g., aromaticity, redox potential); and (2) handling isomer uncertainty, where identical compound names mask divergent properties. We propose a nested matrix factorization approach with three steps: (a) converting raw metabolomic matrices (compounds × mass) into feature-based matrices using quantum-chemical descriptors (e.g., logP, H-bond donors) computed with PaDEL-Descriptor; (b) using STATIS co-inertia analysis to jointly project morphological traits (root × morphology) and chemical feature blocks (root × compound × feature) into a unified PCA space; and (c) computing probabilistic chemical descriptors as weighted averages across isomers, with weights estimated via Bayesian multinomial regression using PubChem priors. This probabilistic multi-matrix PCA advances functional trait ecology by addressing high dimensionality and ambiguity, improving RES analysis.
Keywords
Root economics spectrum
Chemical descriptor
Multi-block PCA integration
Isomer uncertainty propagation
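The weighted-average step in the root exudate abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `expected_descriptors` is hypothetical, the isomer weights are taken as given rather than estimated by Bayesian multinomial regression, and the descriptor values are made-up numbers, not real chemistry.

```python
import numpy as np

def expected_descriptors(descriptors, weights):
    """Weighted average of descriptor vectors across candidate isomers.

    descriptors: (n_isomers, n_features) array, one row per candidate isomer.
    weights: (n_isomers,) non-negative weights; normalized here to sum to 1.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if descriptors.shape[0] != weights.shape[0]:
        raise ValueError("need exactly one weight per isomer")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    weights = weights / weights.sum()
    return weights @ descriptors

# Made-up values: two candidate isomers, two descriptor columns.
isomer_descriptors = [[1.0, 2.0],
                      [3.0, 0.0]]
isomer_weights = [0.75, 0.25]  # assumed isomer probabilities, not estimated here
result = expected_descriptors(isomer_descriptors, isomer_weights)
print(result)  # [1.5 1.5]
```

Each compound name then contributes a single descriptor row to the feature matrix, with the isomer ambiguity folded into the weights.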
The Ad-plot, developed from the cumulative average deviation function, efficiently detects numerous distributional characteristics, including symmetry, skewness, and outliers, analogous to sample variance plots that outperform histograms. The Ud-plot, derived from a slight modification of the Ad-plot, excels at assessing normality, surpassing the normal QQ-plot, the normal PP-plot, and their variants. In this work, the robustness of the Ud-plot is explored by employing the trimmed average in the cumulative average deviation function. The performance of this robust version is compared with that of the Ud-plot on real and simulated data. For assessing normality, the robust version performs as well as the Ud-plot, and it is highly competitive in capturing vital distributional properties. The new plot is thus a useful addition to data visualization tools, delivering insightful illustrations. The adplots R package for constructing the Ad-plot and Ud-plot will also be introduced.
Keywords
Ad-plot
Ada-plot
Ud-plot
Uda-plot
Assessing Normality
Data Visualization
To promote scientific discovery through high-performance computing, the U.S. National Science Foundation has funded several cyberinfrastructure programs, including TeraGrid, XSEDE, and ACCESS. Since 2003, these programs have generated extensive data on awarded projects, resource allocations, and resource usage. This work analyzes the resource usage data and presents an interactive Shiny dashboard developed to help service providers improve their services and to assist new users in identifying potential collaborators.
Keywords
Interactive visualization
Shiny dashboard
decision-making
cyberinfrastructure