Developments in Statistical Graphics

Sayan Chakrabarty Chair
University of Michigan
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
4106 
Contributed Papers 
Music City Center 
Room: CC-214 

Main Sponsor

Section on Statistical Graphics

Presentations

Clav: R package and Shiny application for cluster analysis validation

Cluster analysis is a statistical procedure for grouping observations using an observation-centered approach, in contrast to variable-centered approaches (e.g., PCA, factor analysis). Whether it serves as a preprocessing step for predictive modeling or as the primary analysis, validation is critical for determining generalizability across datasets. Theodoridis and Koutroumbas (2008) identified three broad types of validation for cluster analysis: 1) internal cluster validation, 2) relative cluster validation, and 3) external cluster validation. Strategies for types 1 and 2 are well established; however, cluster analysis is typically an unsupervised learning method with no observed outcome, which complicates external validation. Ullman et al. (2021) proposed an approach to validating a cluster solution by visually inspecting the cluster solutions across a training and a validation dataset. This talk introduces the clav R package, which implements and expands this approach by generating multiple random samples (using either simple random splits or bootstrap samples). Visualizations of both the cluster profiles and the distributions of the cluster means are provided, along with a Shiny application to assist the researcher.
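The repeated-split idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the clav package's API: the function names (`kmeans`, `split_validate`), the choice of k-means as the clustering method, and the crude label alignment by sorting centers are all hypothetical choices made for this sketch. For each random split, clusters are fit on the training half, validation observations are assigned to the nearest training center, and the two sets of cluster means are returned so they can be compared or plotted.

```python
import random


def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))


def mean(group):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(group)
    return tuple(sum(col) / n for col in zip(*group))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; stands in for whatever clustering method is used."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    # Assumption: sorting centers is a crude way to keep labels comparable across splits.
    return sorted(centers)


def split_validate(points, k, n_splits=20, train_frac=0.5, seed=1):
    """For each random split, return (training cluster means, validation cluster means)."""
    rng = random.Random(seed)
    results = []
    for s in range(n_splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, valid = shuffled[:cut], shuffled[cut:]
        centers = kmeans(train, k, seed=s)
        groups = [[] for _ in range(k)]
        for p in valid:
            groups[nearest(p, centers)].append(p)
        valid_means = [mean(g) if g else None for g in groups]
        results.append((centers, valid_means))
    return results


if __name__ == "__main__":
    # Simulated data with two well-separated groups (illustrative only).
    rng = random.Random(42)
    data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
            + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)])
    for train_means, valid_means in split_validate(data, k=2, n_splits=3):
        print(train_means, valid_means)
```

If the cluster solution generalizes, the training and validation means should sit close together across splits; large or erratic gaps suggest an unstable solution. A bootstrap variant would resample the training set with replacement instead of splitting.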

Keywords

cluster analysis

validation

R package

Shiny application 

First Author

Jason Bryer

Presenting Author

Jason Bryer

Cultural Influences on Color Choices in Statistical Graphics

This paper investigates color preferences in statistical graphics across different cultural and regional contexts. Participants from three countries (Saudi Arabia, the USA, and India) were surveyed to explore how cultural background influences their choice of warm or cool colors in data visualizations. Each participant was presented with a series of statistical graphics and asked to indicate their preferred color schemes. Preliminary findings reveal significant associations between color preferences and nationality for certain types of graphics, suggesting that cultural factors may play a crucial role in shaping visual perception and interpretation. These insights have important implications for designing culturally adaptive and inclusive data visualizations.

Keywords

Graphics Color

Color Preferences

Statistical Graphics

Cultural Influences

Data Visualization 

Co-Author

Abdulrahman Alruways, University of Nebraska at Omaha

First Author

Mahbubul Majumder, University of Nebraska at Omaha

Presenting Author

Mahbubul Majumder, University of Nebraska at Omaha

Dimension Reduction of Legislative Roll Call Votes

Roll call votes taken in democratic legislative bodies offer a data-rich resource through which to examine political behavior using a quantitative lens. This work explores the performance of different dimensionality reduction techniques when used to assess the relationships among legislators. In particular, expressions of ideological positioning, partisan polarization, and intraparty clusters are compared, as are the stability and robustness of different approaches. Features affecting the embedding, including missed votes and partisan control of the chamber, are also discussed. Data sets are drawn from the Americas and Europe. 
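As a concrete illustration of the kind of embedding discussed above, the following Python sketch computes first-principal-component scores for a small roll call matrix using power iteration. The abstract does not name the specific techniques compared, so PCA appears here only as a familiar baseline. The vote coding (+1 yea, -1 nay, 0 missed), the function name `first_pc_scores`, and the toy matrix are assumptions made for this example.

```python
import math


def first_pc_scores(votes, iters=200):
    """Scores of each legislator on the first principal component.

    votes: list of rows, one per legislator; each entry is +1 (yea),
    -1 (nay), or 0 (missed vote). This coding is an assumption of the sketch.
    """
    n, m = len(votes), len(votes[0])
    # Center each roll call (column) at its mean.
    col_means = [sum(row[j] for row in votes) / n for j in range(m)]
    X = [[row[j] - col_means[j] for j in range(m)] for row in votes]
    # Power iteration on X^T X, starting from a non-constant vector.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iters):
        Xv = [sum(x * vj for x, vj in zip(row, v)) for row in X]
        w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Project each legislator onto the leading direction (sign is arbitrary).
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]


if __name__ == "__main__":
    # Toy chamber: three legislators per bloc, one defection, two missed votes.
    toy_votes = [
        [1, 1, -1, -1],
        [1, 1, -1, 1],
        [1, 0, -1, -1],
        [-1, -1, 1, 1],
        [-1, -1, 1, 1],
        [-1, -1, 0, 1],
    ]
    for score in first_pc_scores(toy_votes):
        print(round(score, 3))
```

Coding a missed vote as 0 treats it as a neutral position, which is only one of several possible choices and can itself pull frequently absent legislators toward the center of the embedding.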

Keywords

dimension reduction

voting behavior

data visualization 

First Author

Jonathan Fischer

Presenting Author

Jonathan Fischer

Estimation of interference effects in networks with community structures

In causal inference, the interference effect – whether an individual's outcome is affected by the treatment of its neighbors – is gaining increasing attention. The majority of existing work assumes that the observed networks represent the true underlying interference networks. In practice, this assumption is not correct and leads to bias in the estimation of causal effects. In this work, we address the problem of whether true interference effects exist given the observed networks. In particular, our proposed framework leverages the community structures in the networks and assumes that the interference effects are identically distributed for individuals in the same community. We demonstrate, in theory and in simulations, that our proposed model is able to identify the interference effects. We apply our proposed framework to stroke encounter data and evaluate the potential effect of performing EVT procedures in one hospital on its neighbors.

Keywords

Interference Effect

Community Structure

Causal Inference 

Co-Author(s)

Ruoyu Wang, Harvard University
Shuo Sun, Harvard T.H. Chan School of Public Health
Jukka-Pekka Onnela

First Author

Yuhua Zhang, Harvard University

Presenting Author

Yuhua Zhang, Harvard University

WITHDRAWN Exudate Chemodiversity in Root Economics Spectrum: Multi-Matrix PCA with Uncertainty Quantification

Traditional root economics spectrum analysis reduces morphological traits (e.g., diameter, SRL) via PCA but excludes critical chemical dimensions of root exudates due to analytical challenges. We address two gaps: (1) integrating high-dimensional metabolomic data (>1000 compounds per sample) into PCA when compounds must first be characterized by chemical features (e.g., aromaticity, redox potential); (2) handling isomer uncertainty, where identical compound names mask divergent properties. We propose a nested matrix factorization approach: converting raw metabolomic matrices (compounds × mass) into feature-based matrices using quantum-chemical descriptors (e.g., logP, H-bond donors) via PaDEL-Descriptor; using STATIS co-inertia analysis to jointly project morphological traits (root × morphology) and chemical feature blocks (root × compound × feature) into a unified PCA space; and computing probabilistic chemical descriptors as weighted averages across isomers, with weights estimated via Bayesian multinomial regression using PubChem priors. Our probabilistic multi-matrix PCA advances functional trait ecology by addressing high-dimensionality and ambiguity, improving RES analysis. 

Keywords

Root economics spectrum

Chemical descriptor

Multi-block PCA integration

Isomer uncertainty propagation 

Co-Author

Lijuan Sun, Lanzhou University

First Author

Xinyao Yang, Xi'an Jiaotong Liverpool University

The robustness of the Ud-plot on assessing normality

The Ad-plot, developed from the cumulative average deviation function, efficiently detects numerous distributional characteristics, including symmetry, skewness, and outliers, analogous to sample variance plots that outperform histograms. Meanwhile, the Ud-plot, derived from a slight modification of the Ad-plot, is outstanding in assessing normality, surpassing the normal QQ-plot, the normal PP-plot, and their variants. In this work, the robustness of the Ud-plot is explored by employing the trimmed average in the cumulative average deviation function. From the standpoint of assessing normality, the robust version is as exceptional as the Ud-plot. On real and simulated data, the performance of the robust version is compared with that of the Ud-plot. Notably, this version is highly competitive in assessing normality and capturing vital distributional properties. Thus, the new statistical plot is a noteworthy addition to data visualization tools, delivering insightful illustrations while enhancing perception. The adplots R package, which constructs the Ad-plot and the Ud-plot, will also be introduced.

Keywords

Ad-plot

Ada-plot

Ud-plot

Uda-plot

Assessing Normality

Data Visualization 

First Author

Uditha Wijesuriya, University of Southern Indiana

Presenting Author

Uditha Wijesuriya, University of Southern Indiana

Visualizing U.S. Cyberinfrastructure Services: An Interactive Tool for Better Decision-Making

To promote scientific discovery through high-performance computing, the U.S. National Science Foundation has funded several cyberinfrastructure programs, including TeraGrid, XSEDE, and ACCESS. Since 2003, these programs have generated extensive data on awarded projects, resource allocations, and resource usage. This work analyzed the resource usage data and developed an interactive Shiny dashboard to help service providers improve their services and to assist new users in identifying potential collaborators.

Keywords

Interactive visualization

Shiny dashboard

decision-making

cyberinfrastructure 

Co-Author(s)

Yu-Che Chen, University of Nebraska at Omaha
Richard Knepper, Cornell University

First Author

Xiaoyue Cheng, University of Nebraska at Omaha

Presenting Author

Xiaoyue Cheng, University of Nebraska at Omaha