Who are the ggplot2 extenders, and how can you become one?

Joyce Robbins Chair
Columbia University
 
Joyce Robbins Organizer
Columbia University
 
Evangelina 'Gina' Reynolds Organizer
Consultant
 
Tuesday, Aug 5: 8:30 AM - 10:20 AM
1632 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-106C 

Applied

Yes

Main Sponsor

Section on Statistical Graphics

Co Sponsors

Section on Statistical Computing
Section on Statistics and Data Science Education

Presentations

Who are the ggplot2 extenders, and how can you become one? An Overview

Since its release, the R package ggplot2 (Wickham 2007) has become a widely used tool for data visualization. Based on the "grammar of graphics" (Wilkinson 1999), ggplot2 provides a flexible framework for creating highly customizable graphics. In addition, it was designed to be extendable; that is, to allow users to create their own grammar components (e.g., themes, stats, geoms, and scales). Consequently, in the nearly two decades since ggplot2's release, the number of extension packages has grown into the hundreds. Some add new functionality via the prescribed extension mechanisms, while others introduce variations in dialect into the grammar, tailored to domain-specific needs. This paper attempts to characterize this ecosystem and highlight pathways for individuals to enter the extension space.

Co-Author(s)

Joyce Robbins, Columbia University
Vivian Zheng, Columbia University

Speaker

Evangelina 'Gina' Reynolds, Consultant

Innovations in aesthetic evaluation semantics: Where ggplot2 users and developers meet

In recent years, the aesthetic evaluation semantics in ggplot2 has evolved into a rich and powerful interface serving the needs of both its users and developers. Taking advantage of R's metaprogramming capabilities, functions like after_stat() and stage() create a shared vocabulary between users and developers that allows them to reason about layer-internal processes at an intermediate level of abstraction. This design pattern exemplifies a paradigm shift in the implementation of the Grammar of Graphics, demonstrating that complexity can be tamed rather than hidden. I demonstrate how ggplot2's innovations in aesthetic evaluation serve a dual purpose: empowering users to grow into developers while enabling developers to create interfaces that evolve with their users.

Speaker

June Choe, University of Pennsylvania

ggdist: Visualizations of distributions and uncertainty in the grammar of graphics

Speaker

Matthew Kay, University of Michigan

Alluvial plots: A paradigm of intermediacy

General-purpose statistical graphics software can be organized according to multiple paradigms: Whereas the standard R distribution provides single-use functions to render recognizable graphical types like histograms and case–variable biplots, ggplot2 enacted a grammatical approach that decoupled such design choices as coordinate systems, statistical transformations, and graphical representations. The rigidity of the grammatical approach requires radical adaptation for novel data structures, for example monadic (tabular) versus network data, and its distributivity enables groupings and panelings of unentangled data subsets.

Yet the popularity of this grammar is nearly matched by that of its non-grammatical reverse-dependencies, the polarizing menagerie of many-parameter wrappers, suggesting an unmet need in the taxonomy of statistical graphics. My goal with this presentation is not to meet this need but to pitch a useful foil.

I submit alluvial plots as a simple but essential challenge to the above dichotomies: type versus grammar, monadic versus network, and distributive versus entangled. The ggplot2 extension ggalluvial emerged from years of trial and error, critical user feedback, and haphazard familiarization with important related work. Its current stable form is situated on a new branch in an original taxonomy of width-encoded flow diagrams that uses the layered grammar of graphics to both constrain the space of graphical types and expand the scope, in terms of data structures and graphical representations, of those produced.

Alluvial plots interrupt neat divides between paradigms of data visualization: They are highly idiomatic and can be grammaticalized only through the use of myriad positional choices (whether or not these are exposed to the user). They represent either longitudinal or dyadic data ("id–key–value pairs"), intermediaries between classical tabular data and pairwise network data. And they resist composition with groupings on data that cross both IDs and keys, though such information may be essential to visualize. Nevertheless, once they are shoehorned into a rigid grammar, they introduce a new subspace of types that has yet to be thoroughly explored. Most notably, by vertically stacking rather than gapping value groups, alluvial plots reclaim the ruled ordinate (the y-axis) and consequently offer novel encodings of cumulative weight, signed categories, and loss (and gain) to follow-up. 

Speaker

Jason Brunson, University of Florida

ggtime: Visualizing time with a grammar of temporal graphics

Effective use of statistical graphics in exploratory time series analysis helps to uncover temporal patterns needed to accurately specify models. While several commonly used plots exist for visualizing time series, little work has been done to formalize them into a unified grammar of temporal graphics. Decomposing traditional time series graphics such as time plots and seasonal plots into modular grammatical elements provides the flexibility needed to clearly visualize multiple seasonality, cycles, and other complex patterns.

Temporal data visualization requires special handling to highlight patterns shaped by calendar systems, much like the nuances of spatial, graph, and uncertainty visualization. The proposed grammar incorporates calendrical concepts to visually align time points at different granularities and time zones, warp time to standardize irregular cyclical durations, and wrap time into hierarchical calendar layouts. In this talk, I will introduce the grammar of temporal graphics as implemented in the ggtime R package and demonstrate how these grammatical elements can be combined to create both familiar and novel visualizations of complex time series patterns.

Co-Author

Cynthia Huang

Speaker

Mitchell O'Hara-Wild, Monash University