Alluvial plots: A paradigm of intermediacy
Tuesday, Aug 5: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session
Music City Center
General-purpose statistical graphics software can be organized according to multiple paradigms: Whereas the standard R distribution provides single-use functions to render recognizable graphical types like histograms and case–variable biplots, ggplot2 enacted a grammatical approach that decoupled such design choices as coordinate systems, statistical transformations, and graphical representations. The rigidity of the grammatical approach requires radical adaptation for novel data structures, for example network data as opposed to monadic (tabular) data, while its distributivity enables groupings and panelings of unentangled data subsets.
Yet the popularity of this grammar is nearly matched by that of its non-grammatical reverse-dependencies, the polarizing menagerie of many-parameter wrappers, suggesting an unmet need in the taxonomy of statistical graphics. My goal with this presentation is not to meet this need but to pitch a useful foil.
I submit alluvial plots as a simple but essential challenge to the above dichotomies: type versus grammar, monadic versus network, and distributive versus entangled. The ggplot2 extension ggalluvial emerged from years of trial and error, critical user feedback, and haphazard familiarization with important related work. Its current stable form is situated on a new branch in an original taxonomy of width-encoded flow diagrams. It uses the layered grammar of graphics both to constrain the space of graphical types and to expand the scope of the plots it produces, in terms of data structures and graphical representations.
Alluvial plots interrupt neat divides between paradigms of data visualization: They are highly idiomatic and can be grammaticalized only through the use of myriad positional choices (whether or not these are exposed to the user). They represent either longitudinal or dyadic data ("id–key–value pairs"), intermediaries between classical tabular data and pairwise network data. And they resist composition with groupings on data that cross both IDs and keys, though such information may be essential to visualize. Nevertheless, once they are shoehorned into a rigid grammar, they introduce a new subspace of types that has yet to be thoroughly explored. Most notably, by vertically stacking rather than gapping value groups, alluvial plots reclaim the ruled ordinate (the y-axis) and consequently offer novel encodings of cumulative weight, signed categories, and loss (and gain) to follow-up.