Mosaicplots in the ggplot2 framework: ggmosaic

Toby Dylan Hocking edited this page Mar 1, 2016 · 2 revisions

## Background

Categorical variables are omnipresent in today's data. Although visualizations for categorical variables have seen a lot of development in recent years, graphical methods for categorical data and for mixtures of qualitative and quantitative data are still not as well developed as those available for numeric variables. One way to visualise multidimensional categorical data is the mosaic plot proposed by Hartigan and Kleiner (1981). Enhanced with interactive features such as querying, re-ordering of variables and variable categories, and grouping of quantitative variables, these plots become a very powerful and easy-to-use tool for analysing and understanding multivariate categorical data.

Mosaic plots and, in particular, the one-dimensional spine plots are missing from `ggplot2`. While the `productplots` package is an implementation using `ggplot2` graphics, it does not support the full functionality of `ggplot2`, such as facetting (by a variable not included in the `prodplot`) or additional layers (for example, to show the 'density' of points within each category). Within `ggplot2`, using the `position = 'fill'` option in bar charts comes closest to showing a conditional feature allocation. This is no longer supported in `qplot` histograms. With `ggplot2` version 2.0.0 (or, shortly, 2.1.0) the way that geoms are supported has been completely overhauled, which makes extensions much easier to write. We propose adding a mosaic geom to `ggplot2` that can make use of the full functionality of `ggplot2`.
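
As a point of reference, the `position = "fill"` bar chart mentioned above can be sketched as follows. This is only an illustration, assuming the built-in `Titanic` table as example data (the data set is our choice, not part of the proposal). Every bar has the same width, so the chart shows the conditional distribution of survival within each class but hides how large each class is; a mosaic plot would encode that in the tile widths.

```r
library(ggplot2)

# Built-in 4-way contingency table, flattened to one row per cell with a Freq column.
titanic <- as.data.frame(Titanic)

# Stacked bars rescaled to height 1: P(Survived | Class).
# `weight = Freq` makes geom_bar() sum the cell counts instead of counting rows.
p_fill <- ggplot(titanic, aes(x = Class, fill = Survived, weight = Freq)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion within class")

p_fill
```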

## Related work

Mosaic plots have been implemented in a variety of packages: `mosaicplot()` is part of the base `graphics` package, and `mosaic()` is part of the `vcd` package. Also part of the `vcd` package is `strucplot()`, providing an extension to `mosaic()`. `qmosaic()` is an interactive implementation of mosaic plots in the `cranvas` package, and the `productplots` package is an implementation based on the `ggplot2` framework. Why do we need another implementation? We do not want to disparage any of the existing solutions, but each of them has some unresolved issues, for example:

  • Default spacing and labels are not quite right in the `mosaicplot()` implementation. From a data visualization point of view, it makes sense to make the best use of the space available, and the default spacing choices do not do that.
  • In the `vcd` implementation, the way the formula gets resolved is sometimes unintuitive. For example, `mosaic(Improved ~ Treatment | Sex, data = Arthritis, zero_size = 0, highlighting_direction = "right")` gives the same result as `mosaic(Improved ~ Treatment + Sex, data = Arthritis, zero_size = 0, highlighting_direction = "right")`. Statistically, the two formulas do not describe the same thing, and the chart should reflect that.
  • The `qmosaic` implementation in `cranvas` allows very powerful interactions with the chart, but the dependency on Qt makes `cranvas` very hard to install (besides a specific version of Qt with tricky paths, it also needs both the `qtbase` and the `qtpaint` packages to work).
  • The implementation in the `productplots` package comes closest to the envisioned result of this project. However, `prodplot` is functionality built on top of the `ggplot2` package and is not integrated with it as a `geom`, which makes it impossible to use additional `ggplot2` tools such as facetting and layering except in very special cases.
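
To make the statistical distinction in the `vcd` item concrete, here is a small base R sketch of the difference between a joint and a conditional distribution. It uses the built-in `UCBAdmissions` table instead of `Arthritis` purely to avoid a package dependency; that substitution is ours. The joint proportions sum to 1 over the whole table, whereas the conditional proportions sum to 1 within each level of the conditioning variable, so a chart of one should not look identical to a chart of the other.

```r
# Collapse the 3-way table to Admit x Gender.
tab <- margin.table(UCBAdmissions, c(1, 2))

# Joint distribution P(Admit, Gender): all cells together sum to 1.
joint <- prop.table(tab)

# Conditional distribution P(Admit | Gender): each Gender column sums to 1.
cond <- prop.table(tab, margin = 2)

print(round(joint, 3))
print(round(cond, 3))
```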

## Details of your coding project

With version 2.0.0 of the `ggplot2` package the handling of geoms was completely revised, which makes user-defined geoms much more straightforward to write and more compliant with the remainder of the `ggplot2` framework. Extensions are based on `ggproto`, the object-oriented system that `ggplot2` introduced in version 2.0.0 in place of its earlier use of the more general `proto` package. We envision that the student will use this approach, together with the main functionality of the `productplots` package (such as the calculation routines, formatting, divider handling, …), to create an interface that integrates mosaic plots as a geom. Because mosaic plots benefit greatly from additional user-driven interactivity, a second part of this project is the creation of a shiny app that allows users to specify and change aspects of a mosaic plot interactively.
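
For orientation, the general shape of a `ggproto`-based geom looks roughly like the sketch below, which follows the pattern in the "Extending ggplot2" vignette. It draws plain points, not mosaic tiles, and the names `GeomSketchPoint` and `geom_sketch_point()` are hypothetical placeholders. A mosaic geom would follow the same structure but would need a panel-drawing function that turns the nested conditional proportions into rectangles.

```r
library(ggplot2)

# Hypothetical minimal geom: a ggproto object that inherits from Geom.
GeomSketchPoint <- ggproto("GeomSketchPoint", Geom,
  required_aes = c("x", "y"),
  default_aes = aes(shape = 19, colour = "black"),
  draw_key = draw_key_point,
  # Called once per panel: map data to the panel's coordinate system
  # and return a grid grob.
  draw_panel = function(data, panel_params, coord) {
    coords <- coord$transform(data, panel_params)
    grid::pointsGrob(
      coords$x, coords$y,
      pch = coords$shape,
      gp = grid::gpar(col = coords$colour)
    )
  }
)

# User-facing constructor: wraps the ggproto object in a layer.
geom_sketch_point <- function(mapping = NULL, data = NULL, stat = "identity",
                              position = "identity", na.rm = FALSE,
                              show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    geom = GeomSketchPoint, mapping = mapping, data = data, stat = stat,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Because it is a regular layer, facetting works without extra code.
p_geom <- ggplot(mtcars, aes(mpg, wt)) +
  geom_sketch_point() +
  facet_wrap(~cyl)
```

The facetting in the last lines is the kind of integration that a function built on top of `ggplot2`, rather than as a geom, does not get automatically.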

The outcomes of the project are:

  • An R package for a generalized version of mosaic plots, implemented as a `geom` for the `ggplot2` package and based on `ggproto`. The package has to be fully functional and must be documented.
  • A set of examples documenting the use and flexibility of `geom_mosaic`.
  • A shiny app that demonstrates the mosaic plot functionality interactively: users specify parameters and see the impact immediately, which helps them familiarize themselves with the more abstract concepts of mosaic plots.
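
The interactive part could start from something like the following sketch. It is only an assumption about how such an app might be laid out: it uses the base `mosaicplot()` function and the built-in `Titanic` table as stand-ins, since `geom_mosaic` does not exist yet, and the input names (`vars`, `shade`) are made up for illustration.

```r
library(shiny)

# Names of the classifying variables in the built-in Titanic table.
vars <- names(dimnames(Titanic))

ui <- fluidPage(
  titlePanel("Mosaic plot explorer (sketch)"),
  sidebarLayout(
    sidebarPanel(
      checkboxGroupInput("vars", "Variables to include",
                         choices = vars, selected = vars[1:2]),
      checkboxInput("shade", "Colour tiles", value = TRUE)
    ),
    mainPanel(plotOutput("mosaic"))
  )
)

server <- function(input, output, session) {
  output$mosaic <- renderPlot({
    req(input$vars)  # wait until at least one variable is selected
    # Collapse the table to the selected variables and redraw.
    tab <- margin.table(Titanic, match(input$vars, vars))
    mosaicplot(tab, main = paste(input$vars, collapse = " x "),
               color = input$shade)
  })
}

app <- shinyApp(ui, server)
# In an interactive session, start the app with: runApp(app)
```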

## Expected impact

We do not want to dissuade anybody from using their package of choice when drawing mosaic plots. Rather, we want to make mosaic plots available to the wider community of `ggplot2` users.

## Mentors

Once you have a solution to the medium and/or the hard test, please get in touch with Heike Hofmann <hofmann@iastate.edu> and/or Dianne Cook.

## Tests

Potential students can complete the following tests to demonstrate their capabilities for this particular project.

  • Easy: Install the `productplots` package from GitHub (you might have to install the `devtools` package first). Run one of the examples, put the chart in a knitr/R Markdown document, and write a paragraph to explain the chart.
  • Medium: Write a shiny app that shows a mosaic plot (using `prodplot`) of a few variables and lets the user interactively change at least one aspect of the mosaic.
  • Hard: Based on Hadley Wickham's introduction to extending `ggplot2`, write a function that implements a geom of your choice. Document the function using Roxygen, and include it in an R package.

## Solutions of tests

Students, please post a link to your test results here.
