
ggduo: pairs plots for multiple regression, CCA, and time series

Dianne Cook edited this page Mar 8, 2016 · 15 revisions

Background

The functions ggpairs and ggscatmat in GGally provide generalized pairs plots for a data frame in R. All pairs of variables are displayed in a matrix layout, with plot defaults that depend on the types of the variables. The diagonal contains univariate displays. These functions extend the classic pairs function in base R, which only handles real-valued variables, so that they flexibly handle different variable types and use the ggplot2 graphics package.
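To make the matrix layout concrete, here is a small language-neutral sketch, written in Python rather than R and not taken from GGally. It builds a p-by-p grid of panel specifications, putting a univariate display on the diagonal and choosing each off-diagonal display from the types of its two variables. The type labels ("continuous", "discrete") and the plot names (`density`, `barplot`, `scatterplot`, `boxplot`, `mosaic`) are illustrative assumptions, not GGally's actual defaults.

```python
from typing import Dict, List, Tuple

# A panel spec: (row variable, column variable, plot type).
Panel = Tuple[str, str, str]


def univariate_plot(vtype: str) -> str:
    """Display for a diagonal cell, which shows one variable on its own."""
    return "density" if vtype == "continuous" else "barplot"


def bivariate_plot(row_type: str, col_type: str) -> str:
    """Display for an off-diagonal cell, chosen from the two variable types."""
    if row_type == "continuous" and col_type == "continuous":
        return "scatterplot"
    if row_type == "discrete" and col_type == "discrete":
        return "mosaic"
    return "boxplot"  # one continuous and one discrete variable


def pairs_layout(types: Dict[str, str]) -> List[List[Panel]]:
    """Return a p-by-p grid of panel specs: one row and one column per variable.

    `types` maps each variable name to "continuous" or "discrete"
    (hypothetical labels used only for this sketch).
    """
    names = list(types)
    grid: List[List[Panel]] = []
    for r in names:
        row: List[Panel] = []
        for c in names:
            if r == c:
                kind = univariate_plot(types[r])
            else:
                kind = bivariate_plot(types[r], types[c])
            row.append((r, c, kind))
        grid.append(row)
    return grid


if __name__ == "__main__":
    iris_types = {"Sepal.Length": "continuous", "Petal.Length": "continuous", "Species": "discrete"}
    for row in pairs_layout(iris_types):
        print([kind for _, _, kind in row])
```

Every variable appears both as a row and as a column, so the number of panels grows as p squared.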

This is appropriate for multivariate data, where we want to see each variable plotted against every other variable. But in many problems, such as regression or multiple time series, there are two groups of variables (e.g., response variables and explanatory variables), and we would like to see one group plotted against the other. New functions are needed to accomplish this.
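The two-group layout can be sketched the same way, again in Python as an illustration rather than as GGally code. Rows are the response (Y) variables, columns are the explanatory (X) variables, and every cell pairs one of each, so there is no diagonal. The function names `duo_layout` and `count_cells` are hypothetical.

```python
from typing import Dict, List, Tuple


def duo_layout(y_vars: List[str], x_vars: List[str]) -> List[List[Tuple[str, str]]]:
    """One row per response (Y) variable and one column per explanatory (X)
    variable; every cell pairs a Y variable with an X variable, so there is
    no diagonal of univariate displays."""
    return [[(y, x) for x in x_vars] for y in y_vars]


def count_cells(y_vars: List[str], x_vars: List[str]) -> Dict[str, int]:
    """Compare the size of a full pairs matrix with the grouped grid."""
    p = len(y_vars) + len(x_vars)
    return {"pairs_matrix": p * p, "duo_grid": len(y_vars) * len(x_vars)}


if __name__ == "__main__":
    y = ["Sepal.Length"]
    x = ["Sepal.Width", "Petal.Length", "Petal.Width"]
    print(duo_layout(y, x))
    print(count_cells(y, x))  # {'pairs_matrix': 16, 'duo_grid': 3}
```

With one response and three explanatory variables, the full matrix has 16 cells while the grouped grid needs only 3. Besides being smaller, the grouped grid spends every panel on the comparison of interest (Y against X) instead of on within-group comparisons.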

Related work

GGally description.

Details of your coding project

The outcomes of the project are:

  • An R package implementing the generalized version of pairs plots

Expected impact

Mentors

Once you have a solution to the medium and/or the hard problem, please get in touch with Dianne Cook.

Tests

Below are several tests that potential students can complete to demonstrate their capabilities for this particular project. Please modify the suggestions below to make them specific to your project.

  • Easy: Install the GGally package from GitHub (you might have to install the devtools package first). Run one of the examples, put the chart in a knitr/R Markdown document, and write a paragraph explaining the chart.
  • Medium: Merge two ggmatrix objects to produce a new ggmatrix object.
  • Hard: Present any ggmatrix object as a faceted ggplot object, rather than drawing it with an ad hoc print method. To show that this works, make a pairs plot of the four numeric variables in the iris data with strip labels along the top and side.
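For the hard test, one common approach (an assumption here, not a prescribed solution) is to reshape the data to long format so that a single faceted plot can draw the whole matrix: each observation is repeated once for every (y variable, x variable) combination, with columns naming the two variables and holding their values. The Python sketch below uses a list of dicts as a stand-in for an R data frame and shows only that reshaping step; the function `pairs_long` and the column names `yvar`, `xvar`, `x`, `y` are hypothetical, and the diagonal's univariate displays are not handled. In ggplot2 the reshaped data could then be drawn with `geom_point()` and `facet_grid(yvar ~ xvar, scales = "free")`, which by default places strip labels along the top and right-hand side.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def pairs_long(data: List[Dict[str, float]], variables: Sequence[str]) -> List[Row]:
    """Stack the data once per (y variable, x variable) combination.

    Each output row records which variables form its panel ("yvar", "xvar")
    and the observation's values for them ("y", "x"), so faceting rows by
    yvar and columns by xvar rebuilds the pairs matrix in a single plot.
    """
    long_rows: List[Row] = []
    for yvar in variables:
        for xvar in variables:
            for obs in data:
                long_rows.append({"yvar": yvar, "xvar": xvar, "x": obs[xvar], "y": obs[yvar]})
    return long_rows


if __name__ == "__main__":
    toy = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}]  # toy values, not real data
    rows = pairs_long(toy, ["a", "b"])
    print(len(rows))  # 2 variables x 2 variables x 2 observations = 8
    print(rows[2])    # {'yvar': 'a', 'xvar': 'b', 'x': 2.0, 'y': 1.0}
```

The reshaped data has p * p * n rows for p variables and n observations, which is the cost of drawing the whole matrix as one plot.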

Solutions of tests

Students, please post a link to your test results here.

References

  • Emerson, John W., Walton A. Green, Barret Schloerke, Di Cook, Heike Hofmann, and Hadley Wickham (2012). “The Generalized Pairs Plot.” Journal of Computational and Graphical Statistics, 22 (1), 79-91; doi: 10.1080/10618600.2012.694762.
  • Wickham, Hadley (2009). ggplot2: Elegant Graphics for Data Analysis. Use R! series, Springer.
  • Wilkinson, Leland (1999). The Grammar of Graphics. Statistics and Computing series, Springer.
  • multiple regression
  • multiple time series