Rperform: Performance analysis and visualization for R

Akash Tandon edited this page Apr 4, 2017 · 23 revisions

Background

Various tools exist to help developers, across different languages, test the performance of their code. This analysis can be expressed in quantifiable metrics such as runtime and memory use. Rperform aims to be a standard tool for performance testing of R packages and of R code in general. Rperform started as a GSoC 2015 project and was accepted again for GSoC 2016. From its README:

Rperform is a package that allows R developers to track quantitative performance metrics of their code. It focuses on providing changes in a package’s performance metrics, related to runtime and memory, over different git versions and across git branches. Rperform can be integrated with Travis-CI to do performance testing during Travis builds by making changes to the repo's .travis.yml file. It can prove to be particularly useful while measuring the possible changes which can be introduced by a pull request (PR).

Related work

Note: Kirill Müller has done some handy work for benchmarking dplyr from which inspiration can be taken.

Tools employing visualization and version control:

  • airspeed velocity (asv): A tool for benchmarking Python packages over their lifetime. The results are displayed in an interactive web frontend that requires only a basic static webserver to host.
  • vbench: A lightweight Python library to catch performance regressions. It integrates with git to run performance benchmarks for every revision of a source repository, persisting the results in SQLite and generating graphs with matplotlib.
  • codespeed

Tools employing visualization:

  • snakeviz: A browser based graphical viewer for the output of Python’s cProfile module.
  • dotTrace (Not open source): Performance profiler for .NET apps. Returns a very detailed analysis.
  • Prophiler: A PHP profiler & developer toolbar built for phalcon web framework. Good UI.

Other examples: stats.js, benchmark by Google

Rperform is a first-of-its-kind package for detailed performance analysis of R packages.

Various tools are available for profiling R code, such as lineprof, Rprof, proftools, and summaryRprof. These tools have limitations that make them unsuitable for the relatively large-scale performance analysis that package developers require.

Rprof, R's sampling (statistical) profiler, stops the execution of code at regular intervals (typically every few milliseconds; the interval is configurable). It then records which function or line of code is currently being executed. In this manner, a memory/time profile is built. The summaryRprof function and the proftools package help summarize the output from Rprof. profvis is an interesting tool, currently in development, for visualizing code profiling data as well.
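The sampling approach described above can be tried with base R alone. The following is a minimal sketch, not part of Rperform: `slow_sum` is a made-up function used only to give the profiler something to sample, and the 10 ms interval is an arbitrary choice.

```r
# Hypothetical workload: a deliberately slow loop so the profiler collects samples.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

profile_file <- tempfile(fileext = ".out")
Rprof(profile_file, interval = 0.01)   # sample the call stack every 10 ms
result <- slow_sum(1e7)
Rprof(NULL)                            # stop profiling

# Summarize the samples: time attributed to each function.
profile_summary <- summaryRprof(profile_file)
print(head(profile_summary$by.self))
```

Because the profile is built from samples, very short-running code may collect no samples at all, which is one reason this workflow becomes tedious when applied file by file across a package.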

However, such methods make profiling a time-consuming and strenuous task for package developers when faced with multiple files across directories. Moreover, there is no simple way to compare results across different versions of a package's git repository. Rperform overcomes these limitations and provides additional functionality such as visualization and integration with Travis-CI builds. It builds on top of the testthat framework, allowing developers to reuse their testthat unit tests for performance analysis in addition to correctness testing.

Details of your coding project

There are several focus areas that should ideally be addressed in this project. Your proposal and timeline must take the following into consideration:

  • Allow for efficient and easier performance testing of standalone files and commits; make performance testing of a package more intuitive.

  • Improvement of the visualization functions: One of the most prominent and helpful features of Rperform is the set of visualization functions it provides. Here are plots generated by Rperform after analyzing the runtime performance of tests from the iregnet package.

Rperform testing on iregnet

Rperform testing on iregnet 2

The images, from http://rovervan.com/post/gsoc/iregnet (speed improvements in the optimize branch of iregnet), are a convincing example of how effective performance testing plots can be. For example, the iregnet plots show a clear reduction in runtime over the two commits in the optimize branch.

Further details can be found on the Rperform Wiki. The visualization functions need to be improved and made interactive using packages such as animint. The direction this part of the project takes will depend heavily on how the UI is implemented (see below).

Associated issue: https://github.com/analyticalmonk/Rperform/issues/15

  • Provision to create and maintain a database of metrics: There should be a way to create and maintain a database of the results obtained with Rperform at different points in time. This would not only help users but would also help identify inconsistencies in Rperform's measurements, if any. Including such a database in a UI (see the user interface objective below) would be very useful.

Associated issue: https://github.com/analyticalmonk/Rperform/issues/34
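One possible shape for such a database, sketched with base R only: each run appends rows to a CSV file that can later be read back for plotting or consistency checks. The helper `record_metrics`, the column names, and all values below are hypothetical and do not exist in Rperform; a real implementation might prefer SQLite or another store.

```r
# Hypothetical helper: append one measurement to a CSV-backed metrics store.
record_metrics <- function(db_path, commit, test_name, runtime_sec, memory_mb) {
  new_rows <- data.frame(
    recorded_at = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    commit      = commit,
    test_name   = test_name,
    runtime_sec = runtime_sec,
    memory_mb   = memory_mb,
    stringsAsFactors = FALSE
  )
  # Write the header only when the file is first created.
  is_new <- !file.exists(db_path)
  write.table(new_rows, db_path, sep = ",", row.names = FALSE,
              col.names = is_new, append = !is_new)
  invisible(new_rows)
}

db_path <- tempfile(fileext = ".csv")
record_metrics(db_path, "a1b2c3d", "test-fit", 1.92, 48.1)   # made-up values
record_metrics(db_path, "e4f5a6b", "test-fit", 1.41, 47.8)   # made-up values

# Read the accumulated history back for analysis.
history <- read.csv(db_path, stringsAsFactors = FALSE)
print(history[, c("commit", "test_name", "runtime_sec")])
```

Keeping one row per (commit, test) measurement makes it straightforward to compare repeated runs of the same commit, which is where measurement inconsistencies would show up.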

  • Changepoint detection: Include a feature to perform changepoint detection on a code base's performance history. Something like the changepoint package could be used.

Associated issue: https://github.com/analyticalmonk/Rperform/issues/35
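To illustrate the idea, here is a minimal base-R sketch that finds a single shift in mean runtime using a least-squares split. It is a simple stand-in for illustration, not the method used by the changepoint package, and the function name `find_single_changepoint` and the runtimes are made up.

```r
# Find the single split point that best separates a series into two
# segments with different means (least-squares criterion).
find_single_changepoint <- function(x) {
  n <- length(x)
  stopifnot(n >= 4)
  candidates <- 2:(n - 2)          # keep at least two points per segment
  cost <- vapply(candidates, function(k) {
    left  <- x[1:k]
    right <- x[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  candidates[which.min(cost)]      # index of the last point before the change
}

# Made-up runtimes (seconds) for 10 consecutive commits
runtimes <- c(1.02, 0.98, 1.01, 0.99, 1.03, 1.00, 1.48, 1.52, 1.49, 1.51)
k <- find_single_changepoint(runtimes)
cat("Mean runtime shifts after commit number", k, "\n")
```

A production feature would likely need to handle multiple changepoints and noisy measurements, which is where a dedicated package becomes attractive.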

  • Creating a coherent and useful user interface: Currently, Rperform has a function that generates a webpage composed of multiple plots obtained after analyzing the package code. However, a proper user interface is needed so that the package developer can interact meaningfully with the results. Inspiration can be taken from projects such as asv and codespeed. Some potentially useful features are:
    • Option to display plots for various files using a dropdown menu, and the ability to display multiple plots at the same time for comparison purposes.
    • Function to find a commit that produces a large regression (inspired by asv).
    • Interactive plots, where hovering over a data point provides details about the commit such as date, author, etc.

Associated issue: https://github.com/analyticalmonk/Rperform/issues/33
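The regression-finding feature could start from something as simple as the sketch below, which scans already-measured runtimes for the largest relative slowdown between consecutive commits. This is a plain linear scan under assumed inputs, not asv's approach; `find_largest_regression`, the column names, and the commit hashes are hypothetical.

```r
# Hypothetical helper: given runtimes ordered from oldest to newest commit,
# return the commit with the largest relative slowdown over its predecessor.
find_largest_regression <- function(metrics) {
  stopifnot(nrow(metrics) >= 2)
  previous <- head(metrics$runtime_sec, -1)
  current  <- tail(metrics$runtime_sec, -1)
  relative_change <- (current - previous) / previous
  worst <- which.max(relative_change)
  data.frame(
    commit          = metrics$commit[worst + 1],
    previous_commit = metrics$commit[worst],
    slowdown_pct    = round(100 * relative_change[worst], 1),
    stringsAsFactors = FALSE
  )
}

# Made-up commit history, oldest first
metrics <- data.frame(
  commit      = c("1f3a9c0", "7be21d4", "c09e5aa", "d41f77b"),
  runtime_sec = c(2.00, 1.90, 2.85, 2.80),
  stringsAsFactors = FALSE
)
regression <- find_largest_regression(metrics)
print(regression)
```

The same commit metadata returned here (plus date and author) is what a hover tooltip on an interactive plot would need to display.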

  • Improve test suite and increase code coverage: Rperform currently lacks an adequate number of unit tests. As functionality grows, this aspect will become even more important.

  • Make package CRAN-ready: The package, at the end of the project, should pass R CMD check.

Associated issue: https://github.com/analyticalmonk/Rperform/issues/16

  • Run Rperform on popular packages: Run Rperform on many real packages, so that we can show the R community the benefits of using our code.

Expected impact

Mentors, please explain how this project will produce a useful package for the R community.

Mentors

Tests

Brownie points if you can make, or have already made, an R package that passes R CMD check.

Solutions of tests