Commit 37e4c88

psolymos authored and alexellis committed
Add R template post
Signed-off-by: Peter Solymos <psolymos@gmail.com>
1 parent a2caa78 commit 37e4c88

5 files changed

+240
-0
lines changed

_posts/2021-02-26-r-templates.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
---
2+
title: "Functions for data science with R templates for OpenFaaS"
3+
description: "Let's bring R to the cloud! Use the power of R for data science serverless-style."
4+
date: 2021-02-26
5+
image: /images/2021-02-r/background.jpg
6+
categories:
7+
- kubernetes
8+
- r
9+
- plumber
10+
author_staff_member: peter
11+
dark_background: true
12+
---
13+
14+
Let's bring R to the cloud! Use the power of R for data science serverless-style.
15+
16+
## Introduction
17+
18+
[R](https://www.r-project.org/) is one of the most popular languages for data science. R's strength is in _statistical computing_ and _graphics_. It is most prominent in disciplines that rely on classical statistical approaches, such as environmental sciences, public health, and finance. In this post, I will first introduce the R templates for OpenFaaS. Then I will build a function that pulls data from a COVID-19 API, fits a time series model to the data, and forecasts future case counts.
19+
20+
> This post is written for existing OpenFaaS users. If you're new, you should [try deploying OpenFaaS](https://docs.openfaas.com/deployment/) and follow a tutorial to get a feel for how everything works. Why not start with the [Introduction to Serverless course by the Linux Foundation](https://www.openfaas.com/blog/introduction-to-serverless-linuxfoundation/)?
21+
22+
### The R templates
23+
24+
Use the [`faas-cli`](https://github.com/openfaas/faas-cli) and pull R templates:
25+
26+
```bash
faas-cli template pull https://github.com/analythium/openfaas-rstats-templates
```
29+
30+
Now `faas-cli new --list` should give you a list with the available R/rstats templates to choose from (rstats refers to the Twitter hashtag used for R related posts). The templates differ with respect to the Docker base image, the OpenFaaS watchdog type, and the server framework used.
31+
32+
You can choose between the following base images:
33+
34+
- Debian-based `rocker/r-base` Docker image from the [rocker](https://github.com/rocker-org/rocker/tree/master/r-base) project for bleeding edge,
35+
- Ubuntu-based `rocker/r-ubuntu` Docker image from the [rocker](https://github.com/rocker-org/rocker/tree/master/r-ubuntu) project for long term support (uses [RSPM](https://packagemanager.rstudio.com/client/) binaries for faster R package installs),
36+
- Alpine-based `rhub/r-minimal` Docker image from the [r-hub](https://github.com/r-hub/r-minimal) project for smallest image sizes.
37+
38+
> The use of Docker with R is discussed in the original article introducing the [Rocker](https://journal.r-project.org/archive/2017/RJ-2017-065/RJ-2017-065.pdf) project and also in a recent review of the [Rockerverse](https://journal.r-project.org/archive/2020/RJ-2020-007/RJ-2020-007.pdf).
39+
40+
The template naming follows the pattern `rstats-<base_image>-<server_framework>`. Templates without a server framework (e.g. `rstats-base`) use the classic [watchdog](https://github.com/openfaas/faas/tree/master/watchdog), which passes in the HTTP request via STDIN and reads an HTTP response via STDOUT. The other templates use the HTTP mode of the [of-watchdog](https://github.com/openfaas-incubator/of-watchdog), which provides more control over your HTTP responses and is more performant due to caching and pre-loading data and libraries.
41+
42+
R has an ever-increasing number of server frameworks available. There are templates for the following frameworks (R packages): [httpuv](https://CRAN.R-project.org/package=httpuv), [plumber](https://www.rplumber.io/), [fiery](https://CRAN.R-project.org/package=fiery), [beakr](https://CRAN.R-project.org/package=beakr), [ambiorix](https://ambiorix.john-coene.com/). Each of these frameworks has its own pros and cons for building standalone applications. But for serverless purposes, the most important aspect of picking one comes down to support and ease of use.
43+
44+
In this post I focus on the [plumber](https://www.rplumber.io/) R package and the `rstats-base-plumber` template. Plumber is one of the oldest of these frameworks. It has gained popularity and corporate adoption, and there are many [examples](https://github.com/rstudio/plumber/tree/master/inst/plumber) and tutorials out there to get you started.
45+
46+
### Make a new function
47+
48+
Let's define a few variables then use `faas-cli new` to create a new function called `covid-forecast` based on the `rstats-base-plumber` template:
49+
50+
```bash
export OPENFAAS_PREFIX="" # Populate with your Docker Hub username
export OPENFAAS_URL="http://174.138.114.98:8080" # Populate with your OpenFaaS URL

faas-cli new --lang rstats-base-plumber covid-forecast --prefix="$OPENFAAS_PREFIX"
```
56+
57+
Your folder now should contain the following files:
58+
59+
```bash
covid-forecast/handler.R
covid-forecast/DESCRIPTION
covid-forecast.yml
```
64+
65+
The `covid-forecast.yml` is the stack file used to configure functions (read more [here](https://docs.openfaas.com/reference/yaml/)). You can now edit the files in the `covid-forecast` folder.
66+
67+
### Time series forecast
68+
69+
I will use [exponential smoothing](https://en.wikipedia.org/wiki/Exponential_smoothing) as the time series forecasting method. The method needs _time series_ data, that is, a series of numeric values collected at regular intervals. Here I use daily updated COVID-19 case counts. The [data source](https://github.com/CSSEGISandData/COVID-19) is the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The flat files provided by the CSSE are further processed to provide a JSON API (read more about the [API](https://blog.analythium.io/data-integration-and-automated-updates-for-web-applications/) and its [endpoints](https://github.com/analythium/covid-19#readme), or explore the data interactively [here](https://hub.analythium.io/covidapp/)).
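
To make the idea concrete, here is a minimal sketch of simple exponential smoothing in base R. It is only an illustration: `ses_level` and the toy series are hypothetical, and the `ets()` function used later in this post selects among a broader family of exponential smoothing models rather than running this exact recursion.

```r
## Simple exponential smoothing: each smoothed level is a weighted
## average of the newest observation and the previous level.
## `ses_level` is a hypothetical helper, not part of any package.
ses_level <- function(y, alpha = 0.5) {
  level <- numeric(length(y))
  level[1] <- y[1]
  for (t in seq_along(y)[-1]) {
    level[t] <- alpha * y[t] + (1 - alpha) * level[t - 1]
  }
  level
}

## toy daily counts (made-up numbers)
y <- c(10, 12, 11, 15, 14)
ses_level(y, alpha = 0.5)
# [1] 10.0 11.0 11.0 13.0 13.5
```

The last smoothed level (13.5 here) is the flat point forecast of this simplest variant; a larger `alpha` makes the forecast react faster to recent observations.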
70+
71+
### Customize the function
72+
73+
The `covid-forecast/handler.R` contains the actual R code implementing the function logic. You'll see an example for that below. The dependencies required by the handler need to be added to the `covid-forecast/DESCRIPTION` file. Read more about how the dependencies specified in the `DESCRIPTION` file are installed [here](https://github.com/analythium/openfaas-rstats-templates#customize-your-function).
74+
75+
> See [worked examples](https://github.com/analythium/openfaas-rstats-examples) for different use cases. Read more about the [structure of the templates](template/README.md) if advanced tuning is required, e.g. by editing the `Dockerfile`, etc.
76+
77+
Add the forecast R package to the `covid-forecast/DESCRIPTION` file:
78+
79+
```yaml
Package: COVID
Version: 0.0.1
Imports:
  forecast
Remotes:
SystemRequirements:
VersionedPackages:
```
88+
89+
Change the `covid-forecast/handler.R` file:
90+
91+
```R
library(forecast)

covid_forecast <- function(region, cases, window, last) {
  ## API endpoint for region in global data set
  u <- paste0("https://hub.analythium.io/covid-19/api/v1/regions/", region)
  x <- jsonlite::fromJSON(u) # will throw error if region is not found
  ## check arguments
  if (missing(cases))
    cases <- "confirmed"
  cases <- match.arg(cases, c("confirmed", "deaths"))
  if (missing(window))
    window <- 14
  window <- round(window)
  if (window < 1)
    stop("window must be > 0")
  ## time series: daily new cases
  y <- pmax(0, diff(x$rawdata[[cases]]))
  ## dates: drop the first one so that z lines up with the differenced series y
  z <- as.Date(x$rawdata$date)[-1]
  ## trim time series according to last date
  if (!missing(last)) {
    last <- min(max(z), as.Date(last))
    y <- y[z <= last]
    z <- z[z <= last]
  } else {
    last <- z[length(z)]
  }
  ## fit exponential smoothing model
  m <- ets(y)
  ## forecast based on model and window
  f <- forecast(m, h=window)
  ## processing the forecast object
  p <- cbind(Date=seq(last+1, last+window, 1), as.data.frame(f))
  p[p < 0] <- 0
  as.list(p)
}

#* COVID
#* @get /
function(region, cases, window, last) {
  ## URL parameters arrive as character strings
  if (!missing(window))
    window <- as.numeric(window)
  covid_forecast(region, cases, window, last)
}
```
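
The handler turns cumulative counts into daily new cases with `diff()` and clips negative corrections with `pmax()`. A minimal sketch of that step on made-up numbers follows; the `rawdata` data frame here is a hypothetical stand-in that only mimics the `date` and `confirmed` fields used above.

```r
## toy cumulative counts (made-up numbers, not real data)
rawdata <- data.frame(
  date = c("2021-02-01", "2021-02-02", "2021-02-03", "2021-02-04"),
  confirmed = c(100, 105, 103, 110)
)

## day-over-day change; negative corrections are clipped to zero
new_cases <- pmax(0, diff(rawdata$confirmed))

## diff() returns one value fewer than its input, so drop the first date
dates <- as.Date(rawdata$date)[-1]

data.frame(date = dates, new_cases = new_cases)
#         date new_cases
# 1 2021-02-02         5
# 2 2021-02-03         0
# 3 2021-02-04         7
```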
137+
138+
The R script loads the forecast package and defines the `covid_forecast` function with four arguments:

- `region`: a region slug value for the API endpoint in global data set (see [available values](https://hub.analythium.io/covid-19/api/v1/regions/)),
- `cases`: one of `"confirmed"` or `"deaths"`,
- `window`: a positive integer giving the forecast horizon in days,
- `last`: an optional date; the time series is trimmed to end on this date (capped at the latest date in the data) before the model is fitted.
143+
144+
The function gives the following output in R:
145+
146+
```R
covid_forecast("canada-combined", cases="confirmed", window=4)
# $Date
# [1] "2021-02-19" "2021-02-20" "2021-02-21" "2021-02-22"
# $`Point Forecast`
# [1] 2861.592 2871.802 2879.980 2886.529
# $`Lo 80`
# [1] 1694.809 1695.439 1686.198 1667.680
# $`Hi 80`
# [1] 4028.375 4048.165 4073.761 4105.377
# $`Lo 95`
# [1] 1077.152 1072.711 1054.249 1022.461
# $`Hi 95`
# [1] 4646.033 4670.894 4705.710 4750.596
```
161+
162+
The result of the call is a list with six elements, each a vector of length 4, which is our time window. The `Date` element gives the days of the forecast and the `Point Forecast` is the expected value of the prediction, whereas the lower (`Lo`) and upper (`Hi`) bounds of the prediction intervals represent the uncertainty around the point forecast. The 80% interval (between the `Lo 80` and `Hi 80` bounds) and the 95% interval are the ranges expected to contain a future observation with 80% and 95% probability, respectively, provided the model is adequate. The following plot combines the historical daily case counts and the 14-day forecast for Canada. The point forecast is the blue line, the 80% and 95% forecast intervals are the shaded areas:
163+
164+
![COVID-19 Canada](covid-canada-2021-02-19.png)
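
As a rough sanity check on how the two interval levels relate, the sketch below assumes the intervals are symmetric around the point forecast and based on normal quantiles. That is an assumption for illustration, not a statement about how the forecast package computes them in general. The input numbers are copied from the first forecast day in the output above.

```r
## values copied from the first forecast day in the output above
point <- 2861.592
lo80  <- 1694.809

## assumption: interval = point +/- z * sigma with normal quantiles z
sigma <- (point - lo80) / qnorm(0.90)

## reconstruct the 95% interval from the implied standard deviation
lo95 <- point - qnorm(0.975) * sigma
hi95 <- point + qnorm(0.975) * sigma

## differences from the reported Lo 95 = 1077.152 and Hi 95 = 4646.033
c(lo95, hi95) - c(1077.152, 4646.033)
```

If the assumption holds for this model, the differences should be close to zero, up to the rounding of the printed values.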
165+
166+
The last part of the script defines the Plumber endpoint `/` for a GET request. One of the nicest features of Plumber is that it allows you to create a web API by [decorating the R source code](https://www.rplumber.io/articles/quickstart.html) with special `#*` comments. These annotations will tell Plumber how to handle the requests, what kind of parsers and formatters to use, etc. The current setup will treat the function arguments as URL parameters. The default content type for the response is JSON, thus we do not need to specify it.
167+
168+
```R
#* COVID
#* @get /
function(region, cases, window) {
  if (missing(cases))
    cases <- "confirmed"
  if (missing(window))
    window <- 14
  covid_forecast(region, cases, as.numeric(window))
}
```
179+
180+
Falling back to default values inside the handler function makes some of the URL parameters optional. Because the handler arguments have no defaults in the function signature, we detect absent parameters with `missing()`. We also need to remember that URL-encoded query parameters arrive as character strings, so checking types and converting where needed is necessary (i.e. `as.numeric()` for the `window` argument passed to `covid_forecast`).
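
One way to keep this conversion in a single place is a small validation helper. The sketch below is an illustration only: `parse_window` is a hypothetical name and is not part of the template or of plumber. It takes the raw character value of a query parameter and returns a usable forecast horizon.

```r
## Hypothetical helper: convert a raw query parameter to a forecast horizon.
## `window` is NULL when the parameter is absent, otherwise a character string.
parse_window <- function(window = NULL, default = 14) {
  if (is.null(window))
    return(default)
  w <- suppressWarnings(as.numeric(window))
  if (length(w) != 1 || is.na(w) || w < 1)
    stop("window must be a single number >= 1")
  round(w)
}

parse_window("4")  # character input from the URL
# [1] 4
parse_window()     # parameter not supplied
# [1] 14
```

Rejecting bad input early gives the caller a clear error message instead of a failure deep inside the model fitting code.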
181+
182+
### Build, push, and deploy the function
183+
184+
Now you can use `faas-cli up` to build, push, and deploy the COVID-19 forecast function to the OpenFaaS cluster:
185+
186+
```bash
faas-cli up -f covid-forecast.yml
```
189+
190+
You can test the function's deployed instance with curl:
191+
192+
```bash
curl -X GET -G \
  $OPENFAAS_URL/function/covid-forecast \
  -d region=canada-combined \
  -d cases=confirmed \
  -d window=4
```
199+
200+
Or simply by visiting the URL `$OPENFAAS_URL/function/covid-forecast?region=canada-combined&window=4`. The output should be something like this (depending on the day you make the request):
201+
202+
```json
{
  "Date":["2021-02-19","2021-02-20","2021-02-21","2021-02-22"],
  "Point Forecast":[2861.5922,2871.8024,2879.9795,2886.5285],
  "Lo 80":[1694.8092,1695.4395,1686.1983,1667.6804],
  "Hi 80":[4028.3753,4048.1652,4073.7608,4105.3767],
  "Lo 95":[1077.1515,1072.7106,1054.2487,1022.4611],
  "Hi 95":[4646.0329,4670.8941,4705.7104,4750.596]
}
```
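
On the client side, a response like this can be turned back into a data frame. The following is a minimal sketch that assumes the jsonlite package is installed; the JSON text is pasted from the example output above instead of being fetched from a live function, and the name `fc` is arbitrary.

```r
library(jsonlite)

## response body copied from the example output above
txt <- '{
  "Date":["2021-02-19","2021-02-20","2021-02-21","2021-02-22"],
  "Point Forecast":[2861.5922,2871.8024,2879.9795,2886.5285],
  "Lo 80":[1694.8092,1695.4395,1686.1983,1667.6804],
  "Hi 80":[4028.3753,4048.1652,4073.7608,4105.3767],
  "Lo 95":[1077.1515,1072.7106,1054.2487,1022.4611],
  "Hi 95":[4646.0329,4670.8941,4705.7104,4750.596]
}'

## fromJSON() returns a named list of equal-length vectors
res <- fromJSON(txt)

## check.names = FALSE keeps column names such as "Point Forecast" intact
fc <- data.frame(res, check.names = FALSE)
fc$Date <- as.Date(fc$Date)
str(fc)
```

To work with a deployed function instead, the same `fromJSON()` call accepts a URL, as the handler itself does when it reads from the data API.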
212+
213+
Only the `region` parameter is mandatory; the other two default to `cases="confirmed"` and `window=14`. `$OPENFAAS_URL/function/covid-forecast?region=us` will be the same as `$OPENFAAS_URL/function/covid-forecast?region=us&window=14`.
217+
218+
The time series that was the basis for the forecast, along with the forecast and the associated uncertainty (prediction intervals) for the US, would look like this:
219+
220+
![COVID-19 US](covid-us-2021-02-19.png)
221+
222+
### Wrapping up
223+
224+
In this post I showed how to use the R templates for OpenFaaS. We built a serverless function that consumes data from an external API, fits an exponential smoothing model, and makes a forecast. The data API with the forecasting function can be added to web applications to provide timely updates on the fly.
225+
226+
The function presented here could be extended to a microservice that might also provide a summary of past case counts in a [dynamic document](https://rmarkdown.rstudio.com/) building on R's powerful authoring tools.
227+
228+
- [Learn about alternative ways of passing parameters to the COVID-19 function](https://github.com/analythium/openfaas-rstats-examples/tree/main/02-time-series-forecast)
229+
- [See the list of available R templates for OpenFaaS](https://github.com/analythium/openfaas-rstats-templates#readme)
230+
- [Check out other R examples with OpenFaaS](https://github.com/analythium/openfaas-rstats-examples)

_staff_members/peter.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
---
2+
name: Peter Solymos
3+
position: Guest
4+
image_path: /images/author/peter.png
5+
twitter_username: psolymos
6+
github_username: psolymos
7+
linkedin_username: peter-solymos
8+
webpage: https://peter.solymos.org/
9+
blurb: Biologist. Data scientist. Co-founder of <a href="https://analythium.io">Analythium</a>
10+
---

images/2021-02-r/background.jpg


images/author/peter.png
