This repository was archived by the owner on Oct 12, 2023. It is now read-only.

Commit bc529f0

initial docs for docker users (#166)
* initial docs for docker users
* Fixes plus PR feedback
1 parent dfd18d6 commit bc529f0

2 files changed, +197 -9 lines changed

docs/30-customize-cluster.md

Lines changed: 13 additions & 9 deletions
@@ -35,6 +35,17 @@ Specifying a docker container is done by updating your cluster.json file. Simply

Note: _If no 'containerImage' property is set, rocker/tidyverse:latest will be used. This usually points to one of the latest versions of R._

### List of tested container images

The following containers were tested and cover the most common cases for end users.

Container Image | R type | Description
--- | --- | ---
[rocker/tidyverse](https://hub.docker.com/r/rocker/tidyverse/) | Open source R | rocker/tidyverse is provided by the Rocker org and uses a standard version of R developed by the open source community. rocker/tidyverse typically tracks new R releases quickly and has versions back to R 3.1.
[nuest/mro](https://hub.docker.com/r/nuest/mro/) | Microsoft R Open | [Microsoft R Open](https://mran.microsoft.com/open/) is an open source distribution of R that provides out-of-the-box support for multithreaded math libraries, versioned package support through MRAN, and improved performance over standard open source R.

* We recommend reading the details of each container image before using it, to make sure you understand any limitations or requirements it has.

### Building your own container

Building your own container gives you the flexibility to package any specific requirements, packages or data you need to run your workloads. We recommend basing your container on a Debian-based OS such as Debian or Ubuntu, and pointing the final CMD instruction to the R executable. For example:
@@ -53,18 +64,11 @@ FROM ubuntu:16.04
CMD ["R"]
```

For more information and samples on how to build images, deploy them to Docker Hub and use them in your cluster, please refer to the [Building Containers](./32-building-containers.md) documentation.

There is no requirement to use a Debian-based image, although it is recommended for consistency with other packages. Note, however, that the container **must be based on a Linux distribution, as Windows is not supported**.

## Running Commands when the Cluster Starts

docs/32-building-containers.md

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
# Building Docker Containers for doAzureParallel

As of version v0.6.0, doAzureParallel runs all workloads within a Docker container. This has several benefits, including a consistent, immutable runtime; a custom R version, environment and packages; and the ability to test your environment locally before deploying it to doAzureParallel.

The documentation below builds on top of the standard Docker documentation. We highly recommend reading the Docker [documentation](https://docs.docker.com/), specifically the [getting started guide](https://docs.docker.com/get-started/).

## Prerequisites

- Install Docker ([installation instructions](https://docs.docker.com/engine/installation/))

## Use cases
These are some of the common use cases for building your own images in Docker.

### Custom version of R
If you have your own R runtime, or want to use something other than the default version of R that doAzureParallel uses, you can point to an existing Docker image or build one yourself. This gives you the flexibility to use any R version you need instead of the defaults this toolkit provides.

### Custom packages pre-built into your environment
Installing packages is often complex and can take a few tries to get right. With Docker, you can make sure your images build correctly on your local machine, without repeatedly creating and recreating doAzureParallel clusters to get it right. This also means you can pull in your own custom packages and guarantee that the package versions inside the container never change, so every run uses exactly the same environment.

### Improved cluster provisioning reliability and startup time
Installing packages on each node takes time and is subject to issues with repository access and network reliability. By pre-packaging everything into your container, you can guarantee that everything is already built and available, and that it will load correctly in the doAzureParallel cluster.

## Building your own container image
Building container images may seem difficult at first, but it is really no harder than running commands on your command line. The following sections go through how to build a container image that installs a few R packages and their operating system dependencies.

In the following example, we create an image that installs the popular web packages jsonlite and httr. The example uses the 'r-ver' image provided by the Rocker project and installs a few packages into it. The benefit of using r-ver is that it has already done the hard work of installing R, so all we need to do is add the packages we want to use.

NOTE: Rocker has [several great R container images](https://github.com/rocker-org/rocker/wiki) available on Docker Hub. Take a quick look through them to see if any of them suit your needs.

Create a Dockerfile in a directory called 'demo'. Note that the file is named Dockerfile, with no extension.

```sh
mkdir demo
touch demo/Dockerfile
```

Open the Dockerfile in your favorite editor and paste in the following code.

```Dockerfile
# Use rocker/r-ver as the base image
# This inherits everything that was already installed into the base image
# Documented at https://hub.docker.com/r/rocker/r-ver/~/dockerfile/
FROM rocker/r-ver

# Install the system libraries required by the R packages,
# then remove the apt package lists to keep the image small
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    libxml2-dev \
    libcurl4-openssl-dev \
    libssl-dev \
  && rm -rf /var/lib/apt/lists/*

# Install the R packages from CRAN
RUN Rscript -e 'install.packages(c("jsonlite", "httr"))'
```

Finally, save the file and build the Docker image.

```sh
# docker build takes the directory that contains the Dockerfile as its input
# -t is used to tag (name) the image
docker build -t demo/custom-r-ver demo
```

Once the Docker image is built locally, you can list it by running the following command:

```sh
docker images
```

You should see output similar to the following:

```text
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
demo/custom-r-ver   latest              55aefec47200        14 seconds ago      709MB
rocker/r-ver        latest              503e3df4e322        21 hours ago        578MB
```

rocker/r-ver is the base image that Docker downloaded in order to build demo/custom-r-ver.

## Testing your image

Once your image is built, you can run it locally to test it out.

```sh
docker run --rm -it demo/custom-r-ver R
```

This opens an interactive R console. To make sure the packages were installed correctly, load them into the R session:

```r
> library(httr)
> library(jsonlite)
> sessionInfo()
```

The output should list both packages under "other attached packages":

```text
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] jsonlite_1.5 httr_1.3.1

loaded via a namespace (and not attached):
[1] compiler_3.4.2 R6_2.2.2
```
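
For scripted or repeated checks, the same verification can run without an interactive console. This is useful because `install.packages()` only emits a warning when a package fails to install, so the `docker build` step can succeed even if a package is missing. The function below is a hypothetical helper, not part of doAzureParallel or Docker, and it assumes the image provides `Rscript` on its path, as rocker/r-ver does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of doAzureParallel): check that R packages
# load inside a Docker image without opening an interactive R console.
# Usage: check_r_packages <image> <package> [<package> ...]
check_r_packages() {
  if [ "$#" -lt 2 ]; then
    echo "usage: check_r_packages <image> <package>..." >&2
    return 2
  fi
  local image="$1"
  shift

  # Build an R expression such as: library(httr); library(jsonlite);
  local expr="" pkg
  for pkg in "$@"; do
    expr="${expr}library(${pkg}); "
  done

  # Rscript exits with a non-zero status if any library() call fails,
  # and docker run returns the container's exit status to this shell.
  docker run --rm "$image" Rscript -e "$expr"
}
```

For example, `check_r_packages demo/custom-r-ver httr jsonlite` exits with status 0 only if both packages load, which makes it easy to call from a build script.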

## Testing your image for doAzureParallel (advanced)

doAzureParallel runs your container with specific directories mounted and specific environment variables set.

doAzureParallel starts the container with the following options:

```sh
docker run --rm \
  -v $AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR \
  -e AZ_BATCH_NODE_ROOT_DIR=$AZ_BATCH_NODE_ROOT_DIR \
  -e AZ_BATCH_NODE_STARTUP_DIR=$AZ_BATCH_NODE_STARTUP_DIR \
  -e AZ_BATCH_TASK_ID=$AZ_BATCH_TASK_ID \
  -e AZ_BATCH_JOB_ID=$AZ_BATCH_JOB_ID \
  -e AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR \
  -e AZ_BATCH_JOB_PREP_WORKING_DIR=$AZ_BATCH_JOB_PREP_WORKING_DIR
```

All files downloaded as resource files are available at `$AZ_BATCH_NODE_STARTUP_DIR/wd`.

You can use these values to set up your local environment to look like it is running on a Batch node.
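
A minimal sketch of such a local setup, assuming a throwaway temporary directory stands in for the node root. The helper name `run_like_batch`, the directory layout under that root, and the job and task IDs are hypothetical placeholders rather than values doAzureParallel or Azure Batch use; only the variable names, the volume mount, and the `wd` resource-file directory come from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of doAzureParallel): run a container with the
# same mounts and environment variables that doAzureParallel passes on a Batch
# node, using a local temporary directory as a stand-in for the node root.
# The directory layout and job/task IDs below are illustrative placeholders.
# Usage: run_like_batch <image> [command...]
run_like_batch() {
  if [ "$#" -lt 1 ]; then
    echo "usage: run_like_batch <image> [command...]" >&2
    return 2
  fi
  local image="$1"
  shift

  local root
  root="$(mktemp -d)"
  echo "Simulated Batch node root: $root" >&2

  # Placeholder values; on a real Batch node these are set for you.
  export AZ_BATCH_NODE_ROOT_DIR="$root"
  export AZ_BATCH_NODE_STARTUP_DIR="$root/startup"
  export AZ_BATCH_JOB_ID="local-job"
  export AZ_BATCH_TASK_ID="local-task"
  export AZ_BATCH_TASK_WORKING_DIR="$root/tasks/$AZ_BATCH_TASK_ID/wd"
  export AZ_BATCH_JOB_PREP_WORKING_DIR="$root/jobprep/wd"

  # Resource files are expected under $AZ_BATCH_NODE_STARTUP_DIR/wd;
  # copy any input files you want to test with into that directory.
  mkdir -p "$AZ_BATCH_NODE_STARTUP_DIR/wd" \
    "$AZ_BATCH_TASK_WORKING_DIR" \
    "$AZ_BATCH_JOB_PREP_WORKING_DIR"

  # Mount the node root at the same path inside the container and forward
  # the variables, mirroring the docker run options listed above.
  docker run --rm \
    -v "$AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR" \
    -e "AZ_BATCH_NODE_ROOT_DIR=$AZ_BATCH_NODE_ROOT_DIR" \
    -e "AZ_BATCH_NODE_STARTUP_DIR=$AZ_BATCH_NODE_STARTUP_DIR" \
    -e "AZ_BATCH_TASK_ID=$AZ_BATCH_TASK_ID" \
    -e "AZ_BATCH_JOB_ID=$AZ_BATCH_JOB_ID" \
    -e "AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR" \
    -e "AZ_BATCH_JOB_PREP_WORKING_DIR=$AZ_BATCH_JOB_PREP_WORKING_DIR" \
    "$image" "$@"
}
```

For example, `run_like_batch demo/custom-r-ver Rscript -e 'print(Sys.getenv("AZ_BATCH_TASK_WORKING_DIR"))'` prints the simulated working directory from inside the container. The temporary directory is not deleted afterwards, so you can inspect any files your code writes there.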

## Deploying your images to Docker Hub

Once you are happy with your image, you can publish it to Docker Hub. The image was built as `demo/custom-r-ver`, so tag it with your Docker Hub username before pushing it:

```sh
docker login
...
docker tag demo/custom-r-ver <username>/custom-r-ver
docker push <username>/custom-r-ver
```
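
Pushing only the default `latest` tag means the image a cluster pulls can change every time you push. A hedged sketch of also publishing an explicit version tag, so a cluster can reference a fixed image: the helper name `publish_image` and the version scheme are hypothetical, while `docker tag` and `docker push` are standard Docker CLI commands. It assumes you have already run `docker login`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of doAzureParallel): tag a locally built image
# for Docker Hub with an explicit version and with "latest", then push both.
# Assumes `docker login` has already been run.
# Usage: publish_image <local-image> <dockerhub-username> <version>
publish_image() {
  if [ "$#" -ne 3 ]; then
    echo "usage: publish_image <local-image> <username> <version>" >&2
    return 2
  fi
  local src="$1" user="$2" version="$3"

  # Docker Hub repository name: last path component of the local image name,
  # with any existing ":tag" suffix removed (demo/custom-r-ver -> custom-r-ver).
  local name="${src##*/}"
  name="${name%%:*}"
  local repo="${user}/${name}"

  local tag
  for tag in "$version" latest; do
    docker tag "$src" "${repo}:${tag}" || return 1
    # Send push progress to stderr so stdout only carries the final reference.
    docker push "${repo}:${tag}" >&2 || return 1
  done

  # Print the pinned image reference to use in cluster.json.
  echo "${repo}:${version}"
}
```

For example, `publish_image demo/custom-r-ver <username> 1.0.0` pushes `<username>/custom-r-ver:1.0.0` and `<username>/custom-r-ver:latest`, then prints `<username>/custom-r-ver:1.0.0`. Since the default image is specified as `rocker/tidyverse:latest`, a tagged reference like this should also work as the `containerImage` value in cluster.json.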

## Referencing your image in your cluster.json file

Set the `containerImage` property in your cluster.json file to the image you pushed to Docker Hub:

```json
{
  "name": "demo",
  "vmSize": "Standard_F2",
  "maxTasksPerNode": 2,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 2,
      "max": 2
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "<username>/custom-r-ver",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}
```

## Using private Docker Hub repositories

This is currently not supported in doAzureParallel.
