| 1 | +# Building Docker Containers for doAzureParallel |
| 2 | + |
| 3 | +As of v0.6.0, doAzureParallel runs all workloads within a Docker container. This has several benefits, including a consistent, immutable runtime; a custom R version, environment, and packages; and improved testing before deploying to doAzureParallel. |
| 4 | + |
| 5 | +The documentation below builds on the standard Docker documentation. We highly recommend reading the Docker [documentation](https://docs.docker.com/), specifically the [getting started guide](https://docs.docker.com/get-started/). |
| 6 | + |
| 7 | +Prerequisites |
| 8 | +- Install Docker ([instructions](https://docs.docker.com/engine/installation/)) |
| 9 | + |
| 10 | +## Use cases |
| 11 | +These are some of the common use cases for building your own images in Docker. |
| 12 | + |
| 13 | +### Custom version of R |
| 14 | +If you have your own R runtime, or want to use something other than the default version of R that doAzureParallel uses, you can point to an existing Docker image or build one yourself. This gives you the flexibility to use any R version you need without being tied to this toolkit's defaults. |
| 15 | + |
| 16 | +### Custom packages pre-built into your environment |
| 17 | +Installing packages is often complex and can take a few tries to get right. Using Docker, you can verify that your image builds correctly on your local machine instead of repeatedly building and rebuilding doAzureParallel clusters. It also means that you can pull in your own custom packages and be sure that the package versions inside the container do not change between runs, which keeps your results reproducible. |
| 18 | + |
| 19 | + |
| 20 | +### Improved cluster provisioning reliability and start up time |
| 21 | +Installing packages when a cluster starts takes time and depends on repository access and network reliability. By pre-packaging everything into your container, you ensure that everything is already built and available and will load correctly in the doAzureParallel cluster. |
| 22 | + |
| 23 | +## Building your own container image |
| 24 | +Building container images may seem a bit difficult to begin with, but it is really no harder than running commands in your command line. The following sections show how to build a container image that installs a few R packages and their operating system dependencies. |
| 25 | + |
| 26 | +In the following example we will create an image that installs the popular web-related packages jsonlite and httr. This example uses 'r-ver', an image provided by the Rocker project, and installs a few packages into it. The benefit of using the r-ver image is that it has already done all the hard work of getting R installed, so all we need to do is add the packages we want to use. |
| 27 | + |
| 28 | +NOTE: Rocker has [several great R container images](https://github.com/rocker-org/rocker/wiki) available on Docker Hub. Take a quick look through them to see if any of them suit your needs. |
| 29 | + |
| 30 | +Create a Dockerfile in a directory called 'demo'. Notice that the Dockerfile has no extension. |
| 31 | + |
| 32 | +```sh |
| 33 | +mkdir demo |
| 34 | +touch demo/Dockerfile |
| 35 | +``` |
| 36 | + |
| 37 | +Open up the Dockerfile with your favorite editor and paste in the following code. |
| 38 | + |
| 39 | +```Dockerfile |
| 40 | +# Use rocker/r-ver as the base image |
| 41 | +# This will inherit everything that was already installed into the base image |
| 42 | +# Documented at https://hub.docker.com/r/rocker/r-ver/~/dockerfile/ |
| 43 | +FROM rocker/r-ver |
| 44 | + |
| 45 | +# Install any dependencies required for the R packages |
| 46 | +RUN apt-get update \ |
| 47 | + && apt-get install -y --no-install-recommends \ |
| 48 | + libxml2-dev \ |
| 49 | + libcurl4-openssl-dev \ |
| 50 | + libssl-dev |
| 51 | + |
| 52 | +# Install the R Packages from CRAN |
| 53 | +RUN Rscript -e 'install.packages(c("jsonlite", "httr"))' |
| 54 | +``` |
| 55 | + |
| 56 | +Finally, save the file and build the Docker image. |
| 57 | + |
| 58 | +```sh |
| 59 | +# docker build takes the directory which contains the Dockerfile as the input |
| 60 | +# -t is used to tag or name the image |
| 61 | +docker build demo -t demo/custom-r-ver |
| 62 | +``` |
| 63 | + |
| 64 | +Once the Docker image is built locally, you can list it by running the command below. |
| 65 | +```sh |
| 66 | +docker images |
| 67 | +``` |
| 68 | + |
| 69 | +You should see output similar to the following: |
| 70 | + |
| 71 | +```sh |
| 72 | +REPOSITORY TAG IMAGE ID CREATED SIZE |
| 73 | +demo/custom-r-ver latest 55aefec47200 14 seconds ago 709MB |
| 74 | +rocker/r-ver latest 503e3df4e322 21 hours ago 578MB |
| 75 | +``` |
| 76 | + |
| 77 | +rocker/r-ver is the image that was downloaded to build the demo/custom-r-ver. |
| 78 | + |
| 79 | +## Testing your image |
| 80 | + |
| 81 | +Once your image is built, you can run it locally to test it out. |
| 82 | + |
| 83 | +```sh |
| 84 | +docker run --rm -it demo/custom-r-ver R |
| 85 | +``` |
| 86 | + |
| 87 | +This will open up a console version of R. To make sure the packages are installed correctly, load them into the R session. |
| 88 | + |
| 89 | +```sh |
| 90 | +> library(httr) |
| 91 | +> library(jsonlite) |
| 92 | +> sessionInfo() |
| 93 | +``` |
| 94 | + |
| 95 | +The output will show that these packages are now attached and available to use: |
| 96 | + |
| 97 | +```sh |
| 98 | +R version 3.4.2 (2017-09-28) |
| 99 | +Platform: x86_64-pc-linux-gnu (64-bit) |
| 100 | +Running under: Debian GNU/Linux 9 (stretch) |
| 101 | + |
| 102 | +Matrix products: default |
| 103 | +BLAS: /usr/lib/openblas-base/libblas.so.3 |
| 104 | +LAPACK: /usr/lib/libopenblasp-r0.2.19.so |
| 105 | + |
| 106 | +locale: |
| 107 | + [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C |
| 108 | + [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 |
| 109 | + [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C |
| 110 | + [7] LC_PAPER=en_US.UTF-8 LC_NAME=C |
| 111 | + [9] LC_ADDRESS=C LC_TELEPHONE=C |
| 112 | +[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C |
| 113 | + |
| 114 | +attached base packages: |
| 115 | +[1] stats graphics grDevices utils datasets methods base |
| 116 | + |
| 117 | +other attached packages: |
| 118 | +[1] jsonlite_1.5 httr_1.3.1 |
| 119 | + |
| 120 | +loaded via a namespace (and not attached): |
| 121 | +[1] compiler_3.4.2 R6_2.2.2 |
| 122 | +``` |
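The same check can be scripted instead of typed into an interactive session. The sketch below is one possible approach and is not part of doAzureParallel: it builds an R expression that loads each package and, by default, only prints the `docker run` command that would evaluate it with `Rscript`. The `IMAGE` and `PACKAGES` values come from the example above; the `DRY_RUN` switch is a hypothetical convenience introduced here so the script can be tried without Docker installed.

```sh
#!/bin/sh
# Sketch: check non-interactively that R packages load inside the image.
IMAGE="demo/custom-r-ver"   # image built earlier in this guide
PACKAGES="httr jsonlite"    # packages installed by the Dockerfile
DRY_RUN="${DRY_RUN:-1}"     # hypothetical switch: 1 = only print the command

# Build an R expression of the form: library(httr); library(jsonlite);
R_EXPR=""
for pkg in $PACKAGES; do
  R_EXPR="${R_EXPR}library(${pkg}); "
done

if [ "$DRY_RUN" = "1" ]; then
  echo "docker run --rm $IMAGE Rscript -e '$R_EXPR'"
else
  # Rscript exits non-zero if a library() call fails, and docker run
  # passes that exit status through, so the result can be checked with $?
  docker run --rm "$IMAGE" Rscript -e "$R_EXPR"
fi
```

Setting `DRY_RUN=0` runs the check for real; a non-zero exit status then indicates that at least one package failed to load.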
| 123 | + |
| 124 | +## Testing your image for doAzureParallel (advanced) |
| 125 | + |
| 126 | +doAzureParallel will run your container, mount specific directories into it, and set specific environment variables. |
| 127 | + |
| 128 | +We run the container with the following options, which are followed by the image name and the command to execute: |
| 129 | +```sh |
| 130 | +docker run --rm \ |
| 131 | + -v $AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR \ |
| 132 | + -e AZ_BATCH_NODE_ROOT_DIR=$AZ_BATCH_NODE_ROOT_DIR \ |
| 133 | + -e AZ_BATCH_NODE_STARTUP_DIR=$AZ_BATCH_NODE_STARTUP_DIR \ |
| 134 | + -e AZ_BATCH_TASK_ID=$AZ_BATCH_TASK_ID \ |
| 135 | + -e AZ_BATCH_JOB_ID=$AZ_BATCH_JOB_ID \ |
| 136 | + -e AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR \ |
| 137 | + -e AZ_BATCH_JOB_PREP_WORKING_DIR=$AZ_BATCH_JOB_PREP_WORKING_DIR |
| 138 | +``` |
| 139 | + |
| 140 | +All files downloaded with resource files will be available at `$AZ_BATCH_NODE_STARTUP_DIR/wd`. |
| 141 | + |
| 142 | +You can use these values to set up your local environment to look like it is running on a Batch node. |
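As a rough illustration of that setup, the sketch below creates a throwaway directory tree and exports the variables listed above so that a local shell resembles a Batch node. The directory layout under the temporary root and the job and task IDs are assumptions chosen for illustration; they are not the paths or IDs that Azure Batch actually uses.

```sh
#!/bin/sh
# Sketch: fake a Batch node environment locally. All paths and IDs below are
# illustrative assumptions, not the real Azure Batch layout.
AZ_BATCH_NODE_ROOT_DIR="$(mktemp -d)"
AZ_BATCH_NODE_STARTUP_DIR="$AZ_BATCH_NODE_ROOT_DIR/startup"
AZ_BATCH_JOB_ID="local-job"
AZ_BATCH_TASK_ID="local-task"
AZ_BATCH_TASK_WORKING_DIR="$AZ_BATCH_NODE_ROOT_DIR/$AZ_BATCH_JOB_ID/$AZ_BATCH_TASK_ID/wd"
AZ_BATCH_JOB_PREP_WORKING_DIR="$AZ_BATCH_NODE_ROOT_DIR/$AZ_BATCH_JOB_ID/jobprep/wd"
export AZ_BATCH_NODE_ROOT_DIR AZ_BATCH_NODE_STARTUP_DIR AZ_BATCH_JOB_ID \
  AZ_BATCH_TASK_ID AZ_BATCH_TASK_WORKING_DIR AZ_BATCH_JOB_PREP_WORKING_DIR

# Resource files are expected under the startup directory's wd folder
mkdir -p "$AZ_BATCH_NODE_STARTUP_DIR/wd" \
  "$AZ_BATCH_TASK_WORKING_DIR" \
  "$AZ_BATCH_JOB_PREP_WORKING_DIR"

echo "Fake Batch node root: $AZ_BATCH_NODE_ROOT_DIR"
```

If you save this to a file and source it (for example `. ./fake-batch-env.sh`, a hypothetical file name), the variables stay set in your current shell, so the `docker run` options shown earlier expand to these local paths.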
| 143 | + |
| 144 | +## Deploying your images to Docker Hub |
| 145 | + |
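| 146 | +Once you are happy with your image, tag it with your Docker Hub username (for example, `docker tag demo/custom-r-ver <username>/custom-r-ver`) and publish it to Docker Hub. |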
| 147 | + |
| 148 | +```sh |
| 149 | +docker login |
| 150 | +... |
| 151 | +docker push <username>/custom-r-ver |
| 152 | +``` |
| 153 | + |
| 154 | +## Referencing your image in your cluster.json file |
| 155 | + |
| 156 | +```json |
| 157 | +{ |
| 158 | + "name": "demo", |
| 159 | + "vmSize": "Standard_F2", |
| 160 | + "maxTasksPerNode": 2, |
| 161 | + "poolSize": { |
| 162 | + "dedicatedNodes": { |
| 163 | + "min": 0, |
| 164 | + "max": 0 |
| 165 | + }, |
| 166 | + "lowPriorityNodes": { |
| 167 | + "min": 2, |
| 168 | + "max": 2 |
| 169 | + }, |
| 170 | + "autoscaleFormula": "QUEUE" |
| 171 | + }, |
| 172 | + "containerImage": "<username>/custom-r-ver", |
| 173 | + "rPackages": { |
| 174 | + "cran": [], |
| 175 | + "github": [], |
| 176 | + "bioconductor": [] |
| 177 | + }, |
| 178 | + "commandLine": [] |
| 179 | +} |
| 180 | +``` |
| 181 | + |
| 182 | +## Using private Docker Hub repositories |
| 183 | + |
| 184 | +This is currently not supported in doAzureParallel. |