Base Docker image / Dockerfile for running R jobs in AWS Batch


Is there a recommended Docker image / Dockerfile starting point for running R scripts in AWS Batch? I have looked at a few here, and the starting image could be any of the following: rocker, python:3.6, ubuntu:16.04 or r-base.

What are the key points to consider in choosing a good starting point for a Dockerfile for R jobs? For this particular workload, my main R packages are data.table and the tidyverse packages (no need for RStudio / Shiny server).

1 Answer

First, decide whether you want Microsoft R or Open R. I believe Microsoft R has some optimizations, but it is a pain to run in Docker: they hard-code the UIDs, which causes conflicts depending on the host node OS. I set up Microsoft R for a client a few years back and it was a bit painful. Unfortunately, I can't share that file or work.

If Open R works for your use case then why not just start from r-base? https://hub.docker.com/_/r-base/

I'd recommend a Dockerfile that is specific to your use case. Installing data.table and other packages after the Batch job has started isn't ideal, so starting FROM r-base we generally copied an install.R script into the Docker image and ran it with Rscript to install everything. We then saved that image to ECR and referenced it in the Batch jobs.
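As a rough illustration of that approach, a Dockerfile might look something like the sketch below. This is not the file I used for the client. The system library list, the file paths, and the install.R and job.R script names are assumptions for the example, and the exact apt packages needed can vary with the tidyverse version.

```dockerfile
# Sketch only: paths, script names and the apt package list are illustrative assumptions.
FROM r-base

# System libraries that tidyverse dependencies commonly need in order to compile.
# The exact list is an assumption and may differ between package versions.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libcurl4-openssl-dev libssl-dev libxml2-dev \
    && rm -rf /var/lib/apt/lists/*

# install.R is a hypothetical script that could contain a single line such as:
#   install.packages(c("data.table", "tidyverse"), repos = "https://cloud.r-project.org")
COPY install.R /tmp/install.R
RUN Rscript /tmp/install.R

# job.R is a hypothetical entry script for the Batch job.
COPY job.R /app/job.R
CMD ["Rscript", "/app/job.R"]
```

One thing to watch for: install.packages() reports a failed package install as a warning rather than an error, so the image build can succeed even if a package is missing. Adding library(data.table) and library(tidyverse) calls at the end of install.R would make the build fail early in that case.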

If you are running a lot of jobs, you may also want to look at the ECS cluster behind the Batch compute environment and change how images are cached on it, since the images will be quite large.

answered a year ago
