Base Docker image / Dockerfile for running R jobs in AWS Batch


Is there a recommended Docker image / Dockerfile starting point for running R scripts in AWS Batch? I have looked at a few examples, and the starting image could be any of the following: rocker, python:3.6, ubuntu:16.04, or r-base.

What are the key points to consider in choosing a good starting point for a Dockerfile for R jobs? For this particular workload, my main R packages are data.table and the tidyverse packages (no need for RStudio / Shiny server).

1 answer

First, you have to decide whether you want Microsoft R or Open R. I believe Microsoft R has some optimizations, but it is a pain to get running in Docker: it hard-codes the UIDs, which causes conflicts depending on the host node's OS. I set up Microsoft R for a client a few years back, and it was a bit painful. Unfortunately, I can't share that file or work.

If Open R works for your use case, why not just start from r-base? https://hub.docker.com/_/r-base/
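As a rough illustration of "just start from r-base", a minimal image might look like the sketch below. It assumes a pinned tag (`4.3.1` is only an example) and a hypothetical job script named `run_job.R`; neither comes from the original answer.

```dockerfile
# Pin a specific R version rather than "latest" so every Batch job runs the same R.
# The tag is only an example; use whichever version you need.
FROM r-base:4.3.1

# Hypothetical job script; replace with your own entry point.
WORKDIR /app
COPY run_job.R /app/run_job.R

# Exec-form ENTRYPOINT: the Batch job definition's "command" (Docker CMD) is
# appended as arguments, readable in R via commandArgs(trailingOnly = TRUE).
ENTRYPOINT ["Rscript", "/app/run_job.R"]
```

Using `ENTRYPOINT` with `Rscript` keeps the image focused on one job while still letting each Batch submission pass different arguments through the `command` field.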

I'd recommend a Dockerfile that is specific to your use case. Installing data.table and other packages after the Batch job starts isn't ideal, so we generally started FROM r-base, copied an install.R script into the Docker image, and ran it with Rscript to install everything. We then pushed that image to ECR and referenced it in the Batch job definitions.
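The install.R approach above could be sketched roughly as follows. This is an assumption-laden example, not the answerer's actual file: `install.R` is assumed to contain something like `install.packages(c("data.table", "tidyverse"), repos = "https://cloud.r-project.org")`, `run_job.R` is a hypothetical job script, and the system library list is a common starting set for compiling tidyverse dependencies from source whose exact Debian package names may need adjusting for the base image's release.

```dockerfile
FROM r-base:4.3.1

# System libraries commonly needed to build tidyverse dependencies from source
# (curl, openssl, xml2, and the font/graphics stack used by ragg/systemfonts).
# Package names are an assumption and may differ between Debian releases.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libcurl4-openssl-dev \
        libssl-dev \
        libxml2-dev \
        libfontconfig-dev \
        libfreetype-dev \
        libharfbuzz-dev \
        libfribidi-dev \
        libpng-dev \
        libtiff-dev \
        libjpeg-dev \
    && rm -rf /var/lib/apt/lists/*

# Install R packages at build time so jobs start without downloading anything.
COPY install.R /tmp/install.R
RUN Rscript /tmp/install.R

# Copy the job script last so editing it does not invalidate the package layer.
WORKDIR /app
COPY run_job.R /app/run_job.R
ENTRYPOINT ["Rscript", "/app/run_job.R"]
```

Note that `install.packages()` only warns when a package fails to install, so install.R may want to check each package with `requireNamespace()` and call `stop()` on failure, so a broken install fails the image build instead of the first Batch job.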

If you are running a lot of jobs, you may also want to look into the ECS cluster settings to change image caching on the compute environment, since these images will be quite large.

Answered a year ago
