ParallelCluster, AWS Batch, post_install


I have pcluster running somewhat successfully with the awsbatch scheduler on alinux. Now I want to mount an EFS volume on both the Master and Compute nodes. I have a lot of custom native libraries that I need to reference from running code, and these will be updated from time to time. Output data will also be written to EFS.

I've used the post_install configuration to make it work on the Master node. Unfortunately I just noticed that the post_install configuration is limited for awsbatch:

.....When using awsbatch as the scheduler, the postinstall script is only executed on the master node.
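For context, the master-node mount I have working is done with a post_install script along these lines (a sketch only; the filesystem ID `fs-12345678`, region `us-east-1`, and mount point `/efs` are placeholders for my actual values):

```shell
#!/bin/bash
# post_install sketch: mount an existing EFS filesystem at node bootstrap.
# Placeholders: fs-12345678 (EFS filesystem ID), us-east-1 (region), /efs (mount point).
EFS_ID=fs-12345678
REGION=us-east-1
MOUNT_POINT=/efs

mkdir -p "${MOUNT_POINT}"
mount -t nfs4 \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    "${EFS_ID}.efs.${REGION}.amazonaws.com:/" "${MOUNT_POINT}"

# Persist the mount across reboots.
echo "${EFS_ID}.efs.${REGION}.amazonaws.com:/ ${MOUNT_POINT} nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0" >> /etc/fstab
```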

What's the recommended way to mount an EFS volume on awsbatch compute nodes when they are created?

I do see this about Batch AMIs:

https://docs.aws.amazon.com/batch/latest/userguide/create-batch-ami.html

I've tried using custom AMIs with pcluster and the awsbatch scheduler, but haven't had any luck so far. Should these batch instructions work with pcluster? How do I enable this kind of AMI?

I see other file sharing options in the pcluster configuration: S3 access, shared EBS volumes, the "shared" folder. Are those expected to work on awsbatch compute nodes? Am I better off just going back to the sge scheduler?

Thanks,
Kim

kimyx
asked 5 years ago · 243 views
3 Answers
Accepted Answer

In the case of AWS Batch, shared drives are mounted inside the container, not on the compute node where the container runs. You can access the shared directories from your jobs when you submit them.

Please take a look at this example for further details: https://aws-parallelcluster.readthedocs.io/en/latest/tutorials/03_batch_mpi.html#running-your-first-job-using-aws-batch
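As a quick check (job name here is arbitrary; the job ID placeholder comes from the awsbsub output), you can submit a one-line job that lists the shared directories from inside the container:

```shell
# Submit a job that lists the shared directories from inside the container
awsbsub -jn check-shared "ls -la /shared /efs"

# Check job status, then fetch the job's output using the ID printed by awsbsub
awsbstat
awsbout <job-id>
```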

Also, when choosing awsbatch as the scheduler you can use shared_dir, EBS, RAID, and EFS. Just be aware that shared drives (all except EFS) are served from the master node, so you need to take this into account when choosing the instance type for your master node.
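Putting this together, a minimal ParallelCluster (2.x-era) config for awsbatch with both an EBS-backed shared directory and an existing EFS filesystem looks roughly like the following sketch (the filesystem ID and section label are placeholders):

```ini
# Sketch of the relevant settings; fs-12345678 is a placeholder for an existing EFS filesystem
[cluster default]
scheduler = awsbatch
shared_dir = /shared          ; EBS-backed directory served from the master node
efs_settings = customfs

[efs customfs]
shared_dir = efs              ; mounted at /efs on the master and inside job containers
efs_fs_id = fs-12345678       ; placeholder: your EFS filesystem ID
```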

AWS
answered 5 years ago

I just came across this recent ticket: https://github.com/aws/aws-parallelcluster/issues/829
which is similar to my situation. The following comment from @lukeseawalker sounds good to me:

.....when the scheduler is awsbatch, the custom_ami parameter only applies to the master node.
AWS Batch is docker based and at this time we don’t allow using custom ami’s.
The only way to have pre-installed software into the AWS Batch cluster, is through installing it into the shared filesystem which will be then mounted into the docker.....

but I'm not seeing the /shared directory on a compute node. Is there something more than setting the shared_dir config property needed to make this work with awsbatch? Do I need to use the ebs section?

Edited by: kimyx on Mar 13, 2019 8:17 PM

Edited by: kimyx on Mar 13, 2019 8:19 PM

kimyx
answered 5 years ago

That's great! I'd fought with setup for long enough that I forgot about that "shared secret" example. I can now successfully run the awsbatch mpi tutorial. I'll try the awsbatch efs configuration next.

kimyx
answered 5 years ago
