ParallelCluster and AWS Batch


I'm new to ParallelCluster. I have it set up in an AWS VPC, and it runs test jobs successfully with the traditional pcluster scheduler. Now I'm setting it up with the AWS Batch back end. I added this to ~/.parallelcluster/config:

[cluster awsbatch]
base_os = alinux
scheduler = awsbatch
vpc_settings = public
key_name = swt_kk...
compute_instance_type = c5.xlarge

[vpc public]
master_subnet_id = subnet-049f...
compute_subnet_id = subnet-049f...
vpc_id = vpc-044b...

Following the instructions here:
https://aws-parallelcluster.readthedocs.io/en/latest/tutorials/03_batch_mpi.html

source ~/envs/pcluster-virtualenv/bin/activate
pcluster create awsbatch --cluster-template awsbatch

The cluster is created OK and I can see the master and compute nodes running in the EC2 console, but pcluster cannot see the compute node:

(pcluster-virtualenv) [kk@ip-172-16-0-10 ~]$ pcluster instances awsbatch
MasterServer         i-065da40163ecebe4a

(pcluster-virtualenv) [kk@ip-172-16-0-10 ~]$ awsbhosts --cluster awsbatch
ec2InstanceId    instanceType    privateIpAddress    publicIpAddress    runningJobs
---------------  --------------  ------------------  -----------------  -------------

And if I submit a test job, it just sits in the queue.

The master and compute subnets are the same and have an internet gateway attached. I see this comment in the tutorial page:

# Replace with id of the subnet for the Compute nodes.
# A NAT Gateway is required for MNP.

Is a NAT Gateway still required if an Internet Gateway is already in place? Must the compute subnet be different from the master subnet?

Any ideas on what might be going wrong? Ways to debug?
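One place to start debugging is AWS Batch itself. A job that never leaves the queue shows up as RUNNABLE in Batch, which typically means the compute environment never got usable capacity, and the compute environment's `status` and `statusReason` fields often say why. The sketch below assumes boto3 is installed in the same virtualenv with credentials and a default region configured. It discovers the queues and compute environments instead of guessing the names ParallelCluster gives them; `summarize_compute_envs` is a hypothetical helper, not part of the pcluster tools.

```python
from typing import Any, Dict, List


def summarize_compute_envs(resp: Dict[str, Any]) -> List[str]:
    """Turn a describe_compute_environments response into one line per environment."""
    lines: List[str] = []
    for ce in resp.get("computeEnvironments", []):
        res = ce.get("computeResources", {})
        lines.append(
            f"{ce['computeEnvironmentName']}: state={ce.get('state')} "
            f"status={ce.get('status')} "
            f"desiredvCpus={res.get('desiredvCpus')} "
            f"subnets={','.join(res.get('subnets', []))} "
            f"reason={ce.get('statusReason', '')}"
        )
    return lines


def main() -> None:
    """Live check; assumes boto3 plus configured credentials and region."""
    import boto3  # imported lazily so the helper above works offline

    batch = boto3.client("batch")
    for line in summarize_compute_envs(batch.describe_compute_environments()):
        print(line)
    # Jobs that stay RUNNABLE are usually waiting on capacity that cannot start or register.
    for queue in batch.describe_job_queues()["jobQueues"]:
        page = batch.list_jobs(jobQueue=queue["jobQueueName"], jobStatus="RUNNABLE")
        # Only the first page of results is counted here.
        print(f"{queue['jobQueueName']}: {len(page['jobSummaryList'])} RUNNABLE job(s)")
```

Call `main()` from a Python session where boto3 can find credentials. An INVALID status, or a desiredvCpus value that rises while awsbhosts stays empty, points toward networking rather than the job itself.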

Thanks,
Kim

kimyx
asked 5 years ago, 322 views
3 Answers

An internet gateway is not sufficient; a NAT gateway is needed because the containers only get private IP addresses. Underneath, AWS Batch uses Amazon ECS task networking, which you can read about here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html
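To check this for a specific subnet, look at where its 0.0.0.0/0 route points. The helper below is a sketch that assumes you already have one route table in the shape boto3's EC2 `describe_route_tables` returns; the function name and its return labels are made up for illustration.

```python
from typing import Any, Dict


def default_route_target(route_table: Dict[str, Any]) -> str:
    """Classify the 0.0.0.0/0 route of one route table.

    Returns 'nat', 'igw', 'blackhole', 'other', or 'none'.
    `route_table` is one element of RouteTables from describe_route_tables.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # skip the VPC-local route and any other prefixes
        if route.get("State") == "blackhole":
            return "blackhole"  # target was deleted, e.g. a removed NAT gateway
        if route.get("NatGatewayId"):
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
        return "other"  # e.g. NAT instance, transit gateway, peering
    return "none"
```

For the setup in the question, the compute subnet would come back as 'igw', which is fine for the master but leaves private-IP-only containers without a path out.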

The best way to set up the networking is to follow this guide: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html

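Then take the public and private subnets that were created, specify them in your config, and change vpc_settings in your [cluster awsbatch] section to vpc_settings = public-private so the cluster uses the new section: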

[vpc public-private]
vpc_id = vpc-<id>
master_subnet_id = subnet-<public>
compute_subnet_id = subnet-<private>
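Because vpc_settings in a [cluster ...] section names the [vpc ...] section to use, a new [vpc public-private] section has no effect until the cluster section points at it. The checker below is hypothetical (`check_vpc_wiring` is not part of ParallelCluster): it parses a 2.x-style config with Python's configparser, flags a dangling vpc_settings reference, and adds a heuristic warning when an awsbatch cluster's compute nodes share the master subnet. It assumes an omitted compute_subnet_id means the master subnet is reused.

```python
import configparser
from typing import List


def check_vpc_wiring(text: str, cluster: str) -> List[str]:
    """Return a list of problems found in a 2.x-style [cluster]/[vpc] pairing."""
    cfg = configparser.ConfigParser(interpolation=None)  # config values are literal
    cfg.read_string(text)
    cluster_sec = f"cluster {cluster}"
    if not cfg.has_section(cluster_sec):
        return [f"missing section [{cluster_sec}]"]

    # vpc_settings names a [vpc <name>] section; a typo or stale name breaks the link.
    vpc_name = cfg.get(cluster_sec, "vpc_settings", fallback=None)
    vpc_sec = f"vpc {vpc_name}"
    if vpc_name is None or not cfg.has_section(vpc_sec):
        return [f"vpc_settings = {vpc_name} does not match any [vpc ...] section"]

    problems: List[str] = []
    master = cfg.get(vpc_sec, "master_subnet_id", fallback=None)
    # Assumption: an omitted compute_subnet_id means "same subnet as the master".
    compute = cfg.get(vpc_sec, "compute_subnet_id", fallback=master)
    scheduler = cfg.get(cluster_sec, "scheduler", fallback="")
    if scheduler == "awsbatch" and compute == master:
        # Heuristic only: a shared public subnet usually means no NAT route for containers.
        problems.append(
            "compute nodes share the master subnet; multi-node awsbatch jobs "
            "need a private compute subnet routed through a NAT gateway"
        )
    return problems
```

For example, pass it the text of ~/.parallelcluster/config and the cluster name awsbatch; an empty list means the wiring looks consistent, not that the network itself is correct.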

FYI, single-node jobs (jobs not submitted with awsbsub's -n/--nodes option) don't need a NAT gateway and should run on your existing setup.

Edited by: aws-hpc-sean on Feb 20, 2019 3:33 PM

answered 5 years ago

Great information, thanks. I'll work through it and report back.

I swear that the initial "vpc public" setup wouldn't even run a simple one-node job, but maybe something else was wrong. Will definitely check it out.

kimyx
answered 5 years ago

Using the create-public-private-vpc tutorial got me up and running. Our previous VPC had been cobbled together over time: it seemed to have all the right pieces, but there were some differences in how the route tables were associated within the VPC. I could ssh to the master and compute EC2 instances, but the awsb tools couldn't see them.
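Route table associations are easy to get wrong because a subnet with no explicit association silently uses the VPC's main route table. A sketch for finding the table that actually applies to a subnet, assuming boto3 and the RouteTables list from `describe_route_tables` filtered on `vpc-id`; `effective_route_table` is a hypothetical helper and the two command-line arguments are placeholders for your own VPC and subnet IDs.

```python
import sys
from typing import Any, Dict, List, Optional


def effective_route_table(subnet_id: str, route_tables: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """An explicit subnet association wins; otherwise the VPC's main route table applies."""
    main = None
    for rt in route_tables:
        for assoc in rt.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return rt
            if assoc.get("Main"):
                main = rt
    return main


def main(vpc_id: str, subnet_id: str) -> None:
    import boto3  # only needed for the live lookup

    ec2 = boto3.client("ec2")
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["RouteTables"]
    rt = effective_route_table(subnet_id, tables)
    if rt is None:
        print(f"no route table found for {subnet_id} in {vpc_id}")
        return
    for route in rt.get("Routes", []):
        target = route.get("NatGatewayId") or route.get("GatewayId") or route.get("NetworkInterfaceId")
        print(rt["RouteTableId"], route.get("DestinationCidrBlock"), target, route.get("State"))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Running it for both the master and compute subnets shows side by side whether each one really routes through the gateway you expect.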

Thanks for your help,
Kim

kimyx
answered 5 years ago
