ParallelCluster and AWS Batch

I'm new to ParallelCluster. I have it set up in an AWS VPC and am running test jobs successfully using the traditional pcluster scheduler. Now I'm setting it up with the AWS Batch back end. I added this to ~/.parallelcluster/config:

[cluster awsbatch]
base_os = alinux
scheduler = awsbatch
vpc_settings = public
key_name = swt_kk...
compute_instance_type = c5.xlarge

[vpc public]
master_subnet_id = subnet-049f...
compute_subnet_id = subnet-049f...
vpc_id = vpc-044b...

I'm following the instructions here:
https://aws-parallelcluster.readthedocs.io/en/latest/tutorials/03_batch_mpi.html

source ~/envs/pcluster-virtualenv/bin/activate
pcluster create awsbatch --cluster-template awsbatch

The cluster is created OK and I can see the master and compute nodes running in the EC2 console, but pcluster cannot see the compute node:

(pcluster-virtualenv) [kk@ip-172-16-0-10 ~]$ pcluster instances awsbatch
MasterServer         i-065da40163ecebe4a

(pcluster-virtualenv) [kk@ip-172-16-0-10 ~]$ awsbhosts --cluster awsbatch
ec2InstanceId    instanceType    privateIpAddress    publicIpAddress    runningJobs
---------------  --------------  ------------------  -----------------  -------------

If I start a test job, it just sits in the queue.

The master and compute subnets are the same and have an internet gateway attached. I see this comment in the tutorial page:

# Replace with id of the subnet for the Compute nodes.
# A NAT Gateway is required for MNP.

Is a NAT Gateway still required if an Internet Gateway is already in place? Must the compute subnet be different from the master subnet?

Any ideas on what might be going wrong? Ways to debug?

Thanks,
Kim

kimyx
asked 5 years ago, 328 views
3 Answers

An internet gateway is not sufficient. A NAT gateway is needed because the containers only get private IP addresses. Underneath, AWS Batch uses task networking, which you can read about here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html
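To make the internet gateway vs. NAT gateway distinction concrete, here is a minimal Python sketch that classifies a subnet's default route. It assumes the routes are dictionaries shaped like the `Routes` entries returned by the EC2 DescribeRouteTables API; the function name `classify_default_route` and the example IDs are hypothetical, and no AWS call is made.

```python
def classify_default_route(routes):
    """Report where a route table sends 0.0.0.0/0 traffic.

    `routes` is assumed to look like the "Routes" list in an EC2
    DescribeRouteTables response (an assumption of this sketch).
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # only the default route matters here
        if route.get("NatGatewayId"):
            return "nat-gateway"
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "internet-gateway"
    return "none"


# Hypothetical example data, not taken from the cluster in this thread.
public_routes = [
    {"DestinationCidrBlock": "172.16.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"},
]
private_routes = [
    {"DestinationCidrBlock": "172.16.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"},
]

print(classify_default_route(public_routes))   # internet-gateway
print(classify_default_route(private_routes))  # nat-gateway
```

A compute subnet that reports "internet-gateway" here only gives outbound access to instances with public IP addresses, which is why it is not enough for containers that have private addresses only.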

The best way to set up the networking is to follow this guide: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html

Then take the public and private subnets that were created and specify them in your config:

[vpc public-private]
master_subnet_id = subnet-<public>
compute_subnet_id = subnet-<private>
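As a quick sanity check before creating the cluster, the config file can be parsed to see whether an awsbatch cluster points its master and compute nodes at the same subnet. This is only a sketch using Python's standard `configparser`; the function name `check_batch_subnets` and the warning text are hypothetical, and the check is only relevant to multi-node parallel jobs.

```python
import configparser


def check_batch_subnets(config_text, cluster="awsbatch"):
    """Return a list of warnings for the named [cluster ...] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    cluster_section = parser["cluster " + cluster]
    vpc_section = parser["vpc " + cluster_section["vpc_settings"]]

    master = vpc_section.get("master_subnet_id")
    compute = vpc_section.get("compute_subnet_id")

    warnings = []
    if cluster_section.get("scheduler") == "awsbatch" and compute == master:
        warnings.append(
            "compute_subnet_id equals master_subnet_id; multi-node jobs "
            "need a compute subnet that routes through a NAT gateway"
        )
    return warnings


# Placeholder values in the style of the config fragments in this thread.
sample = "\n".join([
    "[cluster awsbatch]",
    "scheduler = awsbatch",
    "vpc_settings = public-private",
    "",
    "[vpc public-private]",
    "master_subnet_id = subnet-<public>",
    "compute_subnet_id = subnet-<private>",
])

print(check_batch_subnets(sample))  # []
```

To check a real file, read it with `open(os.path.expanduser("~/.parallelcluster/config")).read()` and pass the text in. The sketch compares subnet IDs only; it does not verify that the compute subnet actually has a NAT route.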

FYI, single-node jobs (jobs not submitted with the -N flag) don't need a NAT gateway and should run on your existing setup.

Edited by: aws-hpc-sean on Feb 20, 2019 3:33 PM

answered 5 years ago

Great information, thanks. I'll work through it and report back.

I swear that the initial "vpc public" setup wouldn't even run a simple one-node job, but maybe something else was wrong. Will definitely check it out.

kimyx
answered 5 years ago

Using the create-public-private-vpc tutorial got me up and running. Our previous VPC had been cobbled together over time. It seemed to have all the right pieces, but there were some differences in how the route tables were attached to the VPC. I could ssh to the master and compute EC2 instances, but the awsb tools couldn't see them.

Thanks for your help,
Kim

kimyx
answered 5 years ago
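For anyone debugging a similar hand-built VPC: a subnet without an explicit route table association falls back to the VPC's main route table, which is an easy difference to miss. The sketch below picks the route table that applies to a subnet. It assumes input shaped like the `RouteTables` list from the EC2 DescribeRouteTables API, already filtered to one VPC; the function name `effective_route_table` and all IDs are hypothetical.

```python
def effective_route_table(route_tables, subnet_id):
    """Return the route table that applies to `subnet_id`, or None.

    An explicit subnet association wins; otherwise the table flagged
    as Main is used. `route_tables` is assumed to cover a single VPC.
    """
    main_table = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table  # explicit association
            if assoc.get("Main"):
                main_table = table
    return main_table


# Hypothetical example data.
tables = [
    {"RouteTableId": "rtb-main", "Associations": [{"Main": True}]},
    {
        "RouteTableId": "rtb-private",
        "Associations": [{"Main": False, "SubnetId": "subnet-private"}],
    },
]

print(effective_route_table(tables, "subnet-private")["RouteTableId"])  # rtb-private
print(effective_route_table(tables, "subnet-other")["RouteTableId"])    # rtb-main
```

Checking the `Routes` of the returned table shows whether the subnet's default route really goes where you expect.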