Failed to set up a parallel cluster on AWS EC2 with Ubuntu


Hi!

I am very new to AWS EC2 and AWS ParallelCluster. I have two virtual compute nodes running Ubuntu 18.04, on g3s.xlarge and c5.4xlarge instances. My goal is to set up a parallel cluster (master and compute nodes) with the Slurm job scheduler so that I can run calculations across multiple nodes with a parallel method such as MPI.

So far, I have tried to create a new cluster with the pcluster tool, following the quick start in README.md in the aws-parallelcluster GitHub repository and the full AWS ParallelCluster manual, but creation failed. I have also tweaked the config file stored in the $HOME/.parallelcluster folder and even added the --norollback option, but the errors persist.

My modified config file:

[aws]
aws_region_name = us-east-1

[cluster hpctest]
key_name = XXXXXXXX
base_os = alinux
master_instance_type = g3s.xlarge
master_root_volume_size = 64
compute_instance_type = c5.4xlarge
compute_root_volume_size = 64
initial_queue_size = 0
max_queue_size = 8
maintain_initial_size = false
custom_ami = ami-0fd18b144da8357b7
scheduler = slurm
cluster_type = spot
placement_group = DYNAMIC
placement = cluster
ebs_settings = shared
fsx_settings = fs
vpc_settings = public
scaling_settings = custom

[ebs shared]
shared_dir = shared
volume_type = st1
volume_size = 500

[fsx fs]
shared_dir = /fsx
storage_capacity = 3600

[global]
cluster_template = hpctest
update_check = true
sanity_check = true

[vpc public]
vpc_id = vpc-XXXXXXXX
master_subnet_id = subnet-XXXXXXXX

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[scaling custom]
scaledown_idletime = 1
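Independent of the IAM error, a misspelled section type, or a *_settings key that points at a section that does not exist, is easy to miss in a file like this. Below is a minimal sketch (Python, standard library only) that cross-checks those references. The key-to-section mapping and the list of known section types are assumptions covering common ParallelCluster v2 sections, not an exhaustive copy of the official schema.

```python
import configparser

# Assumed mapping for ParallelCluster v2 configs: a key in [cluster ...]
# names the label of a section with this type prefix, e.g.
# "vpc_settings = public" -> [vpc public].
SETTINGS_KEYS = {
    "vpc_settings": "vpc",
    "ebs_settings": "ebs",
    "fsx_settings": "fsx",
    "scaling_settings": "scaling",
}

# Section types treated as valid; an assumption covering common v2 sections.
KNOWN_TYPES = {"aws", "global", "aliases", "cluster", "vpc", "ebs", "efs",
               "fsx", "raid", "scaling", "dcv", "cw_log", "queue",
               "compute_resource"}


def config_problems(config_text: str) -> list[str]:
    """Report unknown section types and *_settings keys that point at
    sections which are not defined."""
    # interpolation=None so values like {CFN_USER} or '%' are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []
    for section in parser.sections():
        kind = section.split(" ", 1)[0]
        if kind not in KNOWN_TYPES:
            problems.append(f"[{section}]: unrecognized section type")
        if kind != "cluster":
            continue
        for key, target_kind in SETTINGS_KEYS.items():
            value = parser.get(section, key, fallback=None)
            if value is None:
                continue
            # ebs_settings may list several comma-separated labels.
            for label in value.split(","):
                target = f"{target_kind} {label.strip()}"
                if not parser.has_section(target):
                    problems.append(f"[{section}] {key}: no [{target}] section")
    return problems


if __name__ == "__main__":
    from pathlib import Path
    path = Path.home() / ".parallelcluster" / "config"
    for problem in config_problems(path.read_text()):
        print(problem)
```

For example, a header such as [scailing custom] would be reported as an unrecognized section type, and "fsx_settings = fs" without an [fsx fs] section would be reported as a dangling reference.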

Note: I took the custom_ami ID from https://github.com/aws/aws-parallelcluster/blob/master/amis.txt.

The errors I am getting:

ubuntu@ip-172-XX-XX-XXX:~$ pcluster create t1
Beginning cluster creation for cluster: t1
Creating stack named: parallelcluster-t1
Status: parallelcluster-t1 - ROLLBACK_IN_PROGRESS
Cluster creation failed.  Failed events:
  - AWS::EC2::SecurityGroup MasterSecurityGroup Resource creation cancelled
  - AWS::EC2::PlacementGroup DynamicPlacementGroup Resource creation cancelled
  - AWS::EC2::EIP MasterEIP Resource creation cancelled
  - AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
  - AWS::DynamoDB::Table DynamoDBTable Resource creation cancelled
  - AWS::IAM::Role RootRole API: iam:CreateRole User: arn:aws:iam::3043XXXXXXXX:user/nutt is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::3043XXXXXXXX:role/parallelcluster-t1-RootRole-1L9A3XXXXXXXX
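In output like this, most of the failed events are only cancellations triggered by the one resource that actually failed. Here is a small Python sketch that filters the "Failed events" block down to the root cause and pulls the denied principal and action out of the message. The line format it parses is assumed from the output above, not taken from a documented pcluster output format.

```python
import re
from typing import Optional

CANCELLED = "Resource creation cancelled"
# Assumed event line shape, taken from the output above:
#   "  - <ResourceType> <LogicalId> <reason>"
EVENT = re.compile(r"\s*-\s+(AWS::\S+)\s+(\S+)\s+(.*\S)\s*$")
DENIED = re.compile(r"User: (\S+) is not authorized to perform: (\S+)")


def root_causes(output: str) -> list[tuple[str, str, str]]:
    """Failed events whose reason is not just a cascade cancellation."""
    causes = []
    for line in output.splitlines():
        m = EVENT.match(line)
        if m and m.group(3) != CANCELLED:
            causes.append((m.group(1), m.group(2), m.group(3)))
    return causes


def denied_action(reason: str) -> Optional[tuple[str, str]]:
    """(principal ARN, IAM action) from an 'is not authorized' message."""
    m = DENIED.search(reason)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    import sys
    # e.g. paste the pcluster output into this script's standard input
    for rtype, logical_id, reason in root_causes(sys.stdin.read()):
        print(rtype, logical_id, reason)
        print("  denied:", denied_action(reason))
```

Applied to the output above, only the AWS::IAM::Role RootRole event survives the filter, and the denied action is iam:CreateRole for user/nutt.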

Can anyone help me solve this problem? If there are previous threads asking the same question or describing the same problem, please let me know so that I can start learning from them.

Thank you for your time!

Rangsiman

asked 4 years ago, 361 views
2 Answers

Hi,

The cluster creation is failing because your IAM user does not have the required permissions; the error shows it was denied iam:CreateRole.

The user in question is:
arn:aws:iam::3043XXXXXXXX:user/nutt
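Before and after changing permissions, you can check which identity your CLI credentials resolve to and whether IAM would allow the calls. A minimal sketch using boto3's get_caller_identity and simulate_principal_policy; the action list is an illustrative assumption, not the full ParallelCluster policy. Running it requires credentials that are themselves allowed iam:SimulatePrincipalPolicy, so nutt may need an administrator to run it.

```python
# Illustrative subset of actions; iam:CreateRole is the one that failed above.
# This is an assumption, not the full ParallelCluster user policy.
ACTIONS = ["iam:CreateRole", "iam:PassRole", "cloudformation:CreateStack"]


def denied_actions(simulation: dict) -> list[str]:
    """Actions whose EvalDecision in a SimulatePrincipalPolicy response is
    anything other than 'allowed' (implicitDeny or explicitDeny)."""
    return [r["EvalActionName"]
            for r in simulation.get("EvaluationResults", [])
            if r["EvalDecision"] != "allowed"]


def check_caller(sts, iam, actions=ACTIONS) -> tuple[str, list[str]]:
    """Return the caller ARN and the actions IAM would not allow for it."""
    # For an IAM user this is e.g. arn:aws:iam::<account>:user/nutt. The
    # simulator expects a user, group, or role ARN, so an assumed-role ARN
    # (arn:aws:sts::...) would need converting to its role ARN first.
    arn = sts.get_caller_identity()["Arn"]
    sim = iam.simulate_principal_policy(PolicySourceArn=arn, ActionNames=actions)
    return arn, denied_actions(sim)


if __name__ == "__main__":
    import boto3  # the caller also needs iam:SimulatePrincipalPolicy
    arn, denied = check_caller(boto3.client("sts"), boto3.client("iam"))
    print(arn, "denied:", denied or "none")
```

The clients are passed in rather than created inside check_caller so the function can be pointed at a different profile or region, or exercised with stand-in objects.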

Steps to fix this are:

  1. Go to the AWS IAM Console
  2. Select Users and click "nutt"
  3. Add the policy shown here: https://docs.aws.amazon.com/parallelcluster/latest/ug/iam.html#ParallelClusterUserPolicy
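Step 3 can also be scripted. Below is a sketch with boto3, assuming you have saved the policy JSON from the linked page to a local file and are running as an administrator (the file name and policy name are hypothetical). It creates a customer managed policy and attaches it to the user, because an inline user policy has a smaller size limit. The placeholder check assumes that values to fill in are marked with angle brackets in the copied policy.

```python
import json
import re

# Assumed convention: values to fill in are written like <REGION> or
# <AWS ACCOUNT ID> in the copied policy.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def prepare_policy(text: str) -> str:
    """Validate a policy document and return it as compact JSON."""
    doc = json.loads(text)  # raises ValueError on malformed JSON
    if "Statement" not in doc:
        raise ValueError("not an IAM policy document: missing 'Statement'")
    compact = json.dumps(doc, separators=(",", ":"))
    if PLACEHOLDER.search(compact):
        raise ValueError("fill in the <...> placeholders before attaching")
    return compact


def attach_policy(iam, user_name: str, policy_name: str, document: str) -> str:
    """Create a customer managed policy and attach it to the IAM user."""
    # A managed policy allows a larger document than an inline user policy.
    # create_policy fails with EntityAlreadyExists if the name is taken.
    arn = iam.create_policy(PolicyName=policy_name,
                            PolicyDocument=document)["Policy"]["Arn"]
    iam.attach_user_policy(UserName=user_name, PolicyArn=arn)
    return arn


if __name__ == "__main__":
    import boto3
    # Hypothetical file and policy names; run as an administrator, not as nutt.
    with open("parallelcluster-user-policy.json") as f:
        document = prepare_policy(f.read())
    print(attach_policy(boto3.client("iam"), "nutt",
                        "ParallelClusterUserPolicy", document))
```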

You may need to ask your administrator to do this if you don't have permission to do it yourself.

AWS
answered 4 years ago

Hi,

Got it! I can now create the parallel cluster smoothly. Thank you very much. :)

Rangsiman

Edited by: rangsiman on Oct 31, 2019 7:38 AM

answered 4 years ago
