Hi!
I am very new to AWS EC2 and AWS ParallelCluster. I have two EC2 instances running Ubuntu 18.04, a g3s.xlarge and a c5.4xlarge. My goal is to set up a parallel cluster (a master node plus compute nodes) with the SLURM job manager, so that I can run multi-node calculations with a parallel method such as MPI.
So far, I have tried to create a new cluster with the pcluster tool, following the quick-start instructions in the README.md of the aws-parallelcluster GitHub repository and the full AWS ParallelCluster manual, but cluster creation fails. I have also tweaked the config file stored in the $HOME/.parallelcluster folder, and even added the --norollback option, but the errors persist.
My modified config file:
[aws]
aws_region_name = us-east-1
[cluster hpctest]
key_name = XXXXXXXX
base_os = alinux
master_instance_type = g3s.xlarge
master_root_volume_size = 64
compute_instance_type = c5.4xlarge
compute_root_volume_size = 64
initial_queue_size = 0
max_queue_size = 8
maintain_initial_size = false
custom_ami = ami-0fd18b144da8357b7
scheduler = slurm
cluster_type = spot
placement_group = DYNAMIC
placement = cluster
ebs_settings = shared
fsx_settings = fs
vpc_settings = public
[ebs shared]
shared_dir = shared
volume_type = st1
volume_size = 500
[fsx fs]
shared_dir = /fsx
storage_capacity = 3600
[global]
cluster_template = hpctest
update_check = true
sanity_check = true
[vpc public]
vpc_id = vpc-XXXXXXXX
master_subnet_id = subnet-XXXXXXXX
[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
[scailing custom]
scaledown_idletime = 1
Note: I took the custom_ami ID from https://github.com/aws/aws-parallelcluster/blob/master/amis.txt.
The errors I am getting:
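To rule out simple mistakes in the config itself, a small check along the following lines might help. This is only a sketch: the set of section types in KNOWN_SECTION_TYPES is my assumption about what ParallelCluster 2.x accepts, not an authoritative list, and check_config is a hypothetical helper name, not part of the pcluster tool. It flags section headings whose type is not in that set, and *_settings keys that point to a section that does not exist.

```python
import configparser

# ASSUMPTION: section types I believe ParallelCluster 2.x recognizes.
# This is not taken from an authoritative list; adjust it as needed.
KNOWN_SECTION_TYPES = {
    "aws", "global", "aliases", "cluster", "vpc", "ebs",
    "fsx", "efs", "raid", "scaling", "cw_log", "dcv",
}


def check_config(config_text):
    """Return a list of human-readable problems found in the config text."""
    # Interpolation is disabled so values are read exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []

    # 1. Every heading should start with a known section type.
    for name in parser.sections():
        parts = name.split()
        section_type = parts[0] if parts else ""
        if section_type not in KNOWN_SECTION_TYPES:
            problems.append(f"unknown section type: [{name}]")

    # 2. Every "<type>_settings = <label>" should resolve to "[<type> <label>]".
    for name in parser.sections():
        for key, value in parser[name].items():
            if not key.endswith("_settings"):
                continue
            target_type = key[: -len("_settings")]
            for label in value.split(","):
                target = f"{target_type} {label.strip()}"
                if not parser.has_section(target):
                    problems.append(
                        f"[{name}] {key} points to missing section [{target}]"
                    )
    return problems


if __name__ == "__main__":
    # Shortened, hypothetical excerpt in the same layout as the config above.
    sample = """
[cluster hpctest]
ebs_settings = shared
vpc_settings = public
[ebs shared]
shared_dir = shared
[vpc public]
vpc_id = vpc-XXXXXXXX
"""
    for problem in check_config(sample):
        print(problem)
```

To check the real file, the contents of the config under $HOME/.parallelcluster can be read and passed to check_config in place of the sample string. An empty result only means these two checks passed; it does not validate the values themselves.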
ubuntu@ip-172-XX-XX-XXX:~$ pcluster create t1
Beginning cluster creation for cluster: t1
Creating stack named: parallelcluster-t1
Status: parallelcluster-t1 - ROLLBACK_IN_PROGRESS
Cluster creation failed. Failed events:
- AWS::EC2::SecurityGroup MasterSecurityGroup Resource creation cancelled
- AWS::EC2::PlacementGroup DynamicPlacementGroup Resource creation cancelled
- AWS::EC2::EIP MasterEIP Resource creation cancelled
- AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
- AWS::DynamoDB::Table DynamoDBTable Resource creation cancelled
- AWS::IAM::Role RootRole API: iam:CreateRole User: arn:aws:iam::3043XXXXXXXX:user/nutt is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::3043XXXXXXXX:role/parallelcluster-t1-RootRole-1L9A3XXXXXXXX
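As far as I understand, the "Resource creation cancelled" lines are only side effects: CloudFormation cancels the resources still in progress once one resource fails. The sketch below filters the failed events down to the ones that carry an actual reason and pulls the denied action, user, and resource out of an authorization message. The function name find_root_causes and the regular expression are my own illustration, written against the wording of the output above, so they may not match other error formats.

```python
import re

# Matches the wording of the authorization failure shown in the output above.
DENIED_PATTERN = re.compile(
    r"User: (?P<user>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def find_root_causes(event_lines):
    """Return failed events that are not merely 'Resource creation cancelled'."""
    causes = []
    for line in event_lines:
        # Drop the leading "- " bullet and surrounding whitespace.
        text = line.strip().lstrip("-").strip()
        if not text or text.endswith("Resource creation cancelled"):
            continue  # a side effect of another failure, not a cause
        match = DENIED_PATTERN.search(text)
        if match:
            causes.append(
                {
                    "action": match.group("action"),
                    "user": match.group("user"),
                    "resource": match.group("resource"),
                }
            )
        else:
            # Keep any other failure message verbatim.
            causes.append({"message": text})
    return causes


if __name__ == "__main__":
    events = [
        "- AWS::EC2::EIP MasterEIP Resource creation cancelled",
        "- AWS::IAM::Role RootRole API: iam:CreateRole User: "
        "arn:aws:iam::3043XXXXXXXX:user/nutt is not authorized to perform: "
        "iam:CreateRole on resource: "
        "arn:aws:iam::3043XXXXXXXX:role/parallelcluster-t1-RootRole-1L9A3XXXXXXXX",
    ]
    for cause in find_root_causes(events):
        print(cause)
```

Applied to the events above, this leaves only the AWS::IAM::Role RootRole entry, which suggests that the missing iam:CreateRole permission for my IAM user is the failure to look at first.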
Can anyone help me solve this problem? If there are any previous threads that ask the same question or describe the same problem, please let me know so that I can start learning from them.
Thank you for your time!
Rangsiman