Failed to set up a parallel cluster on AWS EC2 with Ubuntu OS.


Hi!

I am very new to AWS EC2 and AWS ParallelCluster. I have two virtual compute nodes running Ubuntu 18.04 on g3s.xlarge and c5.4xlarge instances. My goal is to set up a parallel cluster (master and slave nodes) with the SLURM job manager so that I can run calculations across multiple nodes with a parallel method such as MPI.

So far, I have tried to create a new parallel cluster with the pcluster tool, following the quick start in README.md in the aws-parallelcluster GitHub repository and the full AWS ParallelCluster manual, but it failed. I have also tweaked the config file stored in the $HOME/.parallelcluster folder and even added the --norollback option, but the errors persist.

My modified config file:

[aws]
aws_region_name = us-east-1

[cluster hpctest]
key_name = XXXXXXXX
base_os = alinux
master_instance_type = g3s.xlarge
master_root_volume_size = 64
compute_instance_type = c5.4xlarge
compute_root_volume_size = 64
initial_queue_size = 0
max_queue_size = 8
maintain_initial_size = false
custom_ami = ami-0fd18b144da8357b7
scheduler = slurm
cluster_type = spot
placement_group = DYNAMIC
placement = cluster
ebs_settings = shared
fsx_settings = fs
vpc_settings = public

[ebs shared]
shared_dir = shared
volume_type = st1
volume_size = 500

[fsx fs]
shared_dir = /fsx
storage_capacity = 3600

[global]
cluster_template = hpctest
update_check = true
sanity_check = true

[vpc public]
vpc_id = vpc-XXXXXXXX
master_subnet_id = subnet-XXXXXXXX

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[scaling custom]
scaledown_idletime = 1

Note: I took the custom_ami ID from https://github.com/aws/aws-parallelcluster/blob/master/amis.txt.
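One way to catch section-name mistakes before running pcluster create is a small cross-reference check. Below is a minimal sketch using only Python's standard configparser. It assumes the ParallelCluster 2.x INI layout, where a `<type>_settings = <name>` key in the `[cluster ...]` section points at a `[<type> <name>]` section; the set in `KNOWN_TYPES` is an assumption, not an exhaustive list. `SAMPLE_CONFIG` is a hypothetical trimmed config that deliberately contains a misspelled `[scailing custom]` header and a `scaling_settings` reference, purely for illustration.

```python
import configparser

# Hypothetical trimmed sample in the ParallelCluster 2.x INI format. The
# misspelled [scailing custom] header is kept on purpose to show what the
# check reports.
SAMPLE_CONFIG = """
[aws]
aws_region_name = us-east-1

[global]
cluster_template = hpctest

[cluster hpctest]
scheduler = slurm
ebs_settings = shared
vpc_settings = public
scaling_settings = custom

[ebs shared]
shared_dir = shared

[vpc public]
vpc_id = vpc-XXXXXXXX

[scailing custom]
scaledown_idletime = 1
"""

# Assumption: section types understood by ParallelCluster 2.x.
KNOWN_TYPES = {"aws", "global", "aliases", "cluster", "vpc", "ebs", "efs",
               "fsx", "raid", "scaling", "dcv", "cw_log"}


def check_config(text):
    """Return a list of human-readable problems found in the config text."""
    # interpolation=None so values such as "{CFN_USER}" or "%" are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []

    # 1. Every section header must start with a known section type.
    for section in parser.sections():
        kind = section.split()[0]
        if kind not in KNOWN_TYPES:
            problems.append(f"unknown section type in [{section}]")

    # 2. The cluster template named in [global] must exist.
    template = parser.get("global", "cluster_template", fallback=None)
    if template and not parser.has_section(f"cluster {template}"):
        problems.append(f"missing [cluster {template}]")
        return problems

    # 3. Every <type>_settings = <name>[,<name>...] key in the cluster
    #    section must point at an existing [<type> <name>] section.
    if template:
        for key, value in parser.items(f"cluster {template}"):
            if key.endswith("_settings"):
                kind = key[: -len("_settings")]
                for name in value.split(","):
                    target = f"{kind} {name.strip()}"
                    if not parser.has_section(target):
                        problems.append(f"{key} refers to missing [{target}]")
    return problems


if __name__ == "__main__":
    for problem in check_config(SAMPLE_CONFIG):
        print(problem)
```

On the sample it reports both the unknown `[scailing custom]` section and the dangling `scaling_settings` reference. It checks only section names and references, not whether the values themselves are valid for your account.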

The errors I am facing:

ubuntu@ip-172-XX-XX-XXX:~$ pcluster create t1
Beginning cluster creation for cluster: t1
Creating stack named: parallelcluster-t1
Status: parallelcluster-t1 - ROLLBACK_IN_PROGRESS
Cluster creation failed.  Failed events:
  - AWS::EC2::SecurityGroup MasterSecurityGroup Resource creation cancelled
  - AWS::EC2::PlacementGroup DynamicPlacementGroup Resource creation cancelled
  - AWS::EC2::EIP MasterEIP Resource creation cancelled
  - AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
  - AWS::DynamoDB::Table DynamoDBTable Resource creation cancelled
  - AWS::IAM::Role RootRole API: iam:CreateRole User: arn:aws:iam::3043XXXXXXXX:user/nutt is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::3043XXXXXXXX:role/parallelcluster-t1-RootRole-1L9A3XXXXXXXX

Can anyone help me solve this problem? If there are previous threads asking the same question or facing the same problem, please point me to them so that I can start learning from those.

Thank you for your time!

Rangsiman

Asked 5 years ago · 377 views
2 answers

Hi,

The cluster creation is failing because you don't have the required IAM permissions (the stack fails on iam:CreateRole).

The user in question is:
arn:aws:iam::3043XXXXXXXX:user/nutt
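When a stack rolls back, CloudFormation typically marks the resources that were still being created as "Resource creation cancelled", so the real failure is usually the one event with a different reason, here RootRole. A minimal Python sketch of that filtering, assuming the failed-events format printed by pcluster in the question (`  - <resource type> <logical ID> <reason>`); the regular expression for the access-denied message is an assumption based on the single message in this thread:

```python
import re

# Failed-events text as printed by `pcluster create` in the question
# (account ID and role suffix masked as in the original).
FAILED_EVENTS = """\
  - AWS::EC2::SecurityGroup MasterSecurityGroup Resource creation cancelled
  - AWS::EC2::PlacementGroup DynamicPlacementGroup Resource creation cancelled
  - AWS::EC2::EIP MasterEIP Resource creation cancelled
  - AWS::CloudFormation::Stack EBSCfnStack Resource creation cancelled
  - AWS::DynamoDB::Table DynamoDBTable Resource creation cancelled
  - AWS::IAM::Role RootRole API: iam:CreateRole User: arn:aws:iam::3043XXXXXXXX:user/nutt is not authorized to perform: iam:CreateRole on resource: arn:aws:iam::3043XXXXXXXX:role/parallelcluster-t1-RootRole-1L9A3XXXXXXXX
"""

CANCELLED = "Resource creation cancelled"
# Assumed shape of an IAM access-denied message.
DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: (?P<action>\S+)"
)


def root_causes(events_text):
    """Return (resource_type, logical_id, reason) for events that were not
    merely cancelled because another resource failed first."""
    causes = []
    for line in events_text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        resource_type, logical_id, reason = line[2:].split(None, 2)
        if reason != CANCELLED:
            causes.append((resource_type, logical_id, reason))
    return causes


def denied_permission(reason):
    """Extract (principal ARN, IAM action) from an access-denied message, or None."""
    match = DENIED.search(reason)
    return (match["principal"], match["action"]) if match else None


if __name__ == "__main__":
    for resource_type, logical_id, reason in root_causes(FAILED_EVENTS):
        print(f"{logical_id} ({resource_type}) failed")
        denied = denied_permission(reason)
        if denied:
            print(f"  {denied[0]} lacks permission for {denied[1]}")
```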

To fix this:

  1. Go to the AWS IAM Console
  2. Select Users and click "nutt"
  3. Add the policy shown here: https://docs.aws.amazon.com/parallelcluster/latest/ug/iam.html#ParallelClusterUserPolicy

You may need your admin to do this if you don't have permission to do it yourself.
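If you prefer to script the fix instead of using the console, the sketch below creates a customer-managed policy and attaches it to the user with boto3's IAM client (create_policy and attach_user_policy). It assumes you have saved the ParallelClusterUserPolicy JSON from the linked page to a local file (the filename parallelcluster-user-policy.json is hypothetical), filled in its placeholders for your account and region, and that the script runs with credentials allowed to manage IAM. The client is passed in as a parameter so the logic can be exercised without AWS access.

```python
import json


def attach_user_policy(iam, user_name, policy_name, policy_document):
    """Create a customer-managed policy and attach it to an IAM user.

    `iam` is a boto3 IAM client (or any object exposing the same two
    methods), which keeps this function testable without AWS credentials.
    Returns the ARN of the newly created policy.
    """
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    policy_arn = response["Policy"]["Arn"]
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    return policy_arn


if __name__ == "__main__":
    import boto3  # needs credentials for an identity allowed to manage IAM

    # Hypothetical file: the ParallelClusterUserPolicy JSON copied from the
    # AWS docs, with its placeholders filled in for your account and region.
    with open("parallelcluster-user-policy.json") as f:
        document = json.load(f)

    arn = attach_user_policy(boto3.client("iam"), "nutt",
                             "ParallelClusterUserPolicy", document)
    print("attached", arn)
```

If a policy with that name already exists in the account, create_policy fails with an EntityAlreadyExists error; in that case, attach the existing policy's ARN directly instead.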

AWS
Answered 5 years ago

Hi,

Got it! I can create the parallel cluster smoothly. Thank you very much. :)

Rangsiman

Edited by: rangsiman on Oct 31, 2019 7:38 AM

Answered 5 years ago