Can't Create Cluster - The placement group for EFA-enabled compute resource

Hello Folks,

I'm following along with one of the HPC PCluster GROMACS workshops. I've modified the example to work with PCluster v3 (I think). However, when I run the command to create the cluster, I get:

{
  "level": "ERROR",
  "type": "EfaPlacementGroupValidator",
  "message": "The placement group for EFA-enabled compute resources must be explicit. You may see better performance using a placement group, but if you don't wish to use one please add 'Enabled: false' to the compute resource's configuration section."
},

The code that I think is the problem looks like this:

Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: g4dn
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: '100'
            Encrypted: 'false'
            VolumeType: 'gp2'
      ComputeResources:
        - Name: g4dn
          InstanceType: g4dn.8xlarge
          MaxCount: 3
          MinCount: 0
          DisableSimultaneousMultithreading: 'false'
      CustomActions:
        OnNodeStart:
          Script: s3://pcluster-2021-10-07-XXXXX/post.install.sh
      Networking:
        SubnetIds:
          - subnet-098eaXXXXX
        PlacementGroup:
          Enabled: 'true'
          Id: 'TestPlacementGroup'

I've tried using the Id of an existing placement group in the targeted AWS Region, and I've also tried omitting the Id, since the docs say that one will be created if none is specified. Both attempts fail.

How do I get the "PlacementGroup" to not throw an error and to create the cluster?

Any help is greatly appreciated.

asked 3 years ago, 442 views
2 Answers

Hello darkdiesel,
The EfaPlacementGroupValidator throws an error when EFA is enabled through the Efa option (https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmQueues-ComputeResources-Efa) but the placement group is not explicitly set through the PlacementGroup option (https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmQueues-Networking-PlacementGroup-Enabled).
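
For illustration, here is a minimal sketch of a queue in which both settings are explicit. It assumes the queue-level Networking/PlacementGroup layout described in the linked docs; the queue name, instance type, and subnet placeholder are copied from the question, and the rest of a real configuration (Region, Image, HeadNode, and so on) is omitted. The exact location of these keys may differ between ParallelCluster versions, so please check it against the documentation for the version you are running.

```yaml
# Sketch only: a partial Scheduling section, not a complete cluster configuration.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: g4dn
      ComputeResources:
        - Name: g4dn
          InstanceType: g4dn.8xlarge
          MinCount: 0
          MaxCount: 3
          Efa:
            Enabled: true          # enabling EFA here is what triggers the validator
      Networking:
        SubnetIds:
          - subnet-098eaXXXXX      # placeholder taken from the question
        PlacementGroup:
          Enabled: true            # explicit choice; set to false to opt out instead
```

If the validator still complains with a layout like this, the Efa or PlacementGroup block is probably nested under a different parent than intended, which is easy to miss when the indentation shifts.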

In your configuration I don't see any EFA setting, and unfortunately the indentation got lost. Could you please open an issue at https://github.com/aws/aws-parallelcluster/issues and post the configuration again there, so we can follow up with you on this? Thanks

AWS
answered 3 years ago

Hello Luca,

I had messed up my config and was looking at the wrong EFA location. It was user error all along.

answered 3 years ago
