Discrepancy between pcluster partition config and EC2 instance specification


Submitting a Slurm job with the parameters

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu-t4

fails with

sbatch: error: CPU count per node can not be satisfied

The partition consists of g4dn.2xlarge instances, which have 8 "vCPUs" according to https://aws.amazon.com/ec2/instance-types/g4. To my surprise, however, /opt/slurm/etc/pcluster/slurm_parallelcluster_gpu-t4_partition.conf contains

NodeName=gpu-t4-dy-g4dn-2xlarge-[1-2] CPUs=4 RealMemory=31129 State=CLOUD Feature=dynamic,g4dn.2xlarge,g4dn-2xlarge,gpu Gres=gpu:t4:1
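To see exactly what Slurm is told about these nodes, the line above can be split into its KEY=VALUE fields. This is a minimal Python sketch; `parse_node_line` is a hypothetical helper, not part of Slurm or ParallelCluster, and it assumes values contain no spaces or quotes (true for this generated line). On a running cluster, `scontrol show node` reports the same information.

```python
def parse_node_line(line: str) -> dict[str, str]:
    """Split a Slurm 'NodeName=... KEY=VALUE ...' definition into a dict.

    Hypothetical helper: assumes whitespace-separated tokens with no quoted
    values, which holds for the ParallelCluster-generated line in the question.
    """
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not KEY=VALUE
            fields[key] = value
    return fields


line = (
    "NodeName=gpu-t4-dy-g4dn-2xlarge-[1-2] CPUs=4 RealMemory=31129 "
    "State=CLOUD Feature=dynamic,g4dn.2xlarge,g4dn-2xlarge,gpu Gres=gpu:t4:1"
)
node = parse_node_line(line)
print(node["CPUs"], node["Gres"])  # 4 gpu:t4:1
```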

Shouldn't CPUs= be larger than 4 for this instance type?
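The submission-time rejection follows from simple arithmetic: one task per node with 6 CPUs per task needs 6 CPUs on a single node, but every node in the partition advertises CPUs=4. The sketch below is a deliberately simplified model of that check (real Slurm also considers memory, GRES, and socket/core layout); `fits_on_node` is a hypothetical name.

```python
def fits_on_node(node_cpus: int, ntasks_per_node: int, cpus_per_task: int) -> bool:
    """Simplified model: a node must provide ntasks_per_node * cpus_per_task CPUs."""
    return ntasks_per_node * cpus_per_task <= node_cpus


# Values taken from the #SBATCH directives in the question
requested = dict(ntasks_per_node=1, cpus_per_task=6)
print(fits_on_node(4, **requested))  # False: node is defined with CPUs=4
print(fits_on_node(8, **requested))  # True: would fit if the node had CPUs=8
```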

Asked a year ago

2 answers

I searched further and found that, according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/cpu-options-supported-instances-values.html#cpu-options-accelerated, g4dn.2xlarge may not support more than 4 "Valid CPU cores".
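The two figures are not necessarily contradictory: EC2 counts one vCPU per hardware thread, so vCPUs = physical cores × threads per core. The sketch below assumes g4dn.2xlarge defaults of 4 cores and 2 threads per core; treat those numbers as an assumption and confirm them on the linked CPU-options page. `CpuOptions` is a hypothetical illustration, not an AWS SDK type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuOptions:
    cores: int
    threads_per_core: int

    @property
    def vcpus(self) -> int:
        # Each vCPU is one hardware thread on a physical core
        return self.cores * self.threads_per_core


# Assumed defaults for g4dn.2xlarge (check the AWS CPU options page)
g4dn_2xlarge = CpuOptions(cores=4, threads_per_core=2)
print(g4dn_2xlarge.vcpus)  # 8
```

So "4 valid CPU cores" and "8 vCPUs" can both hold for the same instance.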

Answered a year ago

If DisableSimultaneousMultithreading is not specified in the cluster configuration file, the node is defined with CPUs=8:

NodeName=queue1-st-g4dn2xlarge-[1-1] CPUs=8 RealMemory=31129 State=CLOUD Feature=static,g4dn.2xlarge,g4dn2xlarge,gpu Gres=gpu:t4:1

If DisableSimultaneousMultithreading is set to true in the cluster configuration file, the node is defined with CPUs=4:

NodeName=queue1-st-g4dn2xlarge-[1-1] CPUs=4 RealMemory=31129 State=CLOUD Feature=static,g4dn.2xlarge,g4dn2xlarge,gpu Gres=gpu:t4:1
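The pattern in these two node definitions can be summarized as a small function: with simultaneous multithreading disabled only one thread per core is exposed, so Slurm sees one CPU per physical core. This is a sketch that assumes ParallelCluster derives CPUs= this way, as the two examples suggest; `slurm_cpus` is hypothetical, and the 4-core, 2-thread figures for g4dn.2xlarge are assumed.

```python
def slurm_cpus(cores: int, threads_per_core: int, disable_smt: bool) -> int:
    """CPUs= value consistent with the two node definitions above.

    SMT disabled -> one CPU per physical core; otherwise every hardware thread.
    """
    return cores if disable_smt else cores * threads_per_core


# g4dn.2xlarge, assuming 4 cores x 2 threads per core
for disable_smt in (False, True):
    cpus = slurm_cpus(4, 2, disable_smt)
    fits = 6 <= cpus  # --cpus-per-task=6 with one task per node
    print(f"DisableSimultaneousMultithreading={disable_smt}: CPUs={cpus}, 6 CPUs fit: {fits}")
```

Under this reading, the job in the question would be accepted either by leaving DisableSimultaneousMultithreading unset (or false) for the queue, or by requesting at most 4 CPUs per task.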

See https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmQueues-ComputeResources-DisableSimultaneousMultithreading for more information.

Answered a year ago
