Torque nodes overloaded with TSK greater than np


Hello,
I noticed that nodes in my cluster tend to overcommit and become overloaded, running more Torque jobs than there are available CPUs. I suspect it may be related to the Torque configuration (or maybe it doesn't respect hyperthreading somehow?).
I am using AWS ParallelCluster 2.10 with a custom AMI and a maximum of 12 nodes with 8 processors each (c5.4xlarge with hyperthreading disabled).

The node I will analyze here is ip-172-31-68-184.
This is the qnodes output for this node; it should allow at most np=8 CPUs:
[code]
$ qnodes
...
ip-172-31-68-184
state = free
power_state = Running
np = 8
ntype = cluster
jobs = 0/218.ip-172-31-24-41.eu-central-1.compute.internal,1/219.ip-172-31-24-41.eu-central-1.compute.internal,2/220.ip-172-31-24-41.eu-central-1.compute.internal,3/221.ip-172-31-24-41.eu-central-1.compute.internal,4/518.ip-172-31-24-41.eu-central-1.compute.internal
status = opsys=linux,uname=Linux ip-172-31-68-184 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 x86_64,sessions=1182 1306 5674 6030 6039 6046 6062 112846,nsessions=8,nusers=4,idletime=166759,totmem=31720500kb,availmem=29305472kb,physmem=31720500kb,ncpus=8,loadave=18.33,gres=,netload=47638299866,state=free,varattr= ,cpuclock=Fixed,macaddr=02:5a:f2:25:37:ba,version=6.1.2,rectime=1612984963,jobs=218.ip-172-31-24-41.eu-central-1.compute.internal 219.ip-172-31-24-41.eu-central-1.compute.internal 220.ip-172-31-24-41.eu-central-1.compute.internal 221.ip-172-31-24-41.eu-central-1.compute.internal 518.ip-172-31-24-41.eu-central-1.compute.internal
mom_service_port = 15002
mom_manager_port = 15003
[/code]
whereas this is the qstat output for the same node:
[code]
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
218.ip-172-31-24-41.eu flacscloud batch 000038 6030 -- 4 -- 48:00:00 R 46:13:51
ip-172-31-68-184/0
219.ip-172-31-24-41.eu flacscloud batch 000039 6039 -- 4 -- 48:00:00 R 46:13:51
ip-172-31-68-184/1
220.ip-172-31-24-41.eu flacscloud batch 000056 6046 -- 4 -- 48:00:00 R 46:13:51
ip-172-31-68-184/2
221.ip-172-31-24-41.eu flacscloud batch 000060 6062 -- 4 -- 48:00:00 R 46:13:51
ip-172-31-68-184/3
518.ip-172-31-24-41.eu flacscloud batch 012310 112846 -- 2 -- 48:00:00 R 23:16:18
ip-172-31-68-184/4
[/code]

It is clear that the sum of TSK for the running jobs (4+4+4+4+2 = 18) is greater than the number of CPUs (8). This can be confirmed by running top on the node: it is overloaded.
Why would that happen and how can I fix this behavior?

Edited by: mfolusiak on Feb 10, 2021 12:03 PM

Edited by: mfolusiak on Feb 10, 2021 1:09 PM

asked 3 years ago · 158 views
3 Answers
Accepted Answer

Hi @mfolusiak,

Thanks for the information. Based on your submit_args, the job submission command uses -l ncpus=2 to specify the number of vCPUs. If you replace that resource argument with -l nodes=1:ppn=2, as shown in the example below, the overload issue will be resolved and jobs will be allocated to instances according to their vCPU capacity.

nodes - specifies the number of separate nodes that should be allocated
ppn - how many processors (cores) to allocate on each node
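
For example, based on the submit_args quoted further down in this thread (the -d working-directory option is omitted here for brevity, and the rest of the command is reconstructed from that output), the change amounts to:
[code]
# Original submission (requests CPUs with -l ncpus=2)
qsub -N 012310 -q batch -l ncpus=2 -l walltime=48:00:00 \
  -F "/shared/flacscloud/run.py 517" /install/sw/flacs/20.2/FLACS-CFD_20.2/bin/run_python

# Suggested submission: request 1 node with 2 processors per node instead
qsub -N 012310 -q batch -l nodes=1:ppn=2 -l walltime=48:00:00 \
  -F "/shared/flacscloud/run.py 517" /install/sw/flacs/20.2/FLACS-CFD_20.2/bin/run_python
[/code]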

~Yulei

Edited by: yulei-AWS on Feb 12, 2021 4:02 PM

answered 3 years ago

Hi @mfolusiak,

Did you use Open MPI to execute your job in the job script? If so, this is expected behavior called oversubscription; you can find the details at https://www.open-mpi.org/faq/?category=running#oversubscribing. Also, did you specify the --hostfile option in your job script? If so, check whether the hostfile declares more than 8 slots. Since only 8 slots are available on a single c5.4xlarge instance with hyperthreading disabled, a hostfile declaring more than that will cause oversubscription, which can severely degrade performance because Open MPI assumes enough slots are available and runs the MPI processes in aggressive mode.
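
For reference, a hostfile for one of these nodes should not declare more than 8 slots. A minimal sketch (the file name and application are placeholders, the node name is taken from your qnodes output):
[code]
# hostfile: at most 8 slots on this node
ip-172-31-68-184 slots=8 max_slots=8
[/code]
It would then be passed to the launcher as, for example, mpirun --hostfile hostfile -np 8 ./app.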

If this is not the case, please provide the job script, hostfile, and job submission command. Thank you.

~Yulei

Edited by: AWS-yuleiwan on Feb 11, 2021 11:19 AM

answered 3 years ago

Hi @AWS-yuleiwan,
I am not using MPI; I am using OpenMP, though, with the same number of threads as CPUs reserved for the job.
See the detailed qstat report below. The job submission arguments are in submit_args, I believe. As you can see, a Python script is launched there, which in turn launches an executable that uses the same number of threads as the number of CPUs requested for the job.

[code]
Job Id: 518.ip-172-31-24-41.eu-central-1.compute.internal
Job_Name = 012310
Job_Owner = flacscloud@ip-172-31-24-41.eu-central-1.compute.internal
resources_used.cput = 21:59:50
resources_used.energy_used = 0
resources_used.mem = 422380kb
resources_used.vmem = 3728048kb
resources_used.walltime = 23:12:53
job_state = R
queue = batch
server = ip-172-31-24-41.eu-central-1.compute.internal
Checkpoint = u
ctime = Tue Feb 9 20:03:36 2021
exec_host = ip-172-31-68-184/4
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Tue Feb 9 20:03:36 2021
Output_Path = ip-172-31-24-41.eu-central-1.compute.internal:/shared/flacsc
loud/users/chris/D04F899F-43FC-419B-B8A7-15A8D3176A6F/Auriga/D26-01231
0/012310.o518
Priority = 0
qtime = Tue Feb 9 20:03:36 2021
Rerunable = True
Resource_List.ncpus = 2
Resource_List.walltime = 48:00:00
session_id = 112846
euser = flacscloud
egroup = flacscloud
queue_type = E
comment = Job started on Tue Feb 09 at 20:03
etime = Tue Feb 9 20:03:36 2021
submit_args = -N 012310 -d /shared/flacscloud/users/chris/D04F899F-43FC-41
9B-B8A7-15A8D3176A6F/Auriga/D26-012310/ -q batch -l ncpus=2 -l walltim
e=48:00:00 -F "/shared/flacscloud/run.py 517" /install/sw/flacs/20.2/F
LACS-CFD_20.2/bin/run_python
start_time = Tue Feb 9 20:03:36 2021
Walltime.Remaining = 89191
start_count = 1
fault_tolerant = False
job_radix = 0
submit_host = ip-172-31-24-41.eu-central-1.compute.internal
init_work_dir = /shared/flacscloud/users/chris/D04F899F-43FC-419B-B8A7-15A
8D3176A6F/Auriga/D26-012310
job_arguments = "/shared/flacscloud/run.py 517"
request_version = 1
[/code]
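
For illustration only, a hypothetical sketch of the equivalent job script (the real script and run.py are not shown here); the point is that the OpenMP thread count matches the CPUs requested for the job:
[code]
#!/bin/bash
#PBS -l ncpus=2
#PBS -l walltime=48:00:00

# The solver is launched with as many OpenMP threads as CPUs requested for the job
export OMP_NUM_THREADS=2

# Same invocation as in submit_args above
/install/sw/flacs/20.2/FLACS-CFD_20.2/bin/run_python /shared/flacscloud/run.py 517
[/code]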

answered 3 years ago
