2 Answers
Hi @blakem,
I can confirm that the first issue is caused by the head node not having a GPU. To experiment on one of the compute nodes, submit a job, look up the node hostname with squeue, and, once the job is in the Running (R) state, connect to that node over SSH:
[ec2-user@ip-10-0-0-33 ~]$ sbatch --wrap "sleep 100"
Submitted batch job 1
[ec2-user@ip-10-0-0-33 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1 queue1 wrap ec2-user R 0:03 1 queue1-dy-queue1-t2medium-1
[ec2-user@ip-10-0-0-33 ~]$ ssh queue1-dy-queue1-t2medium-1
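The same three steps can be scripted. The following is only a sketch, not part of the original answer: the function names submit_placeholder_job and wait_for_node, the default sleep length, and the polling interval are all assumptions chosen for illustration. It relies on sbatch --parsable to print just the job ID and on squeue -h -o "%T %N" to print the job state and node list without a header.

```shell
#!/bin/sh
# Sketch: submit a placeholder job, wait until it is RUNNING, print its node.
# Function names and default values are hypothetical, chosen for illustration.

# Submit a job that only sleeps and print its job ID.
# --parsable prints "jobid" or "jobid;cluster", so keep the first field.
submit_placeholder_job() {
  seconds="${1:-600}"
  sbatch --parsable --wrap "sleep ${seconds}" | cut -d';' -f1
}

# Poll squeue until the job reports RUNNING, then print the node list.
# Arguments: job ID, max attempts (default 120), delay in seconds (default 10).
# Returns 1 if the job never reached RUNNING within the attempts allowed.
wait_for_node() {
  job_id="$1"
  tries="${2:-120}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # %T is the job state (e.g. PENDING, RUNNING), %N the allocated nodes.
    line=$(squeue -h -j "$job_id" -o "%T %N")
    case "$line" in
      "RUNNING "?*)
        echo "${line#RUNNING }"
        return 0
        ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example use from the head node:
#   job_id=$(submit_placeholder_job 600)
#   node=$(wait_for_node "$job_id") && ssh "$node"
```

Dynamic nodes can take several minutes to start, so the default of 120 attempts at 10-second intervals is deliberately generous; adjust it to your instance type.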
Once on the compute node, you can try installing the package manually.
If that works as expected, you can automate the installation with an OnNodeConfigured custom bootstrap action: https://docs.aws.amazon.com/parallelcluster/latest/ug/custom-bootstrap-actions-v3.html
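As a rough illustration of what such a bootstrap script might look like, the sketch below runs an install command only on nodes where a GPU is visible, so the same script stays harmless on a head node without one. The function names has_gpu and install_if_gpu are hypothetical, the package in the example call is a placeholder, and the check assumes the NVIDIA driver and nvidia-smi are already present on GPU nodes. How the script is referenced from the cluster configuration is covered in the documentation linked above.

```shell
#!/bin/sh
# Sketch of an OnNodeConfigured script body (hypothetical helper names).
# Assumes GPU nodes already have the NVIDIA driver, so nvidia-smi exists there.

# Succeeds only when nvidia-smi is on PATH and lists at least one GPU.
# "nvidia-smi -L" prints one line per device, starting with "GPU <index>:".
has_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

# Run the given command on GPU nodes; on other nodes only print a notice.
install_if_gpu() {
  if has_gpu; then
    "$@"
  else
    echo "no GPU detected, skipping: $*"
  fi
}

# Example call with a placeholder package name (replace with the real step):
#   install_if_gpu pip3 install my-gpu-package
```

Testing the install command by hand on a compute node first, as described above, makes it easier to tell a packaging problem apart from a bootstrap problem.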
Enrico
answered a year ago
@enrico-aws, thanks for the quick turnaround and suggestion. I had to wait for 10+ minutes for my GPU node to finish initializing, but once it was running I was able to log into the GPU node.
answered a year ago