Questions tagged with High Performance Compute
We recently switched our Redshift cluster from 14 x dc2.large to 4 x ra3.xlplus because we really liked the idea of having the storage decoupled from the nodes.
I expected some kind of change in performance...
3 answers, 0 votes, 3771 views, asked 3 years ago
Do I have to redownload the dataset to the training job every time I run a SageMaker Estimator training job?
Hi,
Over the coming weeks I'll be running some deep learning experiments using the PyTorch SageMaker estimator, and I was wondering if it would be possible to avoid re-downloading my dataset every...
1 answer, 0 votes, 808 views, asked 3 years ago
sview with pcluster3
Does anyone have any suggestions on using the slurm-gui (sview) with pcluster3?
Problem: We have a modeller who wants to use slurm-gui to watch the Slurm queue. I can install slurm-gui with yum...
3 answers, 0 votes, 287 views, asked 3 years ago
Hello Folks,
I'm following this tutorial found here:
<https://containers-on-pcluster.workshop.aws/intro.html>
In the tutorial, Spack is installed on the **/fsx** directory, which is a...
2 answers, 0 votes, 218 views, asked 3 years ago
Hello Folks,
I'm following along with one of the HPC PCluster Gromacs workshops. I've made the modifications to get the example to work with PClusterv3 (I think). However, when I issue the...
2 answers, 0 votes, 364 views, asked 3 years ago
Hello, I am a new ParallelCluster 2.11 user and am having an issue where my compute nodes fail to spin up properly, resulting in the eventual failure of pcluster create. Here is my config file:
Note:...
2 answers, 0 votes, 467 views, asked 3 years ago
While testing Slurm with ParallelCluster, I found that only independent Spot Instances are supported.
I am wondering whether Spot Fleet could be supported, so that we can leverage the AWS Spot Fleet feature to help...
3 answers, 0 votes, 305 views, asked 3 years ago
Hi,
I have a cluster configured with ParallelCluster 2.10 that has been running for over half a year now. It has two EBS resources mounted at /shared and /install. It seems that both the EBS snapshots...
Accepted Answer (High Performance Compute)
2 answers, 0 votes, 340 views, asked 3 years ago
OS
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
Kernel
5.4.0-1045-aws
efadv_query_device showed...
8 answers, 0 votes, 522 views, asked 3 years ago
Hi,
I noticed strange behavior in my cluster. I am using Torque on CentOS 8. The cluster was working fine for over 2 months and all of a sudden the compute nodes stopped running queued jobs. I...
2 answers, 0 votes, 351 views, asked 3 years ago
Hello,
I noticed that nodes in my cluster tend to overcommit and become overloaded, running more Torque jobs than there are available CPUs. I suspect it may be related to the Torque configuration...
Accepted Answer (High Performance Compute)
3 answers, 0 votes, 184 views, asked 3 years ago
I have an application where I need to use pcluster to initialize a master server which will have several accounts for my coworkers to log in. This server must run uninterrupted (can't be taken down to...
Accepted Answer (High Performance Compute)
6 answers, 0 votes, 360 views, asked 3 years ago