/fsx Directory Is Getting Emptied Out After Post.Install Script

Hello Folks,

I'm following this tutorial found here:

https://containers-on-pcluster.workshop.aws/intro.html

In the tutorial, Spack is installed in the /fsx directory, which is the mount point for an FSx for Lustre file system. After my cluster creation completes and I log in to the HeadNode, I see that there is nothing in the /fsx directory.

At first, I thought something was failing on the install, so I decided to run

echo "Echoing the spack contents again $(ls /fsx/spack)"

from within the post.install.sh script. The output shows that the directory does have contents at that point:

Echoing the spack contents again
bin
CHANGELOG.md
COPYRIGHT
etc
lib

However, after logging in, I see that the directory is empty. What could be going on, and how do I make the files written during startup stick around?

Any help is greatly appreciated.

asked 2 years ago, 189 views
2 Answers

It looks like /fsx is being mounted after my script has finished writing to it. Below are the logs from running

sudo grep -r "fsx" log/*

My script runs at 04:29:35:

log/cfn-init-cmd.log:2021-10-27 04:29:35,350 P3658 [INFO] Directory /fsx exists.
log/cfn-init-cmd.log:2021-10-27 04:29:35,350 P3658 [INFO] Directory /fsx/spack exists.
log/cfn-init-cmd.log:2021-10-27 04:29:35,350 P3658 [INFO] Echoing the fsx contents spack
log/cfn-init-cmd.log:2021-10-27 04:29:35,351 P3658 [INFO] spack repo add /fsx/spack/var/spack/repos/cscs
log/cfn-init-cmd.log:2021-10-27 04:29:35,351 P3658 [INFO] Echoing the fsx contents again opt
log/cfn-init-cmd.log:2021-10-27 04:29:35,352 P3658 [INFO] Echoing the fsx contents again opt

And then it appears that /fsx gets created and mounted about 24 seconds later, between 04:29:59 and 04:30:00:

log/chef-client.log:Recipe: aws-parallelcluster::fsx_mount
log/chef-client.log: * directory[/fsx] action create[2021-10-27T04:29:59+00:00] INFO: Processing directory[/fsx] action create (aws-parallelcluster::fsx_mount line 27)
log/chef-client.log:[2021-10-27T04:29:59+00:00] INFO: directory[/fsx] mode changed to 1777
log/chef-client.log: * mount[/fsx] action mount[2021-10-27T04:29:59+00:00] INFO: Processing mount[/fsx] action mount (aws-parallelcluster::fsx_mount line 49)
log/chef-client.log:[2021-10-27T04:30:00+00:00] INFO: mount[/fsx] mounted
log/chef-client.log: - mount fs-0dde8922b7d8d0517.fsx.us-east-2.amazonaws.com@tcp:/fsx to /fsx
log/chef-client.log: * mount[/fsx] action enable[2021-10-27T04:30:00+00:00] INFO: Processing mount[/fsx] action enable (aws-parallelcluster::fsx_mount line 49)
log/chef-client.log:[2021-10-27T04:30:00+00:00] INFO: mount[/fsx] enabled
log/chef-client.log: - enable fs-0dde8922b7d8d0517.fsx.us-east-2.amazonaws.com@tcp:/fsx
log/chef-client.log: * directory[/fsx] action create[2021-10-27T04:30:00+00:00] INFO: Processing directory[/fsx] action create (aws-parallelcluster::fsx_mount line 61)
log/chef-client.log:[2021-10-27T04:30:00+00:00] INFO: directory[/fsx] mode changed to 1777

I'm not sure whether this delay makes a difference, though.
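
One way to test this hypothesis would be to log what is actually backing /fsx at the moment the script writes to it. The snippet below is only a sketch and not part of the tutorial; the helper name fs_type_of is hypothetical. If the script runs before the mount, it would report the root volume's file system type, and once the Lustre file system is mounted it should report something like lustre.

```shell
#!/bin/bash
# Hypothetical helper: print the type of the file system that currently backs a path.
fs_type_of() {
    stat -f -c %T "$1"
}

# Log the backing file system right before writing to the target directory.
# The default of /fsx matches the mount point used in this thread.
target="${1:-/fsx}"
if [ -d "$target" ]; then
    echo "$(date -u +%FT%TZ) $target is backed by: $(fs_type_of "$target")"
else
    echo "$(date -u +%FT%TZ) $target does not exist yet"
fi
```

Comparing the timestamp of this line with the mount[/fsx] entries in chef-client.log would show whether the write happened before or after the mount.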

answered 2 years ago

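The issue was that my post.install.sh script was attached to the OnNodeStart hook, which runs before the node is configured and therefore before /fsx is mounted. Everything the script wrote went into the local /fsx directory on the root volume and was hidden once the Lustre file system was mounted over it. The fix was to run the script on the OnNodeConfigured hook instead, which runs after the mount is in place.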
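
To catch this kind of mistake earlier, the install script could refuse to write unless the target is really a mount point. This is a minimal sketch under my own assumptions: the function name require_mount is hypothetical, and it relies on the mountpoint utility from util-linux being available on the node.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if the given directory is a mount point.
require_mount() {
    local dir="$1"
    if ! mountpoint -q "$dir"; then
        echo "ERROR: $dir is not a mount point; refusing to write to it" >&2
        return 1
    fi
}

# Intended use near the top of the install script:
#   require_mount /fsx || exit 1
#   ... install Spack under /fsx/spack ...
```

With the script on the wrong hook, this fails immediately with a clear message in the logs instead of silently writing files that later disappear behind the mount.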

answered 2 years ago
