OK, with AWS Support's help, it turns out I had to update the kernel, and then all was good.
You're running Rocky Linux 8.7. According to https://docs.aws.amazon.com/fsx/latest/LustreGuide/prerequisites.html, FSx for Lustre is only supported up to 8.6 (whether it's RHEL, CentOS or Rocky):
CentOS and Red Hat Enterprise Linux 7.5 through 7.9 and 8.2 through 8.6, Rocky Linux 8.4 through 8.6
Could you try it with an older version of the AMI and see if it makes a difference?
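For anyone wanting to check their own instance against the range quoted above, here is a minimal shell sketch. The function name `rocky_version_supported` is made up for illustration, and the 8.4 through 8.6 bounds are simply the ones quoted in this answer; the prerequisites page may list a different range by the time you read this, so treat them as an assumption.

```shell
# Hypothetical helper: succeeds (exit 0) if the given Rocky Linux version
# falls within the 8.4-8.6 range quoted above, fails (exit 1) otherwise.
# The bounds are an assumption taken from the quoted text, not a live lookup.
rocky_version_supported() {
    ver="$1"
    major="${ver%%.*}"      # text before the first dot, e.g. "8"
    minor="${ver#*.}"       # text after the first dot, e.g. "7"
    minor="${minor%%.*}"    # drop any further components
    [ "$major" = "8" ] && [ "$minor" -ge 4 ] 2>/dev/null && [ "$minor" -le 6 ]
}

# Usage on the instance (VERSION_ID comes from /etc/os-release):
#   . /etc/os-release
#   rocky_version_supported "$VERSION_ID" && echo "in quoted range" || echo "outside quoted range"
```

On Rocky 8.7 this would report "outside quoted range", which only tells you the version is not in the list quoted here, not that the client cannot work.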
Hi,
A 'soft lockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run. The watchdog daemon will send a non-maskable interrupt (NMI) to all CPUs in the system, which in turn print the stack traces of their currently running tasks.
Given the end of your error message, this problem seems to be reported for a PMDA process. See https://manpages.ubuntu.com/manpages/focal/man1/pmdaproc.1.html
You may want to try stopping this process. But no guarantee: PMDA may just be the detector of the lockup, not its cause...
You may also want to open an AWS Support ticket.
Best,
Didier
Hi Didier, thanks for that explanation. I indeed learned the same about "soft lockups" googling for an answer. Reasonably certain it's caused by the FSx client install. Alas, I can't open an AWS Support ticket since I'm not on a support plan.
Hi RWC, ok. Maybe the Rocky Linux community can help?
Have posted over on the Rocky support forum. Thanks for the idea Didier
Hi RWC, oh, nice spot. That conflicts with the compatibility matrix in the first link I sent which says it should be compatible with 8.7. Hmm. Ok, will take some work to re-do my environment with 8.6 but will give it a go. Thanks for the tip.
No worries. And if you get the same outcome on Rocky 8.6 (and my hunch is that you could well do) then give it a try on RHEL 8.6 - as Rocky claims to be 100% bug-for-bug compatible with RHEL you should get the same outcome, whether good or bad.
If you get the same problem with RHEL, the fact that you're paying for it (even just a few cents) gives you the option of logging a support call through AWS Premium Support, who in turn will engage Red Hat on your behalf (see https://aws.amazon.com/partners/redhat/faqs/#Support). Between the two of them you should get a resolution.
Hey Steve,
My contact at AWS says 8.7 is compatible. It's working for him, but still not for me. I took your suggestion and tried a RHEL 8.7 AMI and that's also not working, but in a slightly different way:
[ec2-user@ip-172-31-42-103 ~]$ sudo mount -t lustre -o noatime,flock fs-00b4d8fbf28ff3fa7.fsx.ap-southeast-1.amazonaws.com@tcp:/b3kplbmv /mnt/fsx
mount.lustre: mount fs-00b4d8fbf28ff3fa7.fsx.ap-southeast-1.amazonaws.com@tcp:/b3kplbmv at /mnt/fsx failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Then I tried:
lsmod | grep lustre
That returns nothing. Then I tried:
[ec2-user@ip-172-31-42-103 ~]$ sudo modprobe lustre
modprobe: ERROR: could not insert 'lustre': Device or resource busy
And now I'm a bit out of my depth.
I couldn't see how to lodge a support ticket with AWS at that link even though it does seem like support should be included with RHEL. Hmm.
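As a side note on the `lsmod | grep lustre` check above: `grep` matches the word anywhere on a line, including the "Used by" column, so a slightly stricter check is to compare only the module-name column. A minimal sketch follows; `module_loaded` is a hypothetical helper name, and it only reads `lsmod`-style output on stdin.

```shell
# Hypothetical helper: reads lsmod-style output on stdin and succeeds (exit 0)
# only if the first column of some non-header line equals the given name.
module_loaded() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Usage on the instance:
#   lsmod | module_loaded lustre && echo "lustre is loaded" || echo "lustre is not loaded"
```

This only confirms whether the module is loaded. It does not explain why `modprobe` refused to insert it; the kernel log (for example via `dmesg`) is usually where the reason for a failed module load shows up.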