Amazon Linux 2 kernel problem with software RAID - mitigation



The latest kernel for Amazon Linux 2 has a problem with software RAID: a process accessing the RAID reaches 100% CPU and hangs, eventually rendering the EC2 instance unresponsive.

Is there any way to mitigate this issue? It does not occur on CentOS 7, Red Hat 7, or Red Hat 8, nor with an older Amazon Linux 2 kernel (4.14.77-81.59.amzn2.x86_64).

Kernel: 4.14.232-177.418.amzn2.x86_64
OS: Amazon Linux 2

Steps to reproduce (devices might be different):
mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=2 /dev/xvdb /dev/xvdc
mdadm --stop /dev/md0; mdadm --assemble /dev/md0 /dev/xvdb /dev/xvdc;

The "mdadm" process hangs when trying to assemble the RAID, as does any other process that tries to access it. This happens when "/dev/md0" has been removed but "/sys/block/md0" is still present.
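
The condition described above (device node gone, sysfs entry still present) can be checked before running "mdadm --assemble". The following is only a minimal sketch: the function name md_state and the SYS_BLOCK_DIR / DEV_DIR override variables are hypothetical and not part of mdadm. It only reads the filesystem and does not touch the array.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): classify an md array by name.
#   present - the device node exists under /dev
#   stale   - the device node is gone but the sysfs entry remains
#   absent  - neither exists
# SYS_BLOCK_DIR and DEV_DIR are assumed overrides, useful only for dry runs.
md_state() {
    name="$1"
    sys_dir="${SYS_BLOCK_DIR:-/sys/block}"
    dev_dir="${DEV_DIR:-/dev}"
    if [ -e "$dev_dir/$name" ]; then
        echo "present"
    elif [ -d "$sys_dir/$name" ]; then
        echo "stale"
    else
        echo "absent"
    fi
}
```

Calling "md_state md0" right after "mdadm --stop /dev/md0" would print "stale" in the situation reported here. Whether that state always leads to the hang is an assumption based on this report only.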

Best regards and thank you in advance,

Edited by: rga on Jul 12, 2021 11:14 AM

asked 3 years ago, 225 views
1 Answer

The latest kernel tested (4.14.238-182.422.amzn2.x86_64) for Amazon Linux 2 does not show the "mdadm" assembly problem described above.
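
To see whether an instance already runs at least that kernel, the running version can be compared against it. This is a rough sketch: the function name kernel_at_least is hypothetical, and it relies on the version ordering of GNU "sort -V". It only tells you whether the running kernel is at or above the release reported as working. An older kernel is not necessarily affected, since 4.14.77-81.59.amzn2.x86_64 did not show the problem either.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 sorts at or above $2.
# Uses GNU sort's version ordering (-V); the lower of the two sorts first.
kernel_at_least() {
    current="$1"
    minimum="$2"
    lowest=$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Release taken from the answer above.
if kernel_at_least "$(uname -r)" "4.14.238-182.422.amzn2.x86_64"; then
    echo "running kernel is at or above the release reported as working"
else
    echo "running kernel is older than the release reported as working"
fi
```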

Edited by: rga on Jul 27, 2021 7:53 AM

answered 3 years ago
