[UBUNTU] EBS volume attachment during boot randomly causes EC2 instances to get stuck

We build and deploy custom AMIs based on Ubuntu Jammy, and since jammy-20230428 we have noticed that instances launched from these AMIs sometimes fail randomly during the boot process. Destroying the instance and deploying it again works around the issue. The stack trace is always the same:

[  849.765218] INFO: task swapper/0:1 blocked for more than 727 seconds.
[  849.774999]       Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
[  849.787081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  849.811223] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
[  849.883494] Call Trace:
[  849.891369]  <TASK>
[  849.899306]  __schedule+0x254/0x5a0
[  849.907878]  schedule+0x5d/0x100
[  849.917136]  io_schedule+0x46/0x80
[  849.970890]  blk_mq_get_tag+0x117/0x300
[  849.976136]  ? destroy_sched_domains_rcu+0x40/0x40
[  849.981442]  __blk_mq_alloc_requests+0xc4/0x1e0
[  849.986750]  blk_mq_get_new_requests+0xcc/0x190
[  849.992185]  blk_mq_submit_bio+0x1eb/0x450
[  850.070689]  __submit_bio+0xf6/0x190
[  850.075545]  submit_bio_noacct_nocheck+0xc2/0x120
[  850.080841]  submit_bio_noacct+0x209/0x560
[  850.085654]  submit_bio+0x40/0xf0
[  850.090361]  submit_bh_wbc+0x134/0x170
[  850.094905]  ll_rw_block+0xbc/0xd0
[  850.175198]  do_readahead.isra.0+0x126/0x1e0
[  850.183531]  jread+0xeb/0x100
[  850.189648]  do_one_pass+0xbb/0xb90
[  850.193917]  ? crypto_create_tfm_node+0x9a/0x120
[  850.207511]  ? crc_43+0x1e/0x1e
[  850.211887]  jbd2_journal_recover+0x8d/0x150
[  850.272927]  jbd2_journal_load+0x130/0x1f0
[  850.280601]  ext4_load_journal+0x271/0x5d0
[  850.288540]  __ext4_fill_super+0x2aa1/0x2e10
[  850.296290]  ? pointer+0x36f/0x500
[  850.304910]  ext4_fill_super+0xd3/0x280
[  850.372470]  ? ext4_fill_super+0xd3/0x280
[  850.380637]  get_tree_bdev+0x189/0x280
[  850.384398]  ? __ext4_fill_super+0x2e10/0x2e10
[  850.388490]  ext4_get_tree+0x15/0x20
[  850.392123]  vfs_get_tree+0x2a/0xd0
[  850.395859]  do_new_mount+0x184/0x2e0
[  850.468151]  path_mount+0x1f3/0x890
[  850.471804]  ? putname+0x5f/0x80
[  850.475341]  init_mount+0x5e/0x9f
[  850.478976]  do_mount_root+0x8d/0x124
[  850.482626]  mount_block_root+0xd8/0x1ea
[  850.486368]  mount_root+0x62/0x6e
[  850.568079]  prepare_namespace+0x13f/0x19e
[  850.571984]  kernel_init_freeable+0x120/0x139
[  850.575930]  ? rest_init+0xe0/0xe0
[  850.579511]  kernel_init+0x1b/0x170
[  850.583084]  ? rest_init+0xe0/0xe0
[  850.586642]  ret_from_fork+0x22/0x30
[  850.668205]  </TASK>

This has been happening since 5.19.0-1024-aws; I have now rolled back to 5.19.0-1022-aws. Is anyone else aware of this?
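
In case it helps anyone hitting the same hang, below is a minimal sketch (not from my original setup) of how the "destroy and deploy again" workaround could be automated with boto3: it scans the instance's serial console output for the hung-task trace shown above and terminates the instance so the deployment tooling can replace it. The region, instance ID, and helper names are placeholders.

import base64
import boto3

# Region and instance ID below are placeholders, not values from this post.
ec2 = boto3.client("ec2", region_name="eu-west-1")

def is_stuck_at_boot(instance_id):
    """Return True if the serial console shows the hung-task / jbd2 trace."""
    resp = ec2.get_console_output(InstanceId=instance_id, Latest=True)
    # GetConsoleOutput returns the console text base64-encoded.
    console = base64.b64decode(resp.get("Output", "")).decode("utf-8", errors="replace")
    return "blocked for more than" in console and "jbd2_journal_recover" in console

def recycle(instance_id):
    """Terminate the stuck instance so the deployment tooling redeploys it."""
    ec2.terminate_instances(InstanceIds=[instance_id])

if __name__ == "__main__":
    instance = "i-0123456789abcdef0"  # placeholder instance ID
    if is_stuck_at_boot(instance):
        recycle(instance)

Something like this could run after a launch health check times out, but it only hides the problem; the underlying boot hang remains.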

esysc
asked a year ago · 283 views
1 Answer

I was also able to reproduce this multiple times with 5.19.0-1022-aws, so IMHO it does not depend on the kernel version. Our instances are all of the t3 and t3a types.

esysc
answered 10 months ago
