[UBUNTU] EBS volume attachment during boot randomly causes EC2 instances to get stuck


We create and deploy custom AMIs based on Ubuntu Jammy, and since jammy-20230428 we have noticed that instances built from them randomly fail during the boot process. I can work around it by destroying and re-deploying the instance (a quick CLI sketch of that follows the trace below). The stack trace is always the same:

[  849.765218] INFO: task swapper/0:1 blocked for more than 727 seconds.
[  849.774999]       Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
[  849.787081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  849.811223] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
[  849.883494] Call Trace:
[  849.891369]  <TASK>
[  849.899306]  __schedule+0x254/0x5a0
[  849.907878]  schedule+0x5d/0x100
[  849.917136]  io_schedule+0x46/0x80
[  849.970890]  blk_mq_get_tag+0x117/0x300
[  849.976136]  ? destroy_sched_domains_rcu+0x40/0x40
[  849.981442]  __blk_mq_alloc_requests+0xc4/0x1e0
[  849.986750]  blk_mq_get_new_requests+0xcc/0x190
[  849.992185]  blk_mq_submit_bio+0x1eb/0x450
[  850.070689]  __submit_bio+0xf6/0x190
[  850.075545]  submit_bio_noacct_nocheck+0xc2/0x120
[  850.080841]  submit_bio_noacct+0x209/0x560
[  850.085654]  submit_bio+0x40/0xf0
[  850.090361]  submit_bh_wbc+0x134/0x170
[  850.094905]  ll_rw_block+0xbc/0xd0
[  850.175198]  do_readahead.isra.0+0x126/0x1e0
[  850.183531]  jread+0xeb/0x100
[  850.189648]  do_one_pass+0xbb/0xb90
[  850.193917]  ? crypto_create_tfm_node+0x9a/0x120
[  850.207511]  ? crc_43+0x1e/0x1e
[  850.211887]  jbd2_journal_recover+0x8d/0x150
[  850.272927]  jbd2_journal_load+0x130/0x1f0
[  850.280601]  ext4_load_journal+0x271/0x5d0
[  850.288540]  __ext4_fill_super+0x2aa1/0x2e10
[  850.296290]  ? pointer+0x36f/0x500
[  850.304910]  ext4_fill_super+0xd3/0x280
[  850.372470]  ? ext4_fill_super+0xd3/0x280
[  850.380637]  get_tree_bdev+0x189/0x280
[  850.384398]  ? __ext4_fill_super+0x2e10/0x2e10
[  850.388490]  ext4_get_tree+0x15/0x20
[  850.392123]  vfs_get_tree+0x2a/0xd0
[  850.395859]  do_new_mount+0x184/0x2e0
[  850.468151]  path_mount+0x1f3/0x890
[  850.471804]  ? putname+0x5f/0x80
[  850.475341]  init_mount+0x5e/0x9f
[  850.478976]  do_mount_root+0x8d/0x124
[  850.482626]  mount_block_root+0xd8/0x1ea
[  850.486368]  mount_root+0x62/0x6e
[  850.568079]  prepare_namespace+0x13f/0x19e
[  850.571984]  kernel_init_freeable+0x120/0x139
[  850.575930]  ? rest_init+0xe0/0xe0
[  850.579511]  kernel_init+0x1b/0x170
[  850.583084]  ? rest_init+0xe0/0xe0
[  850.586642]  ret_from_fork+0x22/0x30
[  850.668205]  </TASK>
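
Until the cause is clear, the workaround is simply to recycle the stuck instance. Below is a minimal sketch with the AWS CLI; the instance ID and launch template name are placeholders for your own values:

# Instance ID and launch template name are placeholders.
INSTANCE_ID=i-0123456789abcdef0

# Dump the serial console to look for the hung-task trace before recycling
# (older CLI versions may print this base64-encoded).
aws ec2 get-console-output --instance-id "$INSTANCE_ID" --latest --query Output --output text | grep "blocked for more than"

# Terminate the stuck instance and launch a replacement from the same template.
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
aws ec2 run-instances --launch-template LaunchTemplateName=my-jammy-template,Version='$Latest'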

This has been happening since 5.19.0-1024-aws; I have now rolled back to 5.19.0-1022-aws. Is anyone aware of this?
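
For reference, this is roughly how I pin the older kernel while baking the AMI. It is only a sketch: the package names follow the standard Ubuntu -aws kernel naming, and the GRUB menu-entry title and drop-in path are assumptions, so adjust them for your image.

# Install the specific older AWS kernel and hold it so upgrades don't pull
# the newer one back in (version strings assumed from the Ubuntu -aws naming).
sudo apt-get update
sudo apt-get install -y linux-image-5.19.0-1022-aws linux-modules-5.19.0-1022-aws
sudo apt-mark hold linux-image-5.19.0-1022-aws linux-modules-5.19.0-1022-aws

# Boot the held kernel by default; the menu-entry title below is an assumption,
# check the titles in your generated grub.cfg first.
echo 'GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.19.0-1022-aws"' | sudo tee /etc/default/grub.d/99-pin-kernel.cfg
sudo update-grub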

esysc
Asked a year ago · 289 views
1 Answer

I was able to reproduce this with 5.19.0-1022-aws multiple times as well, so IMHO it doesn't depend on the kernel version. Our instances are all t3 and t3a types.
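
In case it helps anyone comparing environments, a quick sketch of the checks I run; nothing here is specific to this bug, the query just lists what is deployed:

# Confirm which kernel the instance actually booted with.
uname -r

# List instance IDs, types and states to see whether only t3/t3a instances are affected.
aws ec2 describe-instances --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name]' --output table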

esysc
Answered 10 months ago
