Linux OS networking bug in Elastic Beanstalk AMI with Tomcat & Corretto

We use AWS Elastic Beanstalk with an Amazon AMI with Tomcat & Corretto running on Amazon Linux 2 (aws-elasticbeanstalk-amzn-2.0.20220316.64bit-eb_tomcat85corretto8_amazon_linux_2-hvm-2022-03-29T20-48) and are running into an OS networking bug when Tomcat is under load. The result is that, while the server is under load, TCP connections from clients are established but then time out.
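For anyone who wants to check for the same symptom from the client side, here is a minimal sketch of a probe that separates "could not connect" from "connected but never got a response". The function name `probe`, the result labels, and the plain HTTP/1.0 request are my own assumptions for illustration; this is not the reproduction code from the bug report.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a single TCP attempt against host:port.

    Returns one of: "connect_failed", "connected_but_timed_out",
    "closed_without_response", "ok". These labels are hypothetical.
    """
    try:
        # Fails (refused, unreachable, or handshake timeout) -> connect_failed.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect_failed"
    with sock:
        try:
            # Assumed request; any payload the server answers would do.
            request = "GET / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host)
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1)
        except socket.timeout:
            # The handshake completed but no byte came back in time.
            return "connected_but_timed_out"
        except OSError:
            return "connect_failed"
    return "ok" if data else "closed_without_response"
```

Running this in a loop against the environment while it is under load, and counting how often `"connected_but_timed_out"` comes back, should give a rough picture of how frequently the problem occurs.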

The networking bug is due to a race condition in the TCP stack which is fixed in Linux 5.10 kernels. A description and diff of the bug can be found in this commit. From the description of this bug it looks like this race condition affects all TCP networking and is not specific to Tomcat, but manifests more often under load.

Currently, as far as I can tell, all the latest Amazon AMIs for Elastic Beanstalk for Tomcat or Corretto are using a 4.14 kernel. The AMI which we are using has a kernel of 4.14.268-205.500.amzn2.x86_64. I have been able to reproduce the bug on this AMI using the sample server code in the Ubuntu bug report, which is independent of Tomcat.

I have also tried to reproduce the bug on a newer version of Amazon Linux 2 (AMI amzn2-ami-kernel-5.10-hvm-2.0.20220419.0-x86_64-gp2), which uses a 5.10.109-104.500.amzn2.x86_64 kernel, but have not been able to reproduce it on this kernel.
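To check quickly which side of the 5.10 line an instance is on, a small sketch like the following can compare a kernel release string against 5.10. The helper names `kernel_version` and `predates_fix` are hypothetical, and a version comparison is only a hint: vendors sometimes backport fixes to older kernels, so it cannot prove on its own that the fix is present or absent.

```python
import re

# Assumption taken from the description above: the fix is in 5.10 kernels.
FIXED_IN = (5, 10)


def kernel_version(release):
    """Extract (major, minor) from a release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: {!r}".format(release))
    return int(match.group(1)), int(match.group(2))


def predates_fix(release):
    """True if the kernel's major.minor version is older than 5.10."""
    return kernel_version(release) < FIXED_IN
```

On an instance, this could be called as `predates_fix(platform.release())` after `import platform`, which returns the same string as `uname -r` on Linux.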

We would prefer not to have to create our own AMI for Elastic Beanstalk. Will the Amazon Elastic Beanstalk AMIs be updated to incorporate this OS bug fix, and if so, when? The bug is affecting the reliability of our networking under load.