Receiving socket connection reset in one of our ECS environments when attempting to download a file from S3


I have a Spring Batch application that reads large CSV files (about 200 MB each) from AWS S3 and writes the data into our database. We recently updated the process slightly to accommodate a new CSV schema. When we run the new import process against LocalStack, everything works flawlessly. As soon as a real network is involved, we start having issues, but consistently only in one of our ECS environments (prod). The process failed once in a lower environment and then succeeded on every subsequent run. We get the following exception when pulling down a large file from S3:

java.net.SocketException: Connection reset
    at sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:328)
    at sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
    at sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
    at java.net.Socket$SocketInputStream.read(Socket.java:966)
    at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484)
    at sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:467)
    at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
    at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1513)
    at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1484)
    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1069)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:197)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
    at java.io.InputStreamReader.read(InputStreamReader.java:177)
    at java.io.BufferedReader.fill(BufferedReader.java:162)
    at java.io.BufferedReader.readLine(BufferedReader.java:329)
    at java.io.BufferedReader.readLine(BufferedReader.java:396)
    at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:207)
    ... 33 common frames omitted
Wrapped by: org.springframework.batch.item.file.NonTransientFlatFileException: Unable to read from resource: [Amazon s3 resource [bucket='bucket-name-here' and object='import/test.csv']]
    at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:226)
    at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:178)
    at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:93)
    at org.springframework.batch.item.support.SynchronizedItemStreamReader.read(SynchronizedItemStreamReader.java:57)
    at org.springframework.batch.item.support.SynchronizedItemStreamReader$$FastClassBySpringCGLIB$$987ea09.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:137)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:124)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
    at org.springframework.batch.item.support.SynchronizedItemStreamReader$$EnhancerBySpringCGLIB$$2a78c69a.read(<generated>)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:99)
    at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
    ... 17 common frames omitted
Wrapped by: org.springframework.batch.core.step.skip.NonSkippableReadException: Non-skippable exception during read
    at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:105)
    at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:126)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:118)
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:71)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82)
    at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:262)
    at org.springframework.security.concurrent.DelegatingSecurityContextRunnable.run(DelegatingSecurityContextRunnable.java:82)
    at io.github.jhipster.async.ExceptionHandlingAsyncTaskExecutor.lambda$createWrappedRunnable$1(ExceptionHandlingAsyncTaskExecutor.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(Thread.java:840)

This has never been a problem before: we have run this process multiple times over the past year with no issues until now. We suspect a problem with the AWS ECS network or some sort of timeout imposed on the AWS side. We have checked the timeouts on our side, and everything runs fine in our other ECS environments. We are not sure where to go from here. Any help would be appreciated!

EDIT: Additionally, the connection seems to drop after about 8 minutes every single time. We have run this 4 times in prod now, and all 4 times the first connection reset exception was thrown after about 8 minutes. As mentioned above, we tested the timeouts defined in our application by lowering them dramatically and running locally, and none of them caused a problem.

d-viso
asked 6 months ago, 225 views
1 Answer

Hello,

This can have many causes. Consider the AWS Knowledge Center article [1], 'How do I troubleshoot ConnectionReset errors when I upload or download objects from Amazon S3?', which walks you through troubleshooting steps for it.

According to the ClientConfiguration documentation [2], there are several other timeouts that may be related to this condition. You may try setting withRequestTimeout or withClientExecutionTimeout; both take a value in milliseconds and are disabled by default.
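Separately from those client timeouts, the stack trace in the question shows FlatFileItemReader reading straight from the S3 HTTP stream, so the same connection stays open for as long as the step takes to process the file. One possible workaround, which is only an assumption here and not a confirmed fix for this case, is to copy the object to a local temp file first and read from that copy. The sketch below uses only the JDK; the class and method names are made up for illustration, and a byte array stands in for the S3 object content stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolThenRead {

    /** Copies the whole remote stream to a temp file and closes the stream when done. */
    public static Path spoolToTempFile(InputStream remote) throws IOException {
        Path tmp = Files.createTempFile("s3-import-", ".csv");
        try (InputStream in = remote) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    /** Reads the local copy line by line; slow per-line work no longer holds a socket open. */
    public static long countLines(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++; // placeholder for parsing the row and writing it to the database
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the S3 object content stream.
        InputStream fakeS3 = new ByteArrayInputStream(
                "id,name\n1,a\n2,b\n".getBytes(StandardCharsets.UTF_8));
        Path local = spoolToTempFile(fakeS3);
        try {
            System.out.println(countLines(local));
        } finally {
            Files.deleteIfExists(local);
        }
    }
}
```

In a Spring Batch job, the temp file could then be handed to the reader as a FileSystemResource instead of the S3 resource. This shortens how long the S3 connection is held, but it does not explain why only one environment resets the connection, so the network checks in [1] are still worth doing.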

[1] https://repost.aws/knowledge-center/s3-connect-reset-error
[2] https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html

AWS
sanju_s
answered 6 months ago
