`RequestTimeout`s for S3 PUT requests from a Lambda in a VPC for larger payloads


Update

I added a VPC gateway endpoint for S3 in the same region (us-east-1) and associated it with the route table that the Lambda uses. But the problem still persists. Below I've included the details of my network configuration; the Lambda is located in the "api" subnet.

Network Configuration

1 VPC
4 subnets:

  • public
    IPv4 CIDR: 10.0.0.0/24
    route table: public
    Network ACL: public
  • private
    IPv4 CIDR: 10.0.1.0/24
    route table: private
    Network ACL: private
  • api
    IPv4 CIDR: 10.0.4.0/24
    route table: api
    Network ACL: api
  • private2-required
    IPv4 CIDR: 10.0.2.0/24
    route table: public
    Network ACL: -

3 route tables:

  • public
    Destination: 10.0.0.0/16     Target: local
    Destination: 0.0.0.0/0     Target: igw-xxxxxxx
    Destination: ::/0     Target: igw-xxxxxxxx
  • private
    Destination: 10.0.0.0/16     Target: local
  • api
    Destination: 10.0.0.0/16     Target: local
    Destination: 0.0.0.0/0     Target: nat-xxxxxxxx
    Destination: pl-xxxxxxxx (S3 prefix list)     Target: vpce-xxxxxxxx (S3 gateway endpoint)

4 network ACLs:

  • public
    inbound rules:
    All traffic (allow)
    outbound rules:
    All traffic (allow)
  • private
    inbound rules:
    100: PostgreSQL TCP 5432 10.0.0.48/32 (allow)
    101: PostgreSQL TCP 5432 10.0.4.0/24 (allow)
    outbound rules:
    100: Custom TCP TCP 32768-65535 10.0.0.48/32 (allow)
    101: Custom TCP TCP 1024-65535 10.0.4.0/24 (allow)
  • api
    inbound rules:
    All traffic (allow)
    outbound rules:
    All traffic (allow)
  • -
    inbound rules:
    All traffic (allow)
    outbound rules:
    All traffic (allow)

Update

I increased the timeout of the Lambda to 5 minutes, and the timeout of the PUT request to the S3 bucket to 5 minutes as well. Before this the request itself would time out, but now I actually get a response back from S3: a 400 Bad Request whose error code is RequestTimeout, with the message "Your socket connection to the server was not read from or written to within the timeout period."

This exact same code works 100% of the time for a small payload (on the order of 1 KB), but for payloads on the order of 1 MB it starts breaking. There is no logic in my code that behaves differently based on payload size. I've read about similar issues caused by a wrong byte count in the Content-Length header, but I've never supplied a value for that header myself. Furthermore, the Lambda works flawlessly when executed in my local environment, so the problem definitely appears to be a networking one. At first glance it might seem the Lambda simply can't reach services outside the VPC, but that's not the case: it works exactly as expected for smaller files (< 1 KB), so it's not that it flat out can't communicate with S3.
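For reference, the request is shaped roughly like the following sketch using Foundation's URLSession. The function name and the use of a pre-signed URL are assumptions to keep the example self-contained (the real code isn't shown here); note that URLSession derives Content-Length from the body it is given, consistent with never setting that header manually:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession on Linux, e.g. in a Lambda runtime
#endif

/// Hypothetical sketch of the PUT under discussion. A pre-signed URL is
/// assumed purely for illustration; the question doesn't show how the
/// real request is authenticated.
func putObject(presignedURL: URL, body: Data,
               completion: @escaping (Error?) -> Void) {
    var request = URLRequest(url: presignedURL)
    request.httpMethod = "PUT"
    // The 5-minute request timeout mentioned in the update above.
    request.timeoutInterval = 300
    // Content-Length is computed by URLSession from `body`; it is one of
    // the headers the URL loading system manages itself.
    URLSession.shared.uploadTask(with: request, from: body) { _, response, error in
        if let error = error { return completion(error) }
        if let http = response as? HTTPURLResponse, http.statusCode != 200 {
            return completion(URLError(.badServerResponse))
        }
        completion(nil)
    }.resume()
}
```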

Scratching my head here...

Original

I use S3 to host images for an application. In my local testing environment the images upload at an acceptable speed. However, when I run the exact same code from an AWS Lambda (in my VPC), the speeds are untenably slow. I've concluded this because smaller images (< 1 KB) work 100% of the time without any changes to the code, while 1 MB payloads fail 98% of the time. I know the request to S3 is the issue because logs made from within the Lambda show that execution reaches the upload request but almost never successfully passes it (it times out).

  • What's the networking configuration in the VPC? Also, what memory size and timeout have you configured for Lambda?

I updated the question. Currently the Lambda has a 5-minute timeout and 512 MB of RAM, but I've also tried the maximum of 10,240 MB of RAM with the same 5-minute timeout, and it made no difference.

I ask about the networking configuration of the VPC because something there might be affecting the transfer. Try creating an S3 gateway endpoint in the VPC; that way, traffic to S3 from the Lambda function doesn't traverse any NAT or internet gateways.

Tried the S3 gateway endpoint option, but I'm still facing the issue. I updated the question to include details of my network configuration.

1 Answer

Thanks for all the detail. I can't see anything there that would be a problem, but I needed to check.

You don't say which runtime language you're using in Lambda, so I'm going to guess Node.js. If that's the case, you may be using an asynchronous function to transfer the data to S3, and that may be the problem here: for small files, the async function completes before the Lambda function is suspended, and therefore it works.

But for larger files, the transfer function may still be running when the mainline code completes; it hasn't returned yet. The Lambda control plane doesn't know this, so it will suspend the function mid-transfer. The trick here is to ensure that the transfer has completed before the mainline code exits.

Of course, I'm guessing. But it seems to fit the circumstances.
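To make that concrete, here is a rough sketch in Swift (which, per the comments below, turns out to be the actual runtime). The names are hypothetical; the key point is simply that the handler does not return until the upload has completed, so the execution environment can't be frozen mid-transfer:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical illustration of "don't exit before the transfer finishes":
// the caller awaits the upload, so the mainline code cannot return (and
// Lambda cannot suspend the environment) while the PUT is still in flight.
func uploadAndWait(request: URLRequest, body: Data) async throws {
    try await withCheckedThrowingContinuation { (cont: CheckedContinuation<Void, Error>) in
        URLSession.shared.uploadTask(with: request, from: body) { _, _, error in
            if let error = error {
                cont.resume(throwing: error)
            } else {
                cont.resume(returning: ())
            }
        }.resume()
    }
}

// In the Lambda handler (framework scaffolding omitted):
//   try await uploadAndWait(request: putRequest, body: imageData)
//   return response // reached only after the PUT has completed
```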

AWS
EXPERT
answered 2 years ago
Thanks for your response. You may have a point there, but unfortunately it isn't quite that straightforward: I'm using Swift for the runtime language. The function that transfers the data does so via an HTTP request, and the default request-processing logic the language provides (perhaps the only logic it provides) is asynchronous. However, I had already run into the problem you mention before this, so I made the function synchronous by using a semaphore (see the sketch after these comments).

That appeared to resolve all the issues where the Lambda was completing execution before all of the endpoint's logic had run: I could place a log statement inside the request's completion and see it printed to the console before the log statement that runs on the Lambda's completion. That said, I had already littered the Lambda with log statements in the past, particularly around the HTTP request we're discussing, and I suspect I would've noticed if that were still happening; but maybe not.

I may just bite the bullet in this case and use a second Lambda dedicated to dealing with S3. That's a bit more straightforward than trying to debug this problem when I have very limited access to the hardware experiencing the issue, and the only error I'm getting is "RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period".
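For context, the semaphore workaround mentioned in these comments typically looks something like the following sketch (hypothetical names; the asker's actual code isn't shown). The calling thread blocks until URLSession's completion handler signals, so the upload behaves synchronously from the handler's point of view:

```swift
import Dispatch
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical sketch of the semaphore pattern: block the caller until
// the upload's completion handler fires. URLSession runs its completion
// handlers on its own queue, so waiting here does not deadlock.
func putObjectBlocking(request: URLRequest, body: Data) -> Error? {
    let done = DispatchSemaphore(value: 0)
    var failure: Error?
    URLSession.shared.uploadTask(with: request, from: body) { _, _, error in
        failure = error
        done.signal() // wake the waiting thread
    }.resume()
    done.wait() // blocks until the transfer finishes or times out
    return failure
}
```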
