AWS Lambda Function Timing Out While Parsing JSON from Paginated API Calls

Hello everyone,

I am struggling with an AWS Lambda function that times out when attempting to parse a JSON response from paginated API calls. The function is designed to handle asynchronous API calls to fetch event data from a paginated API, updating a cursor and making subsequent requests until all pages have been retrieved. While it runs perfectly in a local environment, it consistently times out on AWS Lambda during the JSON parsing step after the first successful request.

Detailed Issue: The function successfully makes the HTTP request and receives a response with status 200, which confirms that the server has processed the request correctly. However, the function times out when attempting to parse the JSON from the response. This suggests an issue with handling the size or complexity of the JSON data.

Here’s the relevant portion of the function and the latest logs, including the fetch_with_exponential_backoff method that adds robustness to our network requests:

Code Snippet from fetch_events Function:

logger.info("Start of the request to URL: [URL redacted for privacy]")

while True:  # Continue until there are no more pages to fetch
    logger.info("Request to page: {current_page}/{page_max}")
    if cursor:  # If a cursor is present, it's added to the parameters
        params["cursor"] = cursor
        logger.info("Cursor: {cursor}")

    try:
        logger.info("Entering Try")
        data = await fetch_with_exponential_backoff(session, url, params, headers)
        logger.info("Successfully received response, starting JSON parsing")
    except Exception as e:
        logger.error(f"Error while retrieving or parsing data: {str(e)}\n{traceback.format_exc()}")
        return None

Code Snippet from fetch_with_exponential_backoff Function:

async def fetch_with_exponential_backoff(session, url, params, headers, max_retries=5):
    retry_delay = 1  # Start with a 1 second delay
    for attempt in range(max_retries):
        try:
            async with session.get(url, params=params, headers=headers) as response:
                if response.status == 200:
                    logger.info(f"Response received with status 200. Headers: {response.headers}")
                    content_length = response.headers.get('Content-Length')
                    if content_length:
                        logger.info(f"Content length: {content_length}")
                    data = await response.json()
                    logger.info("Successfully parsed JSON")
                    return data
                else:
                    logger.error(f"Failed request with status {response.status}, attempt {attempt + 1}")
        except Exception as e:
            logger.error(f"Exception during request: {e}, attempt {attempt + 1}")
        await asyncio.sleep(retry_delay)
        retry_delay *= 2  # Double the retry delay on each attempt
    logger.error(f"All {max_retries} attempts to {url} failed")
    return None  # signal failure to the caller once retries are exhausted
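
For reference, one variant I'm considering is putting an explicit per-request timeout on the aiohttp call, so a stalled download raises an exception I can log instead of silently running into the Lambda timeout. A minimal sketch, assuming a 30-second budget per request (the value is arbitrary and not something I've validated yet):

import aiohttp

async def fetch_json_with_timeout(session, url, params, headers, total_timeout=30):
    # Assumption: a total timeout shorter than the Lambda timeout surfaces a
    # stalled download as a timeout exception instead of a silent Lambda timeout.
    request_timeout = aiohttp.ClientTimeout(total=total_timeout)
    async with session.get(url, params=params, headers=headers,
                           timeout=request_timeout) as response:
        response.raise_for_status()   # non-200 responses become exceptions
        return await response.json()  # parse once the full body has been read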

EDIT: I've added chunked reading of the response body in fetch_with_exponential_backoff, but the function still times out:

                    # Read the body chunk by chunk as it arrives
                    parts = []
                    async for chunk in response.content.iter_chunked(1024):  # Adjust chunk size as needed
                        parts.append(chunk)
                    data = json.loads(b''.join(parts))  # requires "import json" at module level
                    logger.info("Successfully parsed JSON")
                    return data
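
To figure out whether the body ever finishes downloading before the timeout hits, I'm also thinking of logging the cumulative byte count while streaming the chunks. A rough sketch of that instrumentation, dropped into the same spot as the chunked read above (chunk size and logging interval are arbitrary; json must be imported at module level):

                    # Diagnostic variant: log progress while streaming the body
                    parts = []
                    total_bytes = 0
                    next_log = 1024 * 1024  # log roughly every MiB downloaded
                    async for chunk in response.content.iter_chunked(64 * 1024):
                        parts.append(chunk)
                        total_bytes += len(chunk)
                        if total_bytes >= next_log:
                            logger.info(f"Downloaded {total_bytes} bytes so far")
                            next_log += 1024 * 1024
                    logger.info(f"Download complete: {total_bytes} bytes, parsing JSON")
                    data = json.loads(b''.join(parts))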

Function Logs Translated and Redacted:

START RequestId: [RequestId redacted] Version: $LATEST
[INFO] [Timestamp redacted] Start of the request to URL: [URL redacted for privacy]
[INFO] [Timestamp redacted] Request to page: 0/None
[INFO] [Timestamp redacted] Entering Try
[INFO] [Timestamp redacted] Response received with status 200. Headers: 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json; charset=utf-8', 'Date': 'Thu, 02 May 2024 17:20:31 GMT', 'Etag': '"', 'Server': 'Vercel', 'Strict-Transport-Security': 'max-age=63072000', 'X-Vercel-Cache': 'MISS', 'X-Vercel-Id': xxxxxxxx, 'Transfer-Encoding': 'chunked'
[INFO] [Timestamp redacted] Task timed out after 60.07 seconds

Steps Taken So Far:

  • Increased the timeout and memory allocation of the Lambda function.
  • Added detailed logging to monitor the size and headers of the response.

Questions:

  • Could the size or complexity of the JSON be causing these timeouts?
  • Are there more efficient ways to handle large JSON responses in Lambda?

Any insights or suggestions would be greatly appreciated. Thank you!

gb
asked 15 days ago · 107 views
2 Answers
Accepted Answer

Not answering the question, but why call the remote API asynchronously? I get that there may be other things happening in the function in parallel, but if that is causing you issues in a Lambda environment, why not call the API synchronously?
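
As a rough illustration only, a synchronous version of the same call could look like this, using urllib3 (mentioned in the comments below, but any HTTP client would do); the retry/backoff handling is left out, and the 30-second timeout is just an example value:

import json
import urllib3

http = urllib3.PoolManager()

def fetch_page(url, params, headers):
    # For GET requests, fields= is encoded into the query string
    resp = http.request("GET", url, fields=params, headers=headers, timeout=30.0)
    if resp.status != 200:
        raise RuntimeError(f"Request failed with status {resp.status}")
    # resp.data is decompressed by default when the server sends gzip
    return json.loads(resp.data.decode("utf-8"))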

AWS
EXPERT
answered 14 days ago
  • It worked like that in my other Lambda. I'll try urllib3 to see if that changes anything.

  • Well, I've switched to urllib3 and it's working perfectly! After days of debugging, thank you!!


I don't see any exit condition for the while loop in your fetch_events code snippet. You should have logic there that checks the data to see whether another query is required and, if not, exits the loop. The way it's set up, it will never end unless an exception occurs or the Lambda times out.
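
For example, if the API returns a cursor for the next page, the loop can break as soon as that cursor is absent. A minimal sketch (the "events" and "next_cursor" field names are assumptions about your API's response shape):

all_events = []
cursor = None
while True:
    if cursor:
        params["cursor"] = cursor
    data = await fetch_with_exponential_backoff(session, url, params, headers)
    if data is None:  # request failed even after retries
        break
    all_events.extend(data.get("events", []))  # assumed field name
    cursor = data.get("next_cursor")           # assumed field name
    if not cursor:  # no more pages -> exit the loop
        break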

AWS
answered 15 days ago
  • Hello, there are multiple exit conditions in the code; I've only posted a snippet. It breaks when there are no more pages from the API. It's working well locally.
