- Newest
- Most votes
- Most comments
As far as I understand your issue:
LIMIT clause is confusing the Redshift execution plan regarding how to truncate the data stream coming back from the Lambda provider. Using a CTE is the most reliable workaround.
So, the issue likely stems from a buffer deadlock caused by the interaction between your high MAX_BATCH_ROWS and the LIMIT 200 clause, combined with calling the function twice in one row.
When you call LIMIT 200 with a MAX_BATCH_ROWS of 100,000, Redshift may still send a full batch of 100,000 rows to Lambda. While Lambda finishes processing, the Redshift Leader Node is only waiting for 200 rows to send to the client. The remaining 99,800 rows in that batch can "clog" the communication pipe between the compute nodes and the leader, leading to the infinite "Returning" state.
Check the following:
- Materialize via CTE: Wrap the function call in a CTE or Subquery. This forces Redshift to handle the UDF execution logic separately from the result-streaming logic of the LIMIT clause.
WITH processed_data AS (
SELECT col_name, fn_lambda_anonymize(col_name, 'secret') as anon_col
FROM big_table
)
SELECT col_name, anon_col, anon_col
FROM processed_data
LIMIT 200;
- Align Batch Size: Temporarily lower
MAX_BATCH_ROWSfor interactive queries with small limits. If you only want 200 rows, a batch size of 100,000 is overkill and increases the risk of the "Return" hang. - Avoid Redundant Calls: In your query, you call the function twice for the same column. Even if Redshift tries to optimize, it often generates two distinct Lambda requests per row. Use a CTE (as shown above) to call the function once and reference the alias twice.
- Check Result Cache: Disable the result cache (
SET enable_result_cache_for_system_queries TO off;) for the session to see if a corrupted cache entry is holding the "Returning" state hostage.
Tip: If you frequently run interactive queries with LIMIT, consider creating a separate version of the External Function with a much lower MAX_BATCH_ROWS (e.g., 1,000) specifically for development and testing to avoid these serverless compute "hangs“.
According to AWS Best Practices for Lambda UDFs (2026) and the Redshift Developer Guide, large batch settings (MAX_BATCH_ROWS) can cause synchronization issues with the Leader Node when combined with LIMIT clauses. Forcing materialization via a CTE is a documented strategy to decouple UDF execution from result streaming.
see also:
Based on the behavior you're describing, where your query hangs in the "returning" state after the Lambda UDF executions complete, this appears to be an issue with how Amazon Redshift is handling the result set assembly after the Lambda function calls finish.
A few things to consider and try:
Regarding the RETRY_TIMEOUT setting: Your current RETRY_TIMEOUT of 60000 milliseconds (60 seconds) controls how long Amazon Redshift will retry Lambda requests if they receive exceptions. However, this doesn't control the overall query timeout. You may want to set a statement_timeout to prevent queries from running indefinitely. The statement_timeout determines the maximum time a query can run (including planning, queueing, and execution) before Amazon Redshift terminates it. For example, you could set SET statement_timeout = 300000; (5 minutes) before running your query to ensure it doesn't hang indefinitely.
Batch size considerations: You mentioned that different query patterns result in different batch sizes (100,000 vs 300 rows). The fact that your LIMIT 200 query hangs while larger operations succeed suggests there may be an issue with how smaller result sets are being processed. Try adjusting the MAX_BATCH_ROWS parameter - perhaps setting it lower (like 1000 or 5000) to see if that changes the behavior.
Monitoring Lambda UDF execution: Since you can see the Lambda executions completing successfully in SYS_LUDF_DETAIL, the issue appears to be in the post-processing phase. Check the total_duration, invocations, and total_rows columns in SYS_LUDF_DETAIL to understand if there's something unusual about how the function is being called in the problematic query pattern.
Workaround suggestions:
- Try breaking your query into smaller chunks using WHERE clauses instead of LIMIT
- Consider using a CREATE TABLE AS approach (which you mentioned works) and then selecting from that derived table
- Test if the issue persists when calling the Lambda UDF only once per row instead of twice in the same SELECT clause
Since this is affecting Redshift Serverless specifically, you may want to test on a provisioned cluster to determine if this is a serverless-specific issue that should be reported to AWS Support.
Sources
statement_timeout - Amazon Redshift
SYS_LUDF_DETAIL - Amazon Redshift
CREATE EXTERNAL FUNCTION - Amazon Redshift
Relevant content
- asked 2 years ago
- asked 2 years ago
- asked 10 months ago
- AWS OFFICIALUpdated 3 years ago

If my answer helped solve your problem, I would appreciate it if you click on “accepted answer”