Lambda response streaming using Java and Auth using API GW+Cognito Identity Pool


We want to use LLM/GenAI in our project. Some of the GenAI use cases are asynchronous, but others are synchronous.

In the synchronous use cases, where the LLM interacts directly with users, waiting for the complete response from the LLM before showing it to users takes a long time, so we want to stream the response from the LLM to the UI.

We use Lambda in our backend, so the flow goes like this: the user types a query in the browser front end, it goes to Lambda, then to the LLM, and the LLM streams the response back to Lambda. Lambda then needs to send that response back to the UI.

Since we are using Java/Kotlin in the backend Lambda, we could not find any good reference articles, demos, or sample code explaining how to stream the response back to the front end using Java. The only examples we found online use JavaScript.

We also read about API Gateway's limitations for streamed responses. We host all our Lambdas behind API Gateway. One mitigation published in the online docs is to use a Lambda function URL, but we are not sure how function URLs work with authentication and authorization from front-end apps using a Cognito Identity Pool.

Any sample code reference, demos, or workshop links will be super helpful on this topic.

2 Answers

Lambda response streaming is currently not supported in all runtimes, according to the documentation:

Currently, Lambda supports response streaming only on Node.js 14.x, Node.js 16.x, and Node.js 18.x managed runtimes. You can also use a custom runtime with a custom Runtime API integration to stream responses or use the Lambda Web Adapter. You can stream responses through Lambda Function URLs, the AWS SDK, or using the Lambda InvokeWithResponseStream API.
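Whichever route you take (custom runtime, Lambda Web Adapter, or InvokeWithResponseStream), the Java-side core is the same: a pump loop that forwards each chunk from the upstream (LLM) stream to the response stream and flushes immediately, instead of buffering the full body. Here is a minimal stdlib-only sketch of that loop; the `pump` method name and 4 KB buffer size are illustrative choices, and the LLM stream is simulated with an in-memory source:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamPump {

    // Forward bytes from the upstream (LLM) stream to the downstream
    // (response) stream chunk by chunk, flushing after every chunk so
    // the client sees tokens as they arrive instead of one final body.
    static void pump(InputStream from, OutputStream to) throws IOException {
        byte[] buf = new byte[4096]; // buffer size is an arbitrary choice
        int n;
        while ((n = from.read(buf)) != -1) {
            to.write(buf, 0, n);
            to.flush(); // the per-chunk flush is what makes this "streaming"
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate an LLM token stream with an in-memory source.
        InputStream llm = new ByteArrayInputStream(
                "Hello, streamed world!".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        pump(llm, response);
        System.out.println(response.toString(StandardCharsets.UTF_8));
    }
}
```

In a real handler, `to` would be the streaming response body the runtime hands you, and `from` would be the LLM provider's HTTP response stream.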

Also, with API Gateway, you'll need specific configurations. Details are described in the following blog:

Neither API Gateway nor Lambda’s target integration with Application Load Balancer support chunked transfer encoding. It therefore does not support faster TTFB for streamed responses. You can, however, use response streaming with API Gateway to return larger payload responses, up to API Gateway’s 10 MB limit. To implement this, you must configure an HTTP_PROXY integration between your API Gateway and a Lambda function URL, instead of using the LAMBDA_PROXY integration.
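As a sketch of that HTTP_PROXY configuration, in an OpenAPI definition for an HTTP API the integration could look like the following. The path and function URL are placeholders; how you secure it depends on whether the function URL uses AWS_IAM or NONE auth:

```yaml
paths:
  /chat:
    post:
      x-amazon-apigateway-integration:
        type: http_proxy
        httpMethod: POST
        # Placeholder: your Lambda function URL
        uri: https://<url-id>.lambda-url.us-east-1.on.aws/
        payloadFormatVersion: "1.0"
```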

That being said, the examples that could be useful for you most probably revolve around custom runtimes and HTTP_PROXY configurations, assuming that using Node.js is not an option.

In general, I recommend also looking into asynchronous invocations of AWS Lambda. Since the issue you are solving is long-running tasks in the LLM, you potentially don't need response streaming, but simply an asynchronous invocation model. That setup could be more aligned with your current architecture, especially regarding the runtimes used. Serverless Land maintains an end-to-end example of this pattern on GitHub.

AWS
Michael
answered 10 months ago

As mentioned in the previous answer, aws-lambda-web-adapter can do this. Here is a response streaming example for Spring Boot. The demo streams the contents of a file, but you can stream any text as well.

AWS
answered 7 months ago
