Skip to content

Best practice for long-running async jobs: SQS → EventBridge Pipes → ECS RunTask, and how to pass payload?

0

Hi,

We need advice on the best-practice AWS architecture for a long-running async process.

Our application stack:

  • Frontend: Vite React, hosted on S3 static hosting + CloudFront
  • Backend: Python FastAPI, running on ECS Fargate behind an ALB, inside a VPC
  • Database: MongoDB
  • Long-running process: RAG vector creation. This can take several minutes, so we cannot keep the original HTTP request open.

Current desired behavior:

  1. User clicks “Create RAG” in the frontend.
  2. Frontend calls backend API: POST /api/rag/create.
  3. Backend immediately returns something like “queued”.
  4. The long RAG process runs asynchronously.
  5. Multiple users should be able to start RAG creation at the same time.
  6. Each RAG job should run isolated from the others, ideally one ECS/Fargate task per job.
  7. The frontend polls job status from the backend/database.

Architecture we tried:

Backend API → SQS queue → EventBridge Pipe → ECS RunTask → dedicated ECS task definition for the RAG worker

Details:

  • The backend sends an SQS message with a body like: { "user_id": "...", "job_type": "rag_create" }

  • EventBridge Pipes uses the SQS queue as source.

  • The Pipe target is ECS RunTask.

  • The ECS task uses the same Docker image as our backend, but a different task definition/command.

  • The worker command is something like: python -m app.application.jobs.rag_worker --mode single

What works:

  • Backend successfully sends messages to SQS.
  • EventBridge Pipe consumes messages very quickly.
  • EventBridge Pipe starts ECS Fargate tasks successfully.
  • The worker container starts and connects to MongoDB.
  • The general async flow is fast and close to what we want.

The blocker:

We need the ECS worker task to receive the original SQS message body, for example user_id/job_type.

We tried passing the SQS body into the ECS task through EventBridge Pipe target parameters / container overrides, for example:

  • environment variable RAG_JOB_PAYLOAD
  • command argument
  • input_template
  • JSONPath references like $.body or <$.body>

But we could not get the actual SQS message body into the ECS container reliably. In some cases the literal string was passed, in other cases the task stopped starting.

Example from aws pipes describe-pipe showed that the command override was stored literally:

"Command": [ "python", "-m", "app.application.jobs.rag_worker", "--mode", "single", "--payload", "<$.body>" ]

So it did not become the actual SQS body.

Questions:

  1. Is this architecture a recommended pattern for long-running jobs? Specifically: Backend API → SQS → EventBridge Pipes → ECS RunTask

  2. If this is a good pattern, what is the correct supported way to pass the original SQS message body into the ECS task started by EventBridge Pipes?

  3. Is EventBridge Pipes expected to support dynamic SQS body values inside ECS container overrides, such as environment variables or command arguments?

  4. If direct Pipes → ECS payload passing is not supported or not recommended, what is the preferred alternative?

Options we are considering:

  • SQS → Lambda → ECS RunTask Lambda would receive the SQS body and call ECS RunTask with container overrides.

  • SQS → EventBridge Pipes → Step Functions Express → ECS RunTask Step Functions would receive the SQS payload and pass it into ECS using Value.$ in container overrides.

  • Backend writes the job payload to MongoDB/S3, SQS only triggers ECS, and the worker fetches the job data itself.

  • Always-running ECS worker service polling SQS directly, but we prefer to avoid idle cost if possible.

Main goals:

  • No long HTTP request.
  • Fast job startup.
  • One isolated ECS/Fargate task per RAG job.
  • Support parallel jobs.
  • Reliable payload passing.
  • Minimal idle cost.
  • Production-safe/recommended AWS architecture.

What would AWS recommend for this use case?

1 Answer
0

Hello.

I believe the EventBridge pipe architecture you're considering is a valid approach, and the problem lies not in the architecture itself, but in the method of payload delivery.
Try setting the environment variable "SQS_BODY" and then reading the SQS body using a command.

  "Overrides": {
    "ContainerOverrides": [
      {
        "Name": "hoge",
        "command" = [
              "python", "-m", "app.application.jobs.rag_worker", "--mode", "single", "--payload", "\"$SQS_BODY\""
         ],
        "Environment": [
          { "Name": "SQS_BODY",  "Value": "$.body" }
        ]
      }
    ]
  }

According to the following document, the entire value needs to be a JSON path.
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes-event-target.html

EventBridge Pipes target parameters support optional dynamic JSON path syntax. You can use this syntax to specify JSON paths instead of static values (for example $.detail.state). The entire value has to be a JSON path, not only part of it. For example, RedshiftParameters.Sql can be $.detail.state but it can't be "SELECT * FROM $.detail.state". These paths are replaced dynamically at runtime with data from the event payload itself at the specified path. Dynamic path parameters can't reference new or transformed values resulting from input transformation. The supported syntax for dynamic parameter JSON paths is the same as when transforming input. For more information, see Amazon EventBridge Pipes input transformation.

The following answers may also be helpful.
https://repost.aws/questions/QU_WC7301mT8qR7ip_9cyjdQ/eventbridge-pipes-and-ecs-task#ANc3UIsDDnR9-dcUX-P_jq3A

EXPERT
answered 22 days ago
EXPERT
reviewed 21 days ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.