
AWS Lambda Docker Container: Unable to Establish Communication Between Two Internal Servers over localhost.


Hello AWS Community,

I’m encountering an issue where a Python function is unable to connect to a Tika server when running inside an AWS Lambda Docker container. Both the Python function and the Tika server run inside the same container. The setup works perfectly when tested locally, but fails once deployed to AWS Lambda.

Dockerfile:

FROM public.ecr.aws/lambda/python:3.12

# Install OpenJDK 11
RUN microdnf install -y java-11-amazon-corretto && \
    microdnf clean all

# Set JAVA_HOME environment variable
ENV JAVA_HOME /usr/lib/jvm/java-11-amazon-corretto

#Copy the Tika server JAR file into the image
COPY tika-server-1.24.jar ${LAMBDA_TASK_ROOT}

COPY requirements.txt  ${LAMBDA_TASK_ROOT}

USER root
# Install Python dependencies
RUN pip install -r requirements.txt

# Copy your Lambda function code
COPY app.py  ${LAMBDA_TASK_ROOT}

EXPOSE 9998

# Set the CMD to your handler
CMD ["app.handler"]

app.py

import requests
import base64
import subprocess

TIKA_URL = 'http://localhost:9998/tika/form'  # Tika server URL

def handler(event, context):

    # Start the Tika server   
    subprocess.Popen(['java', '-jar', '/var/task/tika-server-1.24.jar', "org.apache.tika.server.core.TikaServerCli", '-h', '0.0.0.0', '-p', '9998'])

    import time
    time.sleep(5)  # Sleep for 5 seconds to ensure Tika server is up
    
    file_content = base64.b64decode(event['body'])
    file_path = '/tmp/uploaded_file'
    
    with open(file_path, 'wb') as file:
        file.write(file_content)

    # Send the file to the Tika server
    with open(file_path, 'rb') as f:
        files = {'file': ('uploaded_file', f, 'application/octet-stream')}
        headers = {
            'Accept': 'text/plain'
        }
        tika_response = requests.post(TIKA_URL, files=files, headers=headers)

    return {
        'statusCode': tika_response.status_code,
        'body': tika_response.text
    } 

Error

{
  "errorMessage": "HTTPConnectionPool(host='localhost', port=9998): Max retries exceeded with url: /tika/form (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffffac618d40>: Failed to establish a new connection: [Errno 111] Connection refused'))",
  "errorType": "ConnectionError",
  "requestId": "ff2ca5c8-8f8e-4217-942e-71e331912088",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 28, in handler\n    tika_response = requests.post(TIKA_URL, files=files, headers=headers)\n",
    "  File \"/var/lang/lib/python3.12/site-packages/requests/api.py\", line 115, in post\n    return request(\"post\", url, data=data, json=json, **kwargs)\n",
    "  File \"/var/lang/lib/python3.12/site-packages/requests/api.py\", line 59, in request\n    return session.request(method=method, url=url, **kwargs)\n",
    "  File \"/var/lang/lib/python3.12/site-packages/requests/sessions.py\", line 589, in request\n    resp = self.send(prep, **send_kwargs)\n",
    "  File \"/var/lang/lib/python3.12/site-packages/requests/sessions.py\", line 703, in send\n    r = adapter.send(request, **kwargs)\n",
    "  File \"/var/lang/lib/python3.12/site-packages/requests/adapters.py\", line 700, in send\n    raise ConnectionError(e, request=request)\n"
  ]
}
  • Hi, based on my reading of the Dockerfile and Lambda code, you are trying to start a server process on every invocation. Lambda is not designed to be used this way; it is meant for executing short-lived serverless code. I suggest running the JAR in command-line mode from the Lambda function instead, and also checking whether a Lambda layer could package it. Thanks!

1 Answer

Hi Vinitha,
Thanks for sharing your detailed question and setup! Your problem is intriguing, and it's clear this has been challenging. Let’s work through it step by step to ensure you have a clear solution. 🚀


Clarifying the Issue

You are trying to run a Python AWS Lambda function that starts a Tika server via a subprocess inside the same Docker container. While the setup works locally, once deployed to AWS Lambda your Python function cannot reach the Tika server on localhost:9998. The Connection refused error means nothing is listening on that port when the request is made. Lambda's execution model is likely a factor: the environment is frozen between invocations, and a fixed five-second sleep gives no guarantee the JVM has finished starting, so a server launched inside the handler may simply not be up yet.


Key Terms

  • AWS Lambda: A serverless compute service where code runs in response to events without provisioning or managing servers.
  • Docker Container: A lightweight, portable environment for running applications, including on AWS Lambda.
  • localhost: A network address used to refer to the same system the application is running on, limited to the container's internal network.
  • Tika Server: A server-side library for file and document parsing, running in a Java environment.
  • Lambda Container Lifecycle: Containers in AWS Lambda are initialized once and reused for subsequent invocations when possible, but initialization processes need to happen before the first invocation.
  • Cold Start: The initialization phase for a new Lambda container, which can delay execution and affect performance.

The Solution (Our Recipe)

Steps at a Glance:

  1. Modify the Dockerfile to prestart the Tika server process.
  2. Use ENTRYPOINT to ensure Tika is always running inside the container.
  3. Adjust the Python Lambda handler to interact with the already running Tika server.
  4. Test the Docker container locally, simulating AWS Lambda conditions.
  5. Deploy the updated container to AWS Lambda.
  6. Debug and monitor logs if issues persist.
  7. Consider alternative solutions like offloading Tika to a dedicated microservice.
  8. Acknowledge and mitigate potential pitfalls like cold starts and container reuse.
  9. Add scalability by adopting a microservices architecture (optional).

Step-by-Step Guide:

  1. Modify the Dockerfile to prestart the Tika server process:
    Update the Dockerfile to start the Tika server during the container's initialization. This ensures the Tika server is ready by the time Lambda starts handling requests.
    Updated CMD example:

    CMD ["sh", "-c", "java -jar /var/task/tika-server-1.24.jar -h 0.0.0.0 -p 9998 & /var/runtime/bootstrap"]

    Why it works: Using & starts the Tika process in the background, allowing the Lambda runtime to initialize simultaneously.

  2. Use ENTRYPOINT to ensure Tika is always running inside the container:
    Add an ENTRYPOINT to replace or complement the CMD directive:

    ENTRYPOINT ["sh", "-c", "java -jar /var/task/tika-server-1.24.jar -h 0.0.0.0 -p 9998 & exec /var/runtime/bootstrap"]

    Why it works: The exec command replaces the shell process with the Lambda runtime, ensuring proper process management and shutdown.

  3. Adjust the Python Lambda handler to interact with the already running Tika server:
    Remove the subprocess call and sleep delay from your handler function:

    # These lines are no longer necessary
    # subprocess.Popen(['java', '-jar', '/var/task/tika-server-1.24.jar', '-h', '0.0.0.0', '-p', '9998'])
    # time.sleep(5)

    Why it works: The Tika server will already be running when the Lambda handler executes.
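    As a minimal sketch of the trimmed-down handler (assuming Tika is started at container boot, per steps 1–2): instead of a fixed sleep, a short readiness poll guards the first invocation against racing the JVM startup, and the file bytes are posted directly without the intermediate /tmp write.

    ```python
    import base64
    import time

    import requests

    TIKA_URL = "http://localhost:9998/tika/form"  # Tika endpoint from the question

    def wait_for_tika(base_url="http://localhost:9998", timeout=10.0):
        """Poll Tika's root endpoint until it answers, instead of sleeping blindly."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                if requests.get(base_url, timeout=1).status_code == 200:
                    return True
            except requests.ConnectionError:
                time.sleep(0.2)
        return False

    def handler(event, context):
        # Tika is launched by the container's ENTRYPOINT, not here.
        if not wait_for_tika():
            return {"statusCode": 503, "body": "Tika server not reachable"}

        file_content = base64.b64decode(event["body"])
        files = {"file": ("uploaded_file", file_content, "application/octet-stream")}
        resp = requests.post(TIKA_URL, files=files, headers={"Accept": "text/plain"})
        return {"statusCode": resp.status_code, "body": resp.text}
    ```

    The poll returns quickly on a warm container where Tika is already up, so it adds essentially no latency after the first invocation.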

  4. Test the Docker container locally, simulating AWS Lambda conditions:
    Use Docker to build and run the container; the AWS Lambda base images bundle the Runtime Interface Emulator, so you can invoke the function over HTTP:

    docker build -t tika-lambda .
    docker run -p 9000:8080 tika-lambda
    curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @event.json

    Create a sample event.json file:

    {
        "body": "BASE64_ENCODED_CONTENT"
    }

    Tip: Use the AWS SAM CLI to test locally in a simulated Lambda environment:

    sam local invoke -e event.json
  5. Deploy the updated container to AWS Lambda:
    Push the Docker image to Amazon Elastic Container Registry (ECR):

    aws ecr get-login-password --region YOUR_REGION | docker login --username AWS --password-stdin YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com
    docker tag tika-lambda:latest YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/tika-lambda:latest
    docker push YOUR_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/tika-lambda:latest

    Update your Lambda function to use the new image in the AWS Management Console or via CLI.

  6. Debug and monitor logs if issues persist:
    Use local Docker logs and CloudWatch to identify runtime issues:

    docker logs CONTAINER_ID

    In AWS, enable detailed logging to diagnose invocation issues.
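    As a small sketch of handler-side logging (the actual Tika call is elided): Lambda's Python runtime pre-configures a root log handler, so setting the level is enough for these lines to reach CloudWatch Logs.

    ```python
    import logging

    # The Lambda runtime already attaches a handler to the root logger;
    # only the level needs to be set for INFO lines to appear in CloudWatch.
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def handler(event, context):
        logger.info("received event with keys: %s", sorted(event.keys()))
        try:
            pass  # ... the existing Tika request would go here ...
        except Exception:
            logger.exception("Tika request failed")  # logs the full traceback
            raise
    ```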

  7. Consider alternative solutions like offloading Tika to a dedicated microservice:
    Why this might be better: Running Tika as a microservice decouples it from Lambda, allowing for better resource allocation and scaling. Use AWS ECS or Fargate to deploy the Tika server, then update the Lambda function to interact with it via an HTTP endpoint.
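    A minimal sketch of the Lambda side of that design, assuming a hypothetical TIKA_ENDPOINT environment variable that points at the ALB/ECS-hosted Tika service:

    ```python
    import base64
    import os

    import requests

    # TIKA_ENDPOINT is an assumed variable name, e.g. "http://tika.internal:9998";
    # it would be set in the Lambda function's configuration.
    TIKA_ENDPOINT = os.environ.get("TIKA_ENDPOINT", "http://localhost:9998")

    def handler(event, context):
        file_content = base64.b64decode(event["body"])
        resp = requests.post(
            f"{TIKA_ENDPOINT}/tika/form",
            files={"file": ("uploaded_file", file_content, "application/octet-stream")},
            headers={"Accept": "text/plain"},
            timeout=30,  # fail fast if the service is unreachable
        )
        return {"statusCode": resp.status_code, "body": resp.text}
    ```

    Swapping the endpoint between environments then becomes pure configuration, with no image rebuild.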

  8. Acknowledge and mitigate potential pitfalls like cold starts and container reuse:

    • Cold Starts: Using a prebuilt container image reduces initialization time.
    • Container Reuse: Take advantage of container reuse by initializing Tika during the container's startup phase rather than per invocation.
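    If you do keep Tika inside the Lambda container rather than in the ENTRYPOINT, container reuse can be exploited from Python by starting the process at module scope, which runs once per cold start. A hedged sketch (the JAR path matches the Dockerfile above; the cmd parameter exists only to make the helper testable):

    ```python
    import atexit
    import subprocess

    # Default command mirrors the Dockerfile's JAR location.
    TIKA_CMD = ["java", "-jar", "/var/task/tika-server-1.24.jar",
                "-h", "0.0.0.0", "-p", "9998"]

    _tika_proc = None  # module scope: set once per container, reused while warm

    def ensure_tika_started(cmd=None):
        """Start Tika once per container, restarting it only if it has died."""
        global _tika_proc
        if _tika_proc is None or _tika_proc.poll() is not None:
            _tika_proc = subprocess.Popen(cmd or TIKA_CMD)
            atexit.register(_tika_proc.terminate)  # clean shutdown on freeze/exit
        return _tika_proc

    def handler(event, context):
        ensure_tika_started()
        # ... the Tika HTTP call from the earlier steps would follow here ...
        return {"statusCode": 200, "body": "ok"}
    ```

    Note that Lambda freezes background processes between invocations, so the poll()-and-restart check matters: a process that died while frozen is relaunched on the next invocation.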
  9. Add scalability by adopting a microservices architecture:
    Architecture Walkthrough:

    • Deploy Tika Server on AWS ECS or AWS Fargate for high availability and scalability.
    • Use an AWS ALB (Application Load Balancer) to expose the Tika server as an HTTP endpoint.
    • Update your Lambda function to send requests to the Tika server via the ALB endpoint.
      Example high-level architecture:
    • Lambda Function → ALB Endpoint → Tika Server on ECS
      Benefits:
    • Better resource management.
    • Independent scaling of Tika and Lambda.
    • Decoupled architecture for future flexibility.

Closing Thoughts

The primary challenge stems from AWS Lambda's stateless, event-driven model, which isn't designed for starting and managing server processes per invocation. By ensuring the Tika server runs at container startup and exploring alternative architectures, you can resolve the "Connection refused" issue and optimize performance and scalability. The microservices architecture ensures a future-proof, production-ready setup.

If you encounter additional challenges, feel free to update your question or comment. The community is here to help! 😊


I hope this guidance resolves your issue, Vinitha! Let us know how it goes, and feel free to share any follow-up questions or observations. Happy building! ⚡


Cheers, Aaron 😊

answered a year ago
