Django App in ECS Container Cannot Connect to S3 in Gov Cloud


I have a container running on an EC2 instance in ECS. The container hosts a Django-based application that uses S3 for file storage and RDS for its database. I have configured my VPC, subnets, VPC endpoints, internet gateway, roles, security groups, and other parameters such that I am able to host the site, connect to the RDS instance, and access the site.

The issue is with the connection to S3. When I run python manage.py collectstatic --no-input, which should upload any new or modified files to S3 as part of the application setup, the command hangs and never completes. No files are transferred to the S3 bucket, which is already set up.

Details of the set up:

All of the below is hosted on AWS Gov Cloud

VPC and Subnets

  • 1 VPC located in Gov Cloud East with 2 availability zones (AZ) and one private and public subnet in each AZ (4 total subnets)
  • The 3 default routing tables (1 for each private subnet, and 1 for the two public subnets together)
  • DNS hostnames and DNS resolution are both enabled

VPC Endpoints

All endpoints have the "vpce-sg" security group attached and are associated to the above vpc

  • s3 gateway endpoint (set up to use the two private subnet routing tables)
  • ecr-api interface endpoint
  • ecr-dkr interface endpoint
  • ecs-agent interface endpoint
  • ecs interface endpoint
  • ecs-telemetry interface endpoint
  • logs interface endpoint
  • rds interface endpoint

Security Groups

  • Elastic Load Balancer Security Group (elb-sg)

    • Used for the elastic load balancer
    • Only allows inbound traffic from my local IP
    • No outbound restrictions
  • ECS Security Group (ecs-sg)

    • Used for the EC2 instance in ECS
    • Allows all traffic from the elb-sg
    • Allows http:80, https:443 from vpce-sg for s3
    • Allows postgresql:5432 from vpce-sg for rds
    • No outbound restrictions
  • VPC Endpoints Security Group (vpce-sg)

    • Used for all vpc endpoints
    • Allows http:80, https:443 from ecs-sg for s3
    • Allows postgresql:5432 from ecs-sg for rds
    • No outbound restrictions

Elastic Load Balancer

  • Set up to use an Amazon certificate for the HTTPS connection, with a domain managed by GoDaddy, since GovCloud Route 53 does not allow public hosted zones
  • Listener on http permanently redirects to https

Roles

  • ecsInstanceRole (Used for the EC2 instance on ECS)
    • Attached policies: AmazonS3FullAccess, AmazonEC2ContainerServiceforEC2Role, AmazonRDSFullAccess
    • Trust relationships: ec2.amazonaws.com
  • ecsTaskExecutionRole (Used for executionRole in task definition)
    • Attached policies: AmazonECSTaskExecutionRolePolicy
    • Trust relationships: ec2.amazonaws.com, ecs-tasks.amazonaws.com
  • ecsRunTaskRole (Used for taskRole in task definition)
    • Attached policies: AmazonS3FullAccess, CloudWatchLogsFullAccess, AmazonRDSFullAccess
    • Trust relationships: ec2.amazonaws.com, ecs-tasks.amazonaws.com

S3 Bucket

  • Standard bucket set up in the same Gov Cloud region as everything else

Troubleshooting

If I bypass the connection to s3 the application successfully launches and I can connect to the website, but since static files are supposed to be hosted on s3 there is less formatting and images are missing.

Using a bastion instance I was able to ssh into the EC2 instance running the container and successfully test my connection to s3 from there using aws s3 ls s3://BUCKET_NAME

If I connect to a shell within the application container itself and I try to connect to the bucket using...

import boto3

# BUCKET_NAME is a placeholder for the real bucket name
s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)
s3.meta.client.head_bucket(Bucket=bucket.name)

I receive a timeout error...

File "/.venv/lib/python3.9/site-packages/urllib3/connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f3da4467190>, 'Connection to BUCKET_NAME.s3.amazonaws.com timed out. (connect timeout=60)')
...
File "/.venv/lib/python3.9/site-packages/botocore/httpsession.py", line 418, in send
    raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://BUCKET_NAME.s3.amazonaws.com/"
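
The traceback shows the failure happening at the TCP connect stage, before any S3 request is sent. As a rough way to separate network reachability from boto3 behavior, a small standard-library probe like the one below could be run from the container shell. The function name and the default timeout are illustrative choices, not part of the original setup.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts,
        # all of which are OSError subclasses.
        return False
```

For example, `can_connect("BUCKET_NAME.s3.amazonaws.com")` probes the same host and port that appear in the traceback, with the bucket name still a placeholder.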

Based on this article I think this may have something to do with the fact that I am using the GoDaddy DNS servers which may be preventing proper URL resolution for S3.

If you're using the Amazon DNS servers, you must enable both DNS hostnames and DNS resolution for your VPC. If you're using your own DNS server, ensure that requests to Amazon S3 resolve correctly to the IP addresses maintained by AWS.

I am unsure of how to ensure that requests to Amazon S3 resolve correctly to the IP address maintained by AWS. Perhaps I need to set up another private DNS on route53?

I have tried a very similar set up for this application in AWS non-Gov Cloud using route53 public DNS instead of GoDaddy and there is no issue connecting to S3.

Please let me know if there is any other information I can provide to help.

jrita
asked 2 years ago, 937 views

3 Answers

Accepted Answer

The issue lies in how boto3 handles different AWS regions. This may be unique to usage on AWS GovCloud. Originally I did not have a region configured for S3, but according to the docs an optional environment variable named AWS_S3_REGION_NAME can be set.

AWS_S3_REGION_NAME (optional: default is None) Name of the AWS S3 region to use (eg. eu-west-1)

I reached this conclusion thanks to a Stack Overflow answer I was using to try to connect to S3 manually via boto3. I noticed that it passed a region_name argument when creating the session, which prompted me to make sure I had set the region correctly in my app.settings and environment variables.

If anyone has some background on why this needs to be set for GovCloud functionality but apparently not for commercial, I would be interested to know.
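
One plausible explanation, offered as a hedged guess rather than a confirmed diagnosis: with no region set, boto3 targeted the global hostname BUCKET_NAME.s3.amazonaws.com (as the traceback in the question shows), while an S3 gateway endpoint only routes traffic for S3 in the VPC's own region. The sketch below pins a client to a region and reports which endpoint boto3 chose. The helper names, the timeout values, and the `us-gov-east-1` region code used in the usage note are assumptions for illustration.

```python
def regional_s3_endpoint(region: str) -> str:
    """Build a regional S3 endpoint URL (assumes the amazonaws.com domain)."""
    return f"https://s3.{region}.amazonaws.com"


def make_s3_client(region: str):
    """Create an S3 client pinned to one region, using SigV4 and short timeouts."""
    # Imported lazily so regional_s3_endpoint() works without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        region_name=region,
        config=Config(
            signature_version="s3v4",
            connect_timeout=5,
            retries={"max_attempts": 2},
        ),
    )


def check_bucket(bucket: str, region: str) -> str:
    """Call head_bucket against the regional endpoint and return the URL used."""
    client = make_s3_client(region)
    client.head_bucket(Bucket=bucket)  # raises if unreachable or not permitted
    return client.meta.endpoint_url
```

Calling `check_bucket("BUCKET_NAME", "us-gov-east-1")` from the container and comparing the result with `regional_s3_endpoint("us-gov-east-1")` should show whether requests are going to the regional endpoint rather than the global one.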

I also had to specify AWS_S3_SIGNATURE_VERSION in app.settings so that boto3 knew to use version 4 of the signature. According to the docs:

As of boto3 version 1.13.21 the default signature version used for generating presigned urls is still v2. To be able to access your s3 objects in all regions through presigned urls, explicitly set this to s3v4. Set this to use an alternate version such as s3. Note that only certain regions support the legacy s3 (also known as v2) version.

This Stack Overflow response adds that S3 regions deployed after January 2014 support only signature version 4 (see the AWS docs notice).

Apparently GovCloud is in this group of newly deployed regions.

If you do not specify this, calls to the S3 bucket for static files, such as JS scripts, will receive a 400 response while the web application is running. S3 responds with the error message:

<Error>
<Code>InvalidRequest</Code>
<Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<RequestId>#########</RequestId>
<HostId>##########</HostId>
</Error>
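
Taken together, the two settings described in this answer might look like the fragment below in app.settings. This is a minimal sketch that assumes the quoted docs are those of django-storages, which is where the AWS_S3_REGION_NAME and AWS_S3_SIGNATURE_VERSION names appear. The environment variable lookups, the AWS_STORAGE_BUCKET_NAME line, and the default values are illustrative assumptions.

```python
import os

# Bucket that collectstatic uploads to. Reading it from an environment
# variable of the same name is an assumption, not part of the original post.
AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "BUCKET_NAME")

# Region the bucket lives in. "us-gov-east-1" is AWS GovCloud (US-East);
# replace it with the region that actually hosts the bucket.
AWS_S3_REGION_NAME = os.environ.get("AWS_S3_REGION_NAME", "us-gov-east-1")

# Sign requests with Signature Version 4 (AWS4-HMAC-SHA256), which is the
# mechanism the InvalidRequest error above asks for.
AWS_S3_SIGNATURE_VERSION = "s3v4"
```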
jrita
answered 2 years ago


What kind of Amazon ECS task networking is setup?

AWS Expert
kentrad
answered 2 years ago
  • I am using a "bridge" network, because I have two containers running on the same EC2 instance. One is a "proxy" container running nginx and the other is the "app" container using uwsgi and django. The two containers communicate via a "link" over the bridge network. I am open to other approaches.

  • On the container, what does the S3 endpoint resolve to?

  • It resolves to https://BUCKET_NAME.s3.amazonaws.com

    Executing wget https://BUCKET_NAME.s3.amazonaws.com on the container shell

    Output Connecting to BUCKET_NAME.s3.amazonaws.com (52.217.110.6:443) then the prompt times out...
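
The wget output above shows the global hostname resolving to a public address. To compare that with what a regional hostname resolves to from inside the container, a short standard-library lookup such as this sketch may help. The function name is illustrative, and the regional hostname form in the usage note assumes the bucket lives in `us-gov-east-1`.

```python
import socket


def resolve_ipv4(host: str) -> list[str]:
    """Return the sorted IPv4 addresses that `host` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(
            host, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Name could not be resolved from this network namespace.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Running `resolve_ipv4("BUCKET_NAME.s3.amazonaws.com")` and `resolve_ipv4("BUCKET_NAME.s3.us-gov-east-1.amazonaws.com")` side by side shows whether the two names point at different address ranges.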


There are a couple of things in the configuration that don't look correct but should not cause the problem you are seeing. For instance, VPC gateway endpoints (S3) do not have security groups. Also, you have inbound rules in your security groups for S3 and RDS. These can be eliminated. The VPC interface endpoint for RDS is for service calls (443), not database queries (5432). At this point, I think I would resort to a process of elimination, starting with the ECS SG rules.

AWS Expert
kentrad
answered 2 years ago
  • So you are proposing I change the ECS security group to...

    Inbound: Allows all traffic from the elb-sg
    Outbound: No restrictions

    I can also eliminate the RDS VPC Endpoint since I am not making service calls.

    Is that correct?

  • I tried the above changes without any change in behavior.

    I also tried to use the aws cli to connect to s3 from the container (not just the instance running the containers). I am able to successfully execute 'aws s3 ls s3://BUCKET_NAME'. The issue seems to be isolated to the use of boto3 within the django application. I have confirmed that my access key ID and secret access key are set properly in the django application.
