Easiest way to add CORS headers to a public AWS OpenData bucket


We have a thick-client browser web app that reads AWS OpenData files from an S3 bucket that we do not own (it is public, as part of the AWS OpenData program).

One of the buckets does not have the CORS header Access-Control-Allow-Origin: * set, so although the files are public, they are effectively not readable from within a browser.
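To make the symptom concrete, here is a minimal sketch of the check a browser effectively applies to a cross-origin response. The function name `browser_can_read` and the example origin are hypothetical, and real browsers apply more rules (credentials, preflight) than this simplified version covers.

```python
from typing import Mapping


def browser_can_read(response_headers: Mapping[str, str], page_origin: str) -> bool:
    """Simplified CORS check for a simple, credential-less GET.

    Returns True when Access-Control-Allow-Origin is '*' or matches
    the requesting page's origin exactly.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    return allowed is not None and allowed in ("*", page_origin)


# A public S3 object served without CORS headers: downloadable with curl,
# but a browser will not expose the body to the page's script.
without_cors = {"Content-Type": "application/octet-stream"}
with_cors = {**without_cors, "Access-Control-Allow-Origin": "*"}

print(browser_can_read(without_cors, "https://app.example.com"))  # False
print(browser_can_read(with_cors, "https://app.example.com"))     # True
```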

What is the easiest way to build a proxy that adds CORS headers to the files in this bucket? Note that we would like to NOT duplicate all the files into our own bucket (that would solve the problem, but we're talking terabytes of data). The proxy needs to work with large files and must not substantially affect the download bandwidth or performance of the web app.

The files are large, typically around 100-200 MB, so Amazon API Gateway is not an option (10 MB payload limit). We have implemented a version of this with Azure API Management, but it was very slow compared to raw S3 download and VERY expensive.

It is not clear whether this is possible using an S3 Access Point pointing to the OpenData bucket (perhaps not, and in any case we don't know the 12-digit AWS account ID of the bucket owner).

Is there a better way to do this within AWS? We have found generic cors-proxy projects through Google searches, but our tests with some of that GitHub code showed very poor throughput.

AlexF
asked 3 months ago, 209 views
1 Answer
Accepted Answer

AWS offers two services that, used together, can do this efficiently: Amazon CloudFront and Lambda@Edge. First, create a CloudFront distribution that uses the S3 bucket as its origin. Then customize CloudFront with Lambda@Edge:

  • Write and deploy a Lambda function in a supported runtime (like Node.js or Python). The function should add the 'Access-Control-Allow-Origin' and other required headers to the responses from your S3 bucket.
  • Associate this Lambda function with your CloudFront distribution, setting it to trigger on the 'Viewer Response' event. This ensures that the CORS headers are added just before the responses are sent to the client.

Here's a basic example of what the Lambda@Edge function code might look like in Python:

def lambda_handler(event, context):
    # Get the response that CloudFront is about to return
    response = event['Records'][0]['cf']['response']
    headers = response['headers']

    # Set CORS headers. Lambda@Edge expects each key in the headers dict to be
    # the lowercase header name; the inner 'key' holds the display-cased name.
    headers['access-control-allow-origin'] = [{'key': 'Access-Control-Allow-Origin', 'value': '*'}]
    headers['access-control-allow-methods'] = [{'key': 'Access-Control-Allow-Methods', 'value': 'GET, HEAD'}]
    headers['access-control-allow-headers'] = [{'key': 'Access-Control-Allow-Headers', 'value': '*'}]

    # Return the modified response to the viewer
    return response
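A handler like the one above can be exercised locally before deployment by feeding it a hand-built event. The sketch below is only an illustration: `add_cors_headers` and `make_response_event` are hypothetical names, and the event contains just the fields the handler reads, not the full CloudFront event.

```python
def add_cors_headers(event, context=None):
    """Lambda@Edge-style handler: add CORS headers to a CloudFront response."""
    response = event["Records"][0]["cf"]["response"]
    headers = response["headers"]
    cors = {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "GET, HEAD",
        "Access-Control-Allow-Headers": "*",
    }
    for name, value in cors.items():
        # CloudFront keys the headers dict by the lowercase header name.
        headers[name.lower()] = [{"key": name, "value": value}]
    return response


def make_response_event(status="200"):
    """Build a stripped-down response event (hypothetical test fixture)."""
    return {
        "Records": [
            {"cf": {"response": {"status": status, "headers": {
                "content-type": [{"key": "Content-Type", "value": "application/octet-stream"}],
            }}}}
        ]
    }


result = add_cors_headers(make_response_event())
print(sorted(result["headers"]))
```

Existing headers such as content-type should pass through untouched; only the three CORS entries are added.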

If this has answered your question or was helpful, accepting the answer would be greatly appreciated. Thank you!

EXPERT
answered 3 months ago
EXPERT
reviewed a month ago
  • Thanks, will take a look at doing exactly this.

  • You don't need Lambda@Edge. CloudFront can add CORS headers on its own: attach a response headers policy with CORS settings to the distribution's cache behavior.

    There were a bunch of edge cases around URLs with query parameters and getting CloudFront to forward them, but that's kinda separate.

    I accepted your answer as correct (with this comment) rather than posting a new answer, because it at least got me moving down the path, which turned out to be a lot shorter than expected. Thank you!
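For reference, the CloudFront-only approach mentioned in this comment is typically done with a response headers policy. The sketch below builds such a policy configuration as a plain dictionary in the shape that, as far as I know, boto3's `create_response_headers_policy` call expects. The policy name, the max-age value, and the helper names are assumptions, and the actual API call sits in a function that is not invoked here.

```python
def build_cors_policy_config(name="opendata-cors"):
    """Return a ResponseHeadersPolicyConfig dict that adds permissive CORS headers."""
    def items(values):
        # CloudFront list structures carry an explicit Quantity alongside Items.
        return {"Quantity": len(values), "Items": list(values)}

    return {
        "Name": name,  # hypothetical policy name
        "Comment": "Add CORS headers to a public bucket we do not own",
        "CorsConfig": {
            "AccessControlAllowOrigins": items(["*"]),
            "AccessControlAllowHeaders": items(["*"]),
            "AccessControlAllowMethods": items(["GET", "HEAD"]),
            "AccessControlAllowCredentials": False,
            "AccessControlMaxAgeSec": 600,  # assumed preflight cache time
            # Override any CORS headers the origin might send itself.
            "OriginOverride": True,
        },
    }


def create_policy(config):
    """Create the policy in CloudFront (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the config builder runs without it

    client = boto3.client("cloudfront")
    return client.create_response_headers_policy(ResponseHeadersPolicyConfig=config)


config = build_cors_policy_config()
print(config["CorsConfig"]["AccessControlAllowMethods"])
```

The ID of the created policy would then be set as the `ResponseHeadersPolicyId` on the distribution's cache behavior. CloudFront also ships managed CORS policies, which may make a custom one unnecessary.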
