Move data from S3 bucket to S3 table bucket

Hi Team,

I wanted to check whether we can move data from a normal S3 bucket to an S3 table bucket and then query that data using Athena. My data files are in Parquet format in an S3 bucket, and I want to move them to an S3 table bucket to benefit from S3 Tables performance.

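  • You can create the table bucket, its namespaces, and its tables with the AWS CLI (the aws s3tables commands); the data itself is then loaded through an engine such as Apache Spark. Note that S3 table buckets are available only in a limited set of Regions, so make sure you are operating in a supported Region.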
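The AWS CLI's aws s3tables commands call the S3 Tables API, which boto3 also exposes as its "s3tables" client. Below is a minimal sketch of creating a table bucket and one namespace from Python. The function name, bucket name, and namespace are hypothetical, and the client is passed in (for example boto3.client("s3tables")) so the sketch itself runs without AWS credentials. It does not load any data; that still happens through an engine such as Spark.

```python
from __future__ import annotations

from typing import Any


def create_bucket_and_namespace(client: Any, bucket_name: str, namespace: str) -> str:
    """Create an S3 table bucket plus one namespace and return the bucket ARN.

    `client` is expected to be a boto3 S3 Tables client, for example
    boto3.client("s3tables", region_name=...) in a Region that supports S3 Tables.
    (Hypothetical helper, not part of boto3.)
    """
    # CreateTableBucket responds with the ARN of the new table bucket.
    bucket_arn = client.create_table_bucket(name=bucket_name)["arn"]
    # CreateNamespace takes the namespace as a single-element list of strings.
    client.create_namespace(tableBucketARN=bucket_arn, namespace=[namespace])
    return bucket_arn
```

Passing the client in also makes the function easy to exercise with a stub, as the response shapes are the only thing it relies on.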

asked 8 months ago · 1.2K views
1 Answer

Yes, you can move data from a normal S3 bucket to an S3 table bucket and then query that data using Athena. This process is known as migrating tabular data from general purpose S3 buckets to S3 Tables. Because your data is already in Parquet format, which is the file format S3 Tables uses to store Apache Iceberg table data, the migration should be smoother.

To migrate your data:

  1. Create a table bucket: Use the Amazon S3 console (or the AWS CLI) to create a new table bucket, which stores tabular data as Apache Iceberg tables backed by Parquet files.

  2. Set up the migration process (optional): The AWS Guidance for Migrating Tabular Data from Amazon S3 to S3 Tables uses AWS CloudFormation to deploy the resources that automate the migration, such as AWS Lambda functions, IAM roles, AWS Step Functions workflows, and PySpark scripts.

  3. Use Apache Spark for data transfer: Run Apache Spark on Amazon EMR (Elastic MapReduce) or AWS Glue to read the Parquet files from your source S3 bucket and write them into tables in the target S3 table bucket.

  4. Optimize the migration: Partition the data on the columns your queries most often filter on, and aim for file sizes of roughly 100-250 MB for better performance during processing.

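  5. Create tables in the S3 table bucket: Tables are first-class resources in the table bucket, organized into namespaces, and are accessible to any application or tool that supports the Apache Iceberg standard. Because Spark writes the data into these tables, create each table before the copy in step 3 or let the Spark job create it (for example with CREATE TABLE ... AS SELECT), rather than after the data has been moved.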

  6. Set up permissions: Use table-level permissions defined in identity- or resource-based policies to secure access to your tables.
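Steps 3 to 5 can be sketched as Spark SQL. This is a minimal sketch, not the PySpark scripts from the AWS Guidance: it assumes the Amazon S3 Tables Catalog for Apache Iceberg JAR is on the Spark classpath (for example on Amazon EMR), and the catalog name s3tablescatalog, the ARN, and the bucket, namespace, table, and partition column names are hypothetical placeholders. It builds the Spark settings that map a catalog to the table bucket, plus two statements: one creates the namespace, the other creates and loads the table with CREATE TABLE ... AS SELECT straight from the source Parquet files, setting the Iceberg property write.target-file-size-bytes to follow the 100-250 MB guideline.

```python
from __future__ import annotations

# Hypothetical name for the Spark catalog that points at the table bucket.
CATALOG = "s3tablescatalog"


def s3tables_spark_conf(table_bucket_arn: str, catalog: str = CATALOG) -> dict[str, str]:
    """Spark settings that expose an S3 table bucket as an Iceberg catalog.

    Assumes the Amazon S3 Tables Catalog for Apache Iceberg JAR is on the classpath.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"{prefix}.warehouse": table_bucket_arn,
    }


def migration_sql(
    source_path: str,
    namespace: str,
    table: str,
    catalog: str = CATALOG,
    partition_col: str | None = None,
    target_file_mb: int = 200,
) -> list[str]:
    """Spark SQL that creates the namespace, then copies the Parquet files with CTAS."""
    # Step 4: aim for data files of roughly 100-250 MB; Iceberg's writer starts
    # a new data file once the current one reaches about this many bytes.
    target_bytes = target_file_mb * 1024 * 1024
    partition = f"PARTITIONED BY (`{partition_col}`) " if partition_col else ""
    return [
        f"CREATE NAMESPACE IF NOT EXISTS `{catalog}`.`{namespace}`",
        # CREATE TABLE ... AS SELECT creates the table and loads it in one step,
        # reading the source Parquet files directly by path.
        f"CREATE TABLE `{catalog}`.`{namespace}`.`{table}` USING iceberg "
        f"{partition}"
        f"TBLPROPERTIES ('write.target-file-size-bytes'='{target_bytes}') "
        f"AS SELECT * FROM parquet.`{source_path}`",
    ]


if __name__ == "__main__":
    # Hypothetical ARN, source path, namespace, table, and partition column.
    arn = "arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket"
    for key, value in s3tables_spark_conf(arn).items():
        print(f"--conf {key}={value}")
    for stmt in migration_sql(
        "s3://amzn-s3-demo-source-bucket/sales/", "sales_ns", "daily_sales",
        partition_col="sale_date",
    ):
        print(stmt)
```

With PySpark, each key/value pair would go through SparkSession.builder.config(key, value), and each statement would run with spark.sql(statement). Building the SQL as plain strings keeps the sketch testable without a cluster; whether a given Iceberg table property is honored by S3 Tables is worth confirming in the S3 Tables documentation.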

Once the data is migrated to the S3 table bucket, you can query it using Athena. S3 Tables provide purpose-built storage optimized for querying tabular data, which can result in up to 3x faster query performance compared to unmanaged Apache Iceberg tables.

To query your data using Athena:

  1. Open the Athena console.
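  2. Write your SQL query, referencing your table bucket, namespace, and table name. Once the table bucket is integrated with AWS analytics services, Athena shows it as the catalog "s3tablescatalog/<table-bucket-name>", and each namespace appears as a database.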
  3. Run the query to retrieve your data.
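For step 2, the fully qualified table name can be assembled as below. This sketch assumes the table bucket has been integrated with AWS analytics services so that Athena lists it under the catalog s3tablescatalog/<table-bucket-name>; the helper names and the bucket, namespace, and table names are hypothetical. Each identifier is double-quoted because the catalog name contains a slash.

```python
from __future__ import annotations


def quote_ident(name: str) -> str:
    """Double-quote an identifier for Athena SQL, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def athena_s3tables_query(table_bucket: str, namespace: str, table: str, limit: int = 10) -> str:
    """Build a SELECT against a table stored in an S3 table bucket.

    Assumes the bucket is integrated with AWS analytics services, so Athena
    exposes it as the catalog "s3tablescatalog/<table-bucket-name>".
    """
    catalog = f"s3tablescatalog/{table_bucket}"
    # Quoting is required because the catalog name contains a slash.
    qualified = ".".join(quote_ident(part) for part in (catalog, namespace, table))
    return f"SELECT * FROM {qualified} LIMIT {int(limit)}"


if __name__ == "__main__":
    # Hypothetical bucket, namespace, and table names.
    print(athena_s3tables_query("amzn-s3-demo-table-bucket", "sales_ns", "daily_sales"))
```

The resulting string can be pasted into the Athena query editor or passed as the QueryString parameter of the Athena StartQueryExecution API (for example through boto3's athena client and its start_query_execution method).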

By moving your Parquet data into S3 Tables, you can take advantage of its performance benefits, including faster query performance and higher transactions per second than self-managed Iceberg tables stored in general purpose S3 buckets.
Sources
Guidance for Migrating Tabular Data from Amazon S3 to S3 Tables
Load json data from S3 into the new S3 tables | AWS re:Post
Tabular Data Storage At Scale - Amazon S3 Tables - AWS
Tutorial: Getting started with S3 Tables - Amazon Simple Storage Service

In addition to the above, you can also check the following references:

https://repost.aws/questions/QUwcm91JR3TJio-Kh7QeuIhw/is-there-a-way-to-use-existing-bucket-data-to-load-data-to-an-s3-table

https://aws.amazon.com/solutions/guidance/migrating-tabular-data-from-amazon-s3-to-s3-tables/

answered 8 months ago
reviewed 8 months ago by an AWS Support Engineer
revised 8 months ago by an AWS Support Engineer
