Using AWS Glue to export a ~500 TB DynamoDB table to an S3 bucket


We have a use case where we want to export ~500 TB of DynamoDB data to S3, and one possible approach I found was using an AWS Glue job. While exporting the data to S3, we also need to perform certain transformations on the DynamoDB data, which requires a service call to a Java package (the transformation logic isn't heavy and should complete in milliseconds). Is AWS Glue a good approach for exporting ~500 TB of data?

shasnk
Asked 4 months ago · 283 views
2 answers

You can use AWS Batch to export and transform the 500 TB of data from DynamoDB to an S3 bucket.

  • Start by using the native export functionality of DynamoDB to export your data directly to an S3 bucket. This approach is highly efficient for large datasets and does not impact the performance of your DynamoDB table.
  • Develop a Docker container with your transformation logic and upload it to Amazon ECR. Configure an AWS Batch compute environment with the necessary resources, define job definitions that describe how jobs run your container, and finally submit transformation jobs to AWS Batch to process the exported data from S3 and write the transformed output back to S3 or another location (a minimal sketch of the export and job-submission calls follows this list).
  • Optionally, use AWS Step Functions to manage the workflow, particularly if the process involves multiple steps.
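
If it helps, here is a minimal sketch of the first two steps using boto3. It assumes point-in-time recovery is enabled on the table, and the table ARN, bucket, Batch job queue, and job definition names are all placeholders you would replace with your own:

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")
batch = boto3.client("batch")

# 1) Kick off a native DynamoDB export to S3 (requires point-in-time recovery
#    on the table). The export is fully managed and does not consume table
#    read capacity.
export = dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/my-ddb-table",
    S3Bucket="my-export-bucket",
    S3Prefix="raw-export/",
    ExportFormat="DYNAMODB_JSON",
)
export_arn = export["ExportDescription"]["ExportArn"]
print("Export started:", export_arn)

# Wait for the export to finish (a Step Functions workflow could replace this poll).
while True:
    status = dynamodb.describe_export(ExportArn=export_arn)["ExportDescription"]["ExportStatus"]
    if status != "IN_PROGRESS":
        break
    time.sleep(60)

# 2) Submit an AWS Batch job that runs the containerized transformation logic
#    against the exported files in S3.
response = batch.submit_job(
    jobName="ddb-export-transform",
    jobQueue="my-transform-queue",           # assumed Batch job queue
    jobDefinition="my-transform-job-def",    # assumed job definition (ECR image)
    containerOverrides={
        "environment": [
            {"name": "INPUT_PREFIX", "value": "s3://my-export-bucket/raw-export/"},
            {"name": "OUTPUT_PREFIX", "value": "s3://my-export-bucket/transformed/"},
        ]
    },
)
print("Batch job submitted:", response["jobId"])
```

In practice you would put the polling and job submission inside a Step Functions workflow, as suggested in the last bullet above.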

If this has resolved your issue or was helpful, accepting the answer would be greatly appreciated. Thank you!

Expert
Answered 4 months ago

Hi,

Yes, Glue is the ETL service on AWS for such tasks: it lets you process and transform the data as you export it from DynamoDB to S3.

Here is a good article detailing how to do it: https://dev.to/ritaly/how-to-export-aws-dynamodb-data-to-s3-for-recurring-tasks-4l47
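
For reference, a minimal sketch of such a Glue job in PySpark, assuming Glue 3.0+ with the DynamoDB export connector. The table ARN, bucket names, and the transform itself are placeholders; the Java transformation logic mentioned in the question would need to be ported or exposed as a service the job can call:

```python
import sys
from awsglue.context import GlueContext
from awsglue.transforms import Map
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read the table through the Glue DynamoDB export connector, which uses the
# native export-to-S3 feature under the hood and avoids scanning the live table.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "ddb",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/my-ddb-table",
        "dynamodb.s3.bucket": "my-glue-staging-bucket",
        "dynamodb.s3.prefix": "ddb-export/",
    },
)

def transform(record):
    # Placeholder for the custom transformation logic applied to each record.
    record["processed"] = True
    return record

transformed = Map.apply(frame=dyf, f=transform)

# Write the transformed data to the target S3 location as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={"path": "s3://my-target-bucket/transformed/"},
    format="parquet",
)
```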

Best,

Didier

AWS
Expert
Answered 4 months ago
