I want to back up my Amazon DynamoDB table using Amazon Simple Storage Service (Amazon S3).
Short description
DynamoDB offers two built-in backup methods:
- On-demand backup
- Point-in-time recovery (PITR)
Both methods are suitable for backing up your tables for disaster recovery. However, you can't use the backups that they create for data analysis or extract, transform, and load (ETL) jobs. The DynamoDB Export to S3 feature is the easiest way to create backups that you can download locally or use with another AWS service. To customize the backup process, you can use Amazon EMR or AWS Glue.
Resolution
DynamoDB Export to S3 feature
Use this feature to export data from a DynamoDB table to an S3 bucket from any point in time within your point-in-time recovery (PITR) window. To use the feature, you must turn on point-in-time recovery for the table. For more information, see DynamoDB data export to Amazon S3.
For an example of how to use this feature, see Export Amazon DynamoDB table data to your data lake in Amazon S3, no code writing required.
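If you prefer to request the export programmatically, the following is a minimal sketch that uses the AWS SDK for Python (Boto3) and the ExportTableToPointInTime API operation. The table ARN, bucket name, and prefix are placeholders, and the helper names build_export_request and start_export are hypothetical. The sketch assumes that point-in-time recovery is turned on for the table and that your credentials allow the export.

```python
from datetime import datetime
from typing import Optional


def build_export_request(table_arn: str, bucket: str, prefix: str,
                         export_format: str = "DYNAMODB_JSON",
                         export_time: Optional[datetime] = None) -> dict:
    """Build the keyword arguments for ExportTableToPointInTime (hypothetical helper)."""
    if export_format not in ("DYNAMODB_JSON", "ION"):
        raise ValueError("export_format must be DYNAMODB_JSON or ION")
    request = {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": export_format,
    }
    # When ExportTime is omitted, the export uses the current state of the table.
    if export_time is not None:
        request["ExportTime"] = export_time
    return request


def start_export(request: dict) -> str:
    """Start the export and return its ARN. Requires boto3 and AWS credentials."""
    import boto3  # imported here so that the request builder works without boto3

    client = boto3.client("dynamodb")
    response = client.export_table_to_point_in_time(**request)
    return response["ExportDescription"]["ExportArn"]


if __name__ == "__main__":
    # Placeholder values; replace them with your own table ARN and bucket.
    example = build_export_request(
        "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
        "example-backup-bucket",
        "exports/",
    )
    print(example)
```

The export runs asynchronously. To check its progress, you can pass the returned export ARN to the DescribeExport API operation.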
With the Export to S3 feature, you can use your data in other ways, including the following:
- Perform ETL against the exported data on S3, and then import the data back to DynamoDB
- Retain historical snapshots for auditing
- Integrate the data with other services or applications
- Build an S3 data lake from the DynamoDB data, and then analyze the data from various services, such as Amazon Athena, Amazon Redshift, or Amazon SageMaker
- Run as-needed queries on your data from Athena or Amazon EMR without affecting your DynamoDB capacity
Note the following pros and cons when using this feature:
- Pros: This feature allows you to export data across AWS Regions and accounts without building custom applications or writing code. The exports don't affect the read capacity or the availability of your production tables.
- Cons: This feature exports the table data in DynamoDB JSON or Amazon Ion format only. To import the exported data from an S3 bucket back into DynamoDB natively, see DynamoDB data import from Amazon S3. You can also create a new template or use AWS Glue, Amazon EMR, or the AWS SDK to reimport the data.
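Because the export uses DynamoDB JSON, each attribute carries a type descriptor such as S or N, so other tools might need to convert the items first. The following is a minimal sketch of that conversion that uses only the Python standard library. It assumes that each line of an uncompressed export data file is a JSON object with a single "Item" key. The function names from_dynamodb_json and parse_export_line are hypothetical.

```python
import base64
import json
from decimal import Decimal


def from_dynamodb_json(value: dict):
    """Convert one typed attribute value, such as {"N": "42"}, to a plain Python value."""
    (type_tag, payload), = value.items()
    if type_tag == "S":
        return payload
    if type_tag == "N":
        return Decimal(payload)  # Decimal keeps the number's full precision
    if type_tag == "BOOL":
        return payload
    if type_tag == "NULL":
        return None
    if type_tag == "B":
        return base64.b64decode(payload)  # binary values are base64 text in JSON
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in payload]
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in payload.items()}
    if type_tag == "SS":
        return set(payload)
    if type_tag == "NS":
        return {Decimal(n) for n in payload}
    if type_tag == "BS":
        return {base64.b64decode(b) for b in payload}
    raise ValueError(f"Unknown DynamoDB type descriptor: {type_tag}")


def parse_export_line(line: str) -> dict:
    """Parse one line of an export data file (assumed shape: {"Item": {...}})."""
    item = json.loads(line)["Item"]
    return {name: from_dynamodb_json(attr) for name, attr in item.items()}


if __name__ == "__main__":
    sample = '{"Item": {"pk": {"S": "user#1"}, "age": {"N": "30"}}}'
    print(parse_export_line(sample))
```

If Boto3 is installed, then its TypeDeserializer class in boto3.dynamodb.types performs a similar conversion for values that the SDK returns.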
Amazon EMR
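Use Amazon EMR to export your data to an S3 bucket. You can do so with either of these methods:
- Run Hive or Spark commands with DynamoDBStorageHandler
- Use the open-source emr-dynamodb-tool on GitHub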
Note the following pros and cons when using these methods:
- Pros: If you're an active Amazon EMR user and are comfortable with Hive or Spark, then you can manage your configurations better with these methods than with the native Export to S3 function. You can also use existing clusters for this purpose.
- Cons: These methods require you to create and maintain an EMR cluster. If you use DynamoDBStorageHandler, then you must be familiar with Hive or Spark.
AWS Glue
Use AWS Glue to copy your table to Amazon S3. For more information, see Using AWS Glue and Amazon DynamoDB export.
Note the following pros and cons when using this method:
- Pros: Because AWS Glue is a serverless service, you don't need to create and maintain resources. You can directly write back to DynamoDB. You can add custom ETL logic for use cases, such as filtering and converting, when exporting data. You can also choose your preferred format from CSV, JSON, Parquet, or ORC. For more information, see Data format options for inputs and outputs in AWS Glue.
- Cons: If you choose this option, you must know how to use Spark. You also must maintain the source code for your AWS Glue ETL job. For more information, see "connectionType": "dynamodb".
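As an illustration of the AWS Glue approach, the following is a minimal sketch of an ETL script that reads a table through the "dynamodb" connection type and writes it to Amazon S3 in Parquet format. The table name, the S3 path, and the helper names dynamodb_source_options and run_job are hypothetical. The awsglue and pyspark modules are available only inside the AWS Glue job environment, so the sketch imports them inside run_job. This reader scans the table, so it consumes read capacity. The read percentage and split count shown are example values, not recommendations.

```python
def dynamodb_source_options(table_name: str, read_percent: float = 0.5,
                            splits: int = 8) -> dict:
    """Build connection options for the AWS Glue "dynamodb" connection type."""
    return {
        "dynamodb.input.tableName": table_name,
        # Fraction of the table's read capacity that the job is allowed to use.
        "dynamodb.throughput.read.percent": str(read_percent),
        # Number of partitions that the table is split into for parallel reads.
        "dynamodb.splits": str(splits),
    }


def run_job(table_name: str, s3_path: str) -> None:
    """Copy the table to Amazon S3 as Parquet. Runs only inside an AWS Glue job."""
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options=dynamodb_source_options(table_name),
    )
    # Add custom filtering or conversion logic here before writing.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="parquet",
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own table and bucket.
    run_job("ExampleTable", "s3://example-backup-bucket/glue-export/")
```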
If none of these options offer the flexibility that you need, then you can use the DynamoDB API to create your own solution.
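A custom solution of this kind can be sketched as a paginated Scan that writes the items to a single S3 object. This is a minimal sketch and not a production design: it holds all items in memory, so it suits small tables only, and the Scan consumes read capacity on the table. It assumes that the table has no binary attributes, because json.dumps can't serialize bytes. The function names scan_all and export_to_s3 are hypothetical, and the clients are passed in as parameters so that you can supply Boto3 clients that you create with boto3.client("dynamodb") and boto3.client("s3").

```python
import json


def scan_all(dynamodb_client, table_name: str):
    """Yield every item in the table, following LastEvaluatedKey across pages."""
    kwargs = {"TableName": table_name}
    while True:
        page = dynamodb_client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key


def export_to_s3(dynamodb_client, s3_client, table_name: str,
                 bucket: str, key: str) -> int:
    """Write all items as JSON lines to one S3 object and return the item count."""
    # Items from the low-level client are already in DynamoDB JSON form.
    lines = [json.dumps(item, sort_keys=True)
             for item in scan_all(dynamodb_client, table_name)]
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body="\n".join(lines).encode("utf-8"),
    )
    return len(lines)
```

For larger tables, consider writing each page to its own object or using a parallel scan instead of collecting every item before the upload.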
Related information
Requesting a table export in DynamoDB
How to export an Amazon DynamoDB table to Amazon S3 using AWS Step Functions and AWS Glue