I want to upload data in bulk to my Amazon DynamoDB table.
Resolution
To upload data to DynamoDB in bulk, use one of the following options.
BatchWriteItem
To send multiple PutItem requests in a single call, use the BatchWriteItem API operation. Each BatchWriteItem call accepts up to 25 put or delete requests and up to 16 MB of data. To load the data faster, use parallel processes or threads in your code to issue multiple BatchWriteItem calls at the same time.
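The following is a minimal Python sketch of the parallel BatchWriteItem approach, not a definitive implementation. It assumes a boto3 DynamoDB client and items that are already in DynamoDB attribute-value format. The helper names (chunked, write_batch, bulk_load), the worker count, and the retry and backoff values are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put or delete requests per call


def chunked(items, size=MAX_BATCH):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def write_batch(client, table_name, batch, max_retries=5):
    """Send one BatchWriteItem call and resend anything returned as unprocessed.

    Returns the number of items that were still unprocessed after all retries.
    """
    request = {table_name: [{"PutRequest": {"Item": item}} for item in batch]}
    for attempt in range(max_retries):
        response = client.batch_write_item(RequestItems=request)
        request = response.get("UnprocessedItems", {})
        if not request:
            return 0
        # Assumed backoff policy: exponential, capped at one second.
        time.sleep(min(0.05 * 2 ** attempt, 1.0))
    return len(request.get(table_name, []))


def bulk_load(client, table_name, items, workers=4):
    """Write all items with parallel BatchWriteItem calls; return the leftover count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leftovers = pool.map(lambda b: write_batch(client, table_name, b), chunked(items))
        return sum(leftovers)


# Example usage (hypothetical table name and key attribute):
#   import boto3
#   client = boto3.client("dynamodb")
#   items = [{"pk": {"S": f"user#{i}"}} for i in range(1000)]
#   print(bulk_load(client, "ExampleTable", items))
```

A nonzero return value means that some items were throttled repeatedly. In that case, consider fewer workers or more write capacity on the table.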
AWS Data Pipeline
If the data is in Amazon Simple Storage Service (Amazon S3), then you can use AWS Data Pipeline to import the data into DynamoDB. Data Pipeline automates the process of creating an Amazon EMR cluster and loading your data from Amazon S3 into DynamoDB with parallel BatchWriteItem requests. When you use Data Pipeline, you don't have to write the code for the parallel transfer. For more information, see Importing data from Amazon S3 to DynamoDB.
Import Table feature
If the data is stored in Amazon S3, then you can use the Import Table feature to load the data into a new DynamoDB table. This feature supports the CSV, DynamoDB JSON, and Amazon Ion formats, either compressed (GZIP or ZSTD) or uncompressed. For more information, see DynamoDB data import from Amazon S3: how it works.
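As a rough sketch, an Import Table request can be assembled in Python and passed to the boto3 import_table call. The helper name build_import_request, the bucket, prefix, table, and key names, the string partition key, and on-demand billing are all assumptions chosen for illustration.

```python
VALID_FORMATS = {"CSV", "DYNAMODB_JSON", "ION"}
VALID_COMPRESSION = {"GZIP", "ZSTD", "NONE"}


def build_import_request(bucket, prefix, table_name, key_name,
                         input_format="CSV", compression="NONE"):
    """Build keyword arguments for the DynamoDB ImportTable operation.

    Assumes a new table with a single string partition key and
    on-demand billing; adjust these to match your own data.
    """
    if input_format not in VALID_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")
    if compression not in VALID_COMPRESSION:
        raise ValueError(f"unsupported compression type: {compression}")
    return {
        "S3BucketSource": {"S3Bucket": bucket, "S3KeyPrefix": prefix},
        "InputFormat": input_format,
        "InputCompressionType": compression,
        "TableCreationParameters": {
            "TableName": table_name,
            "AttributeDefinitions": [
                {"AttributeName": key_name, "AttributeType": "S"}
            ],
            "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


# Example usage (hypothetical bucket, prefix, and table names):
#   import boto3
#   request = build_import_request("example-bucket", "exports/", "ExampleTable", "pk",
#                                  input_format="DYNAMODB_JSON", compression="GZIP")
#   boto3.client("dynamodb").import_table(**request)
```

The import always creates a new table, so the table name in the request must not already exist in the account and Region.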
Amazon EMR
To upload data to DynamoDB with Amazon EMR and Apache Hive, complete the following steps:
1. Create an EMR cluster:
For Release, choose emr-5.30.0 or later.
For Applications, choose an option that includes Hive.
2. Create an external Hive table that points to the Amazon S3 location for your data.
3. Create another external Hive table, and point it to the DynamoDB table.
4. Use the INSERT OVERWRITE command to write data from Amazon S3 to DynamoDB. For more information, see Importing data to DynamoDB.
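Steps 2 through 4 can be sketched as a small Python helper that renders the HiveQL statements to run on the cluster. This is an illustration under assumptions: the Hive table names s3_source and ddb_target, the comma-delimited input format, and the example columns are hypothetical, and the helper hive_import_script is not part of any AWS library.

```python
def hive_import_script(s3_location, ddb_table, columns):
    """Render HiveQL that copies delimited data from Amazon S3 into DynamoDB.

    columns: list of (hive_column, hive_type, dynamodb_attribute) tuples.
    """
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    mapping = ",".join(f"{name}:{attr}" for name, _, attr in columns)
    return "\n".join([
        # Step 2: external table over the files in Amazon S3 (assumes CSV-like data).
        f"CREATE EXTERNAL TABLE s3_source ({col_defs})",
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
        f"LOCATION '{s3_location}';",
        "",
        # Step 3: external table backed by the DynamoDB table.
        f"CREATE EXTERNAL TABLE ddb_target ({col_defs})",
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'",
        "TBLPROPERTIES (",
        f'  "dynamodb.table.name" = "{ddb_table}",',
        f'  "dynamodb.column.mapping" = "{mapping}"',
        ");",
        "",
        # Step 4: copy the rows from Amazon S3 into DynamoDB.
        "INSERT OVERWRITE TABLE ddb_target SELECT * FROM s3_source;",
    ])


# Example usage (hypothetical S3 path, table name, and columns):
#   print(hive_import_script(
#       "s3://example-bucket/data/", "ExampleTable",
#       [("customer_id", "string", "CustomerId"), ("total", "bigint", "Total")],
#   ))
```

You can save the rendered script to a file and run it with the Hive command line on the primary node of the cluster.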
AWS Database Migration Service (AWS DMS)
You can use AWS DMS to migrate data from a relational database to a DynamoDB table. For more information, see Using an Amazon DynamoDB database as a target for AWS Database Migration Service.
AWS Glue
If you used the DynamoDB export feature to export the data from a different DynamoDB table to Amazon S3, then use AWS Glue to load that data into your table. This option is efficient for large datasets because the export feature uses the DynamoDB backup functionality and doesn't scan the source table. As a result, the load doesn't affect the performance or availability of the source table. For more information, see Using AWS Glue with Amazon DynamoDB as source and sink.