Thoughts about DynamoDB
DynamoDB storage costs roughly 5x more per GB than S3. On top of that, you also pay RCU/WCU (read/write capacity) costs.
I would recommend keeping the data in S3. Not only is it more cost effective, but you also avoid having to worry about DynamoDB RCU/WCU costs or throughput limits.
SageMaker notebooks and training instances can read directly from S3, and S3 offers high throughput. I don't think you will have a problem with 100 MB datasets.
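As an illustration, here is a minimal sketch of loading a dataset from S3 inside a SageMaker notebook; the bucket and key names are hypothetical placeholders:

```python
import boto3
import pandas as pd
from io import BytesIO

# Hypothetical bucket and key; replace with your own.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-sensor-bucket", Key="datasets/readings.csv")

# A ~100 MB dataset fits comfortably in memory on a notebook instance.
df = pd.read_csv(BytesIO(obj["Body"].read()))
print(df.shape)
```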
If you need to prep/transform your data, you can do the transformations "in place" in S3 using Glue, Athena, Glue DataBrew, Glue Studio, etc.
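For example, one way to do an "in place" transform is an Athena CTAS query kicked off with boto3, which rewrites the raw data as Parquet without it ever leaving S3. This is only a sketch; the database, table, and bucket names are placeholders:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical database/table/bucket names. CTAS rewrites the raw
# data as Parquet in a new S3 location.
query = """
CREATE TABLE sensor_db.readings_parquet
WITH (format = 'PARQUET',
      external_location = 's3://my-sensor-bucket/curated/readings/')
AS SELECT device_id, ts, temperature
FROM sensor_db.readings_raw
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sensor_db"},
    ResultConfiguration={"OutputLocation": "s3://my-sensor-bucket/athena-results/"},
)
```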
Glue and DynamoDB
I thought about Glue, but I wanted to put the data in DynamoDB, which Glue does not seem to support as a write target.
Glue supports both Python shell and Spark jobs. In a Glue Python job, you can import boto3 (the AWS SDK for Python) and write to DynamoDB.
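Something along these lines would work inside the job; the table, bucket, and key names are hypothetical, and the JSON-lines layout is an assumption about your transformed data:

```python
import json
from decimal import Decimal

import boto3

s3 = boto3.resource("s3")
table = boto3.resource("dynamodb").Table("SensorReadings")  # hypothetical table name

# Read a transformed JSON-lines object from S3 and batch-write the
# items to DynamoDB. parse_float=Decimal is needed because DynamoDB
# rejects Python floats; batch_writer handles batching and retries.
body = s3.Object("my-sensor-bucket", "curated/readings.jsonl").get()["Body"]

with table.batch_writer() as writer:
    for line in body.read().decode("utf-8").splitlines():
        writer.put_item(Item=json.loads(line, parse_float=Decimal))
```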
Other strategies
How is your customer ingesting the sensor data / how is it being written to S3? Are they using AWS IoT Core?
Regardless, the pattern you've described thus far is:
Device -> Sensor data in S3 -> Transform with Lambda -> store data in DynamoDB
An alternative approach you could consider is Kinesis Data Firehose with Lambda transformations. This lets you parse/transform your data "in line" before it is ever written to S3, thus removing the need to re-read the data from S3 and apply transformations after the fact. Firehose also allows you to write the delivered data in formats such as Parquet, which can help with storage cost and subsequent query performance.
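A sketch of what such a transformation Lambda could look like, following the Firehose record-transformation contract (the sensor fields being reshaped are hypothetical):

```python
import base64
import json

def lambda_handler(event, context):
    """Firehose data-transformation Lambda: decode each record,
    reshape it, and return it re-encoded with result 'Ok'."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Hypothetical reshaping of a raw sensor reading.
        transformed = {
            "device_id": payload.get("id"),
            "ts": payload.get("timestamp"),
            "temperature": payload.get("temp_c"),
        }

        data = base64.b64encode(
            (json.dumps(transformed) + "\n").encode("utf-8")
        ).decode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": data,
        })
    return {"records": output}
```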
If you want to store both raw data and transformed data, you can use a "fanout" pattern with Kinesis Streams/Firehose, where one output is raw data to S3 and the other is a transformed stream.
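One possible shape for that fanout, sketched with boto3 (all ARNs and names are placeholders): two delivery streams consume the same Kinesis stream, one delivering raw records to S3 and the other applying the Lambda transform first.

```python
import boto3

firehose = boto3.client("firehose")

# Hypothetical ARNs; both delivery streams read the same Kinesis stream.
source = {
    "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/sensors",
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
}

# Output 1: raw records, landed as-is under raw/.
firehose.create_delivery_stream(
    DeliveryStreamName="sensors-raw",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration=source,
    ExtendedS3DestinationConfiguration={
        "RoleARN": source["RoleARN"],
        "BucketARN": "arn:aws:s3:::my-sensor-bucket",
        "Prefix": "raw/",
    },
)

# Output 2: records passed through the transform Lambda, landed under transformed/.
firehose.create_delivery_stream(
    DeliveryStreamName="sensors-transformed",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration=source,
    ExtendedS3DestinationConfiguration={
        "RoleARN": source["RoleARN"],
        "BucketARN": "arn:aws:s3:::my-sensor-bucket",
        "Prefix": "transformed/",
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [{
                    "ParameterName": "LambdaArn",
                    "ParameterValue": "arn:aws:lambda:us-east-1:123456789012:function:transform",
                }],
            }],
        },
    },
)
```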