Adaptive Capacity and Time Series Data

I have time series data stored in a table where the partition key is a UUID and the sort key is a date. Only data from recent months is accessed frequently; all older data is rarely accessed. According to https://aws.amazon.com/blogs/database/design-patterns-for-high-volume-time-series-data-in-amazon-dynamodb/, I should split my table into two tables, one with more capacity for the frequently accessed recent data and one with less capacity for the rarely accessed older data, to save cost by using capacity efficiently. However, after learning about adaptive capacity (https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-dynamodb-adaptive-capacity-is-now-instant/ and https://www.youtube.com/watch?v=zUsJK5pe_A0), I am wondering whether splitting the original table into two tables would still be ideal for cost/capacity.

IIUC, DynamoDB tries to group items with the same partition key into the same partition. If most of my UUIDs have many months of history, my partitions would hold a lot of rarely accessed data grouped together with the recent data. In that case, it sounds like I would have a lot of wasted cost/capacity regardless of adaptive capacity. Therefore, I should still split my original table into two tables. Is that correct?

yajaws
asked 4 years ago, 278 views
3 Answers
Thanks - we will follow up with one of our developer evangelists on the best practice here.

answered 4 years ago
Hi @yajaws,

I got the following info from Alex (the blog author):

You are correct, some of the issues mentioned in the article are much less critical since the introduction of adaptive capacity, and especially since it became "instant" last year in May: https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-dynamodb-adaptive-capacity-is-now-instant/.

Also, it's worth mentioning that if you are using on-demand pricing you don't even have to worry about capacity at all, so the overhead of splitting into multiple tables may not be worth it.

But if you are using provisioned capacity, you'll need to estimate RCUs and WCUs. In high-throughput scenarios you may still benefit from multiple tables, as you don't want writes to be throttled and at the same time you don't want to over-provision reads.
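To make the estimate concrete, here is a minimal sketch based on the standard DynamoDB unit sizes: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The function names and the traffic numbers are made up for illustration, and transactional requests are ignored.

```python
import math

def estimate_rcu(reads_per_sec, item_kb, strongly_consistent=True):
    """Rough RCU estimate for non-transactional reads."""
    # Each read consumes one unit per 4 KB of item size, rounded up.
    units_per_read = math.ceil(item_kb / 4)
    total = reads_per_sec * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = total / 2
    return math.ceil(total)

def estimate_wcu(writes_per_sec, item_kb):
    """Rough WCU estimate for non-transactional writes."""
    # Each write consumes one unit per 1 KB of item size, rounded up.
    return math.ceil(writes_per_sec * math.ceil(item_kb))

# Hypothetical workload: write-heavy recent table, read-light older table.
recent_wcu = estimate_wcu(writes_per_sec=500, item_kb=1.5)
older_rcu = estimate_rcu(reads_per_sec=5, item_kb=3, strongly_consistent=False)
```

With a single table you would have to provision for the write peak and the read peak together; with two tables each estimate can be sized on its own.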

A "hybrid" solution could be to enable on-demand pricing only on the older tables, whose data is rarely read, and keep provisioned capacity on the current table, where you'll have sustained high write throughput. That should allow you to optimize for cost.
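A rough sketch of how the hybrid setup could be expressed with boto3's `update_table` call, which accepts `BillingMode` and `ProvisionedThroughput`. The table names and capacity numbers are hypothetical, and `billing_params` / `apply_billing` are helper names invented for this example. The actual AWS call is kept in a separate function so the parameters can be inspected without credentials.

```python
def billing_params(table_name, mode, rcu=None, wcu=None):
    """Build keyword arguments for DynamoDB UpdateTable (illustrative helper)."""
    if mode == "PAY_PER_REQUEST":
        # On-demand: no capacity numbers to specify.
        return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}
    if mode == "PROVISIONED":
        if rcu is None or wcu is None:
            raise ValueError("provisioned mode needs both rcu and wcu")
        return {
            "TableName": table_name,
            "BillingMode": "PROVISIONED",
            "ProvisionedThroughput": {
                "ReadCapacityUnits": rcu,
                "WriteCapacityUnits": wcu,
            },
        }
    raise ValueError(f"unknown billing mode: {mode}")

def apply_billing(params):
    """Send the update to DynamoDB (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3
    return boto3.client("dynamodb").update_table(**params)

# Hypothetical table names and capacities.
older = billing_params("events_archive", "PAY_PER_REQUEST")
current = billing_params("events_current", "PROVISIONED", rcu=200, wcu=1000)
```

Calling `apply_billing(older)` and `apply_billing(current)` would then switch the two tables to their respective modes.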

I hope this is useful.

Alex

answered 4 years ago
Thank you very much, Arturo and Alex. I am marking your response as helpful and the question as answered. Have a great day.

Edited by: yajaws on Mar 9, 2020 5:17 PM

yajaws
answered 4 years ago