Adaptive Capacity and Time Series Data


I have time series data stored in a table where the partition key is a UUID and the sort key is a date. Only data from recent months is accessed frequently; older data is rarely accessed. According to https://aws.amazon.com/blogs/database/design-patterns-for-high-volume-time-series-data-in-amazon-dynamodb/, I should split my table into two tables: one with more capacity for the frequently accessed recent data, and one with less capacity for the rarely accessed older data. That would save cost by using capacity efficiently. However, after learning about adaptive capacity (https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-dynamodb-adaptive-capacity-is-now-instant/ and https://www.youtube.com/watch?v=zUsJK5pe_A0), I am wondering whether splitting the original table into two tables would still be ideal for cost/capacity.

IIUC, DynamoDB tries to group items with the same partition key into the same partition. If most of my UUIDs have many months of history, my partitions would hold a lot of rarely accessed data grouped together with the recent data. In that case, it sounds like I would have a lot of wasted cost/capacity regardless of adaptive capacity. Therefore, I should still split my original table into two tables. Is that correct?

yajaws
Asked 4 years ago, 287 views
3 answers

Thanks - we will follow up with one of our developer evangelists on the best practice here.

Answered 4 years ago

Hi @yajaws,

I got the following info from Alex (the blog author):

You are correct, some of the issues mentioned in the article are much less critical since the introduction of adaptive capacity, and especially since it became "instant" last year in May: https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-dynamodb-adaptive-capacity-is-now-instant/.

Also, it's worth mentioning that if you are using on-demand pricing you don't even have to worry about capacity at all, so the overhead of splitting into multiple tables may not be worth it.

But if you are using provisioned capacity, you'll need to estimate RCUs and WCUs. In high-throughput scenarios you may still benefit from multiple tables, since you don't want writes to be throttled and at the same time you don't want to over-provision reads.
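As a rough illustration of that estimate, here is a minimal Python sketch. It relies on the documented DynamoDB unit sizes: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The function names and the traffic figures in the comments are assumptions for illustration, not numbers from this thread.

```python
import math

RCU_ITEM_BYTES = 4 * 1024  # one RCU covers a strongly consistent read of up to 4 KB
WCU_ITEM_BYTES = 1024      # one WCU covers a write of up to 1 KB


def estimate_rcu(reads_per_second, item_size_bytes, strongly_consistent=False):
    """Estimate the read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_size_bytes / RCU_ITEM_BYTES)
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = total / 2
    return math.ceil(total)


def estimate_wcu(writes_per_second, item_size_bytes):
    """Estimate the write capacity units needed for a steady write rate."""
    units_per_write = math.ceil(item_size_bytes / WCU_ITEM_BYTES)
    return math.ceil(writes_per_second * units_per_write)


# Hypothetical workload: a hot table taking most of the traffic,
# and a cold table that is read only occasionally.
hot_rcu = estimate_rcu(reads_per_second=500, item_size_bytes=2000)
hot_wcu = estimate_wcu(writes_per_second=1000, item_size_bytes=1500)
cold_rcu = estimate_rcu(reads_per_second=5, item_size_bytes=2000)
```

With separate tables, the large write figure only has to be provisioned on the hot table, while the cold table can be sized for its much smaller read rate.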

A "hybrid" solution could be enabling on-demand pricing only on older tables whose data is read rarely, and keeping provisioned capacity on the current table where you'll have sustained high-throughput (writes). That should allow you to optimize for cost.
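The hybrid setup might look roughly like the sketch below, which only builds the parameter dictionaries for the DynamoDB `CreateTable` call. The table names, the attribute names (`id`, `event_date`), and the capacity figures are hypothetical; `BillingMode`, `KeySchema`, `AttributeDefinitions`, and `ProvisionedThroughput` are the standard `CreateTable` parameters.

```python
def table_params(table_name, billing_mode, read_units=None, write_units=None):
    """Build CreateTable parameters for a UUID partition key and a date sort key."""
    params = {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "S"},          # UUID (hypothetical name)
            {"AttributeName": "event_date", "AttributeType": "S"},  # ISO date (hypothetical name)
        ],
        "KeySchema": [
            {"AttributeName": "id", "KeyType": "HASH"},           # partition key
            {"AttributeName": "event_date", "KeyType": "RANGE"},  # sort key
        ],
        "BillingMode": billing_mode,
    }
    if billing_mode == "PROVISIONED":
        if read_units is None or write_units is None:
            raise ValueError("provisioned tables need read_units and write_units")
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return params


# Current table: sustained high-throughput writes, so provisioned capacity.
current = table_params("events_current", "PROVISIONED", read_units=250, write_units=2000)

# Older table: rarely read, so on-demand (pay per request).
archive = table_params("events_archive", "PAY_PER_REQUEST")

# To actually create the tables (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("dynamodb")
# client.create_table(**current)
# client.create_table(**archive)
```

The on-demand table carries no `ProvisionedThroughput` block, so there is nothing to size for the rarely read data.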

I hope this is useful.

Alex

Answered 4 years ago

Thank you very much Arturo and Alex. I am marking your response as helpful and the question as answered. Have a great day.

Edited by: yajaws on Mar 9, 2020 5:17 PM

Edited by: yajaws on Mar 9, 2020 5:18 PM

yajaws
Answered 4 years ago