Disappointing speed querying Timestream with billions of records


When we started out with Timestream, we thought it would be a good choice for holding the billions of sensor data records we gather each year. We chose Timestream largely because of the marketing copy on its website, which states that you can "analyze trillions of events per day" and that "when your data grows, the performance stays the same". We have tried different approaches and followed the Timestream best practices described here:

https://docs.aws.amazon.com/timestream/latest/developerguide/best-practices.html
https://docs.aws.amazon.com/timestream/latest/developerguide/queries-bp.html

Last year, when we started with Timestream, user-defined partitioning wasn't available yet, so partitioning was done on measure_name. We have since tried our own partitioning scheme, which improved scan speed, but returning the data still takes a very long time.

We are reaching the point where we don't want to spend any more time or resources on Timestream; it doesn't seem mature enough as a product. We have tried to contact AWS Support, but we have now been waiting more than two weeks for help.

At the moment, we are looking for alternatives to Timestream: a database that can hold billions of records and still be queried with good performance. It doesn't have to be a dedicated time-series database, as long as it handles time-series data well. We are currently looking at InfluxDB and TimescaleDB, and are also considering Amazon MemoryDB for Redis or even DynamoDB. Does anyone have experience with these, or with other databases that perform well at this scale? We would also like to hear from anyone who has run Timestream at a similar scale.

Thanks in advance!

Yours sincerely,

Luis

Asked 5 months ago · Viewed 151 times
1 answer

Hi Luis,

The marketing claims are based on the service architecture explained here: https://www.allthingsdistributed.com/2021/06/amazon-timestream-time-series-is-the-new-black.html. Timestream stores data in partitions called 'tiles', and regardless of how many partitions a table has, the query engine can scan only the relevant partitions to produce the result, provided the query includes the right pruning predicates.
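As a sketch of what "the right pruning predicates" can look like, the hypothetical helper below builds a query with a time-range predicate plus equality predicates on measure_name and on a user-defined partition-key dimension. The database, table, and `sensor_id` names in the usage are placeholders, not anyone's real schema, and whether a given predicate actually prunes partitions depends on how the table is partitioned.

```python
import re

# Hypothetical helper: database, table, and partition-key names are placeholders.
_INTERVAL = re.compile(r"\d+(ns|us|ms|s|m|h|d)")  # Timestream interval literals, e.g. 15m, 1d


def _quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def _quote_identifier(name: str) -> str:
    """Quote an identifier (database, table, dimension) with double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_pruned_query(
    database: str,
    table: str,
    measure_name: str,
    partition_key: str,
    partition_value: str,
    lookback: str,
) -> str:
    """Build a query whose WHERE clause gives the engine predicates to prune on.

    The time predicate bounds which time ranges are scanned; the measure_name and
    partition-key predicates narrow the scan further within those ranges.
    """
    if not _INTERVAL.fullmatch(lookback):
        raise ValueError(f"not an interval literal: {lookback!r}")
    return (
        f"SELECT time, measure_value::double AS value "
        f"FROM {_quote_identifier(database)}.{_quote_identifier(table)} "
        f"WHERE time >= ago({lookback}) "
        f"AND measure_name = {_quote_literal(measure_name)} "
        f"AND {_quote_identifier(partition_key)} = {_quote_literal(partition_value)} "
        f"ORDER BY time"
    )


print(build_pruned_query("sensors", "readings", "temperature", "sensor_id", "abc-1", "1d"))
```

A query that omits the time predicate or filters only on a non-partitioning column gives the engine nothing to prune with, so it may have to touch far more tiles.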

If the query result is large, you have to paginate through it. Is that the time-consuming part of your execution?
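For reference, paginating means calling the Query API repeatedly, passing back each `NextToken` until none is returned. The sketch below assumes a client exposing a `query(**kwargs)` method that returns `Rows` and an optional `NextToken`, as boto3's `timestream-query` client does; `iter_rows` is a hypothetical name. It treats an empty page as "keep going" rather than "done", on the assumption that a page can come back with zero rows while the query is still running.

```python
from typing import Any, Dict, Iterator, Optional, Protocol


class QueryClient(Protocol):
    """Anything with a Timestream-style query() method, e.g. boto3.client("timestream-query")."""

    def query(self, **kwargs: Any) -> Dict[str, Any]: ...


def iter_rows(
    client: QueryClient, query_string: str, max_rows: Optional[int] = None
) -> Iterator[Dict[str, Any]]:
    """Yield every row of a query result, following NextToken until it is absent."""
    kwargs: Dict[str, Any] = {"QueryString": query_string}
    if max_rows is not None:
        kwargs["MaxRows"] = max_rows  # page size hint
    while True:
        page = client.query(**kwargs)
        # A page may hold zero rows, so stop only when NextToken is missing.
        yield from page.get("Rows", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token
```

Timing the loop separately from the first `query()` call is one way to tell whether the time goes into scanning or into transferring a large result set.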

Please indicate the urgency on your support ticket to get a faster response.

AWS
Answered 5 months ago
Expert
Reviewed 1 month ago
