Disappointing speed querying Timestream with billions of records

Question

When we started out with Timestream, we thought it would be a good choice to hold the billions of sensor data records we gather each year. The choice to use Timestream was largely based on the marketing copy on the website, which states that you can "analyze trillions of events per day" and that "when your data grows, the performance stays the same". We have tried different approaches and followed the Timestream best practices stated here:

https://docs.aws.amazon.com/timestream/latest/developerguide/best-practices.html
https://docs.aws.amazon.com/timestream/latest/developerguide/queries-bp.html

Last year, when we started with Timestream, user-defined partitioning wasn't available yet, so the partitioning took place on measure_name. We have since tried our own partitioning scheme, which improved scanning speed, but returning the data still takes a very long time.

We are reaching a point where we don't want to spend any more time and resources on Timestream. Timestream doesn't seem mature enough as a product. We have tried to contact AWS support, but getting help has taken more than 2 weeks so far.

At the moment, we are looking for alternatives to Timestream. We need a database that can hold billions of records and still be queried quickly. It does not have to be a time series database per se, as long as it handles time series data well. We are currently looking at InfluxDB and TimescaleDB, and are considering MemoryDB with a Redis backend or even DynamoDB. Does anyone have experience with these or other databases at this scale? We would also like to hear from anyone who runs Timestream at our scale and can share their experience with us.

Thanks in advance!

Yours sincerely,

Luis

Answer

Hi Luis,

The basis for the marketing claims is the service architecture as explained here - https://www.allthingsdistributed.com/2021/06/amazon-timestream-time-series-is-the-new-black.html. Timestream stores the data in partitions called 'tiles' and, irrespective of the number of partitions in a table, the query engine, if provided with the right pruning parameters, can scan only the relevant partitions to produce the query result.
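To illustrate what "pruning parameters" could look like in practice, here is a minimal sketch of a query that constrains time, measure_name, and one dimension. The database, table, and `sensor_id` dimension names are hypothetical, and `measure_value::double` assumes single-measure records with double values, so adjust this to your actual schema.

```python
def build_pruned_query(database, table, measure, sensor_id, window="1h"):
    """Build a Timestream SQL string with time, measure_name and dimension predicates.

    All identifiers are placeholders for illustration. Values are interpolated
    directly, so only pass trusted input.
    """
    return (
        f'SELECT time, measure_value::double AS value '
        f'FROM "{database}"."{table}" '
        # A bounded time range lets the engine skip tiles outside the window.
        f"WHERE time BETWEEN ago({window}) AND now() "
        # measure_name was the partitioning attribute mentioned in the question.
        f"AND measure_name = '{measure}' "
        # Hypothetical dimension filter to narrow the scan further.
        f"AND sensor_id = '{sensor_id}' "
        f"ORDER BY time DESC"
    )


print(build_pruned_query("iot", "sensors", "temperature", "sensor-001"))
```

Selecting only the columns you need, rather than `SELECT *`, also keeps the returned result smaller.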

If the query result is large, you have to paginate through the result. Is that the time-consuming part of your execution?
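One way to check this is to time the pagination loop itself. The sketch below assumes `client` is a boto3 `timestream-query` client (for example `boto3.client("timestream-query")`) and follows `NextToken` until the result is exhausted. The function name and the returned tuple are my own choices for illustration.

```python
import time


def fetch_all_rows(client, query_string, max_rows=None):
    """Page through a Timestream query; return (rows, page_count, elapsed_seconds)."""
    rows, pages, token = [], 0, None
    started = time.perf_counter()
    while True:
        kwargs = {"QueryString": query_string}
        if token:
            kwargs["NextToken"] = token  # continue from the previous page
        if max_rows:
            kwargs["MaxRows"] = max_rows  # optional cap on rows per page
        page = client.query(**kwargs)
        pages += 1
        rows.extend(page.get("Rows", []))
        token = page.get("NextToken")
        if not token:  # no token means this was the last page
            return rows, pages, time.perf_counter() - started


# Example usage (requires AWS credentials and the boto3 package):
# import boto3
# client = boto3.client("timestream-query")
# rows, pages, seconds = fetch_all_rows(client, "SELECT ...")
# print(f"{len(rows)} rows in {pages} pages, {seconds:.1f}s")
```

If most of the elapsed time goes into a large number of pages, then the volume of returned data is the likely bottleneck rather than the scan, and aggregating or filtering in the query may help more than changing the partitioning.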

Kindly share your urgency on the support ticket for a faster response.
