Disappointing speed querying Timestream with billions of records


When we started out with Timestream we thought it would be a good choice to hold the billions of sensor data records we gather each year. The choice to use Timestream was largely based on the marketing texts on the website, which state that you can "analyze trillions of events per day" and that "when your data grows, the performance stays the same". We have tried different approaches and followed Timestream best practices stated here:

https://docs.aws.amazon.com/timestream/latest/developerguide/best-practices.html
https://docs.aws.amazon.com/timestream/latest/developerguide/queries-bp.html

When we started with Timestream last year, user-defined partitioning wasn't available yet, so the partitioning took place on measure_name. We have since tried our own partitioning scheme, which improved scanning speed, but returning the data still takes a very long time.

We are reaching a point where we don't want to spend any more time and resources on Timestream. Timestream doesn't seem mature enough as a product. We have tried to contact AWS support, but getting help has taken more than 2 weeks so far.

At the moment, we are looking for alternatives to Timestream. We are looking for a database that can handle billions of records and that can be queried in a performant way. We are not looking for a time series database per se, but we are looking for a database that can handle time series data. We are currently looking at InfluxDB and TimescaleDB and considering MemoryDB with a Redis backend or even DynamoDB. We are wondering if anyone has experience with these databases or other databases that can handle billions of records and that can be queried in a performant way. We are also wondering if anyone has experience with Timestream at our scale and can share their experience with us.

Thanks in advance!

Yours sincerely,

Luis

asked 4 months ago, 143 views
1 Answer

Hi Luis,

The basis for the marketing claims is the service architecture as explained here: https://www.allthingsdistributed.com/2021/06/amazon-timestream-time-series-is-the-new-black.html. Timestream stores the data in partitions called 'tiles'. Irrespective of the number of partitions in a table, the query engine, if it is given the right pruning parameters, can scan only the relevant partitions to produce the query result.
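To make "pruning parameters" concrete, here is a minimal sketch of a query builder that always includes a time range and a measure_name predicate, and selects only named columns instead of `SELECT *`. The function name `build_pruned_query`, the database `sensors`, the table `readings`, and the measure `temperature` are all hypothetical; adjust the predicates to your own schema and partition key.

```python
import re

# Conservative allow-lists so values can be interpolated into the SQL text.
_IDENT = re.compile(r"^[A-Za-z0-9_.\-]+$")
_INTERVAL = re.compile(r"^\d+(s|m|h|d)$")


def build_pruned_query(database, table, measure_name, lookback="1h",
                       columns=("time", "measure_value::double")):
    """Build a Timestream query with a time range and a measure_name filter.

    Hypothetical helper: all names passed in are assumptions about your schema.
    """
    for name in (database, table, measure_name):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected characters in {name!r}")
    if not _INTERVAL.match(lookback):
        raise ValueError(f"unsupported interval {lookback!r}")

    select_list = ", ".join(columns)  # named columns only, no SELECT *
    return (
        f"SELECT {select_list} "
        f'FROM "{database}"."{table}" '
        f"WHERE measure_name = '{measure_name}' "
        f"AND time BETWEEN ago({lookback}) AND now()"
    )


if __name__ == "__main__":
    print(build_pruned_query("sensors", "readings", "temperature", "15m"))
```

If your table uses a customer-defined partition key, adding an equality predicate on that dimension should help in the same way, though I have not seen your schema.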

If the query result is large in size, you have to paginate through the result. Is that the time consuming part of your execution?
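To check whether pagination is where the time goes, a rough sketch like the following can time each page separately. It assumes a client object exposing `query(QueryString=..., NextToken=...)` that returns a dict with `Rows` and an optional `NextToken`, which is the shape of the boto3 `timestream-query` client; the helper name `fetch_all_rows` is made up for this example.

```python
import time


def fetch_all_rows(client, query_string, max_pages=None):
    """Follow NextToken until the result is exhausted, timing every page.

    Returns (rows, page_seconds). `client` is assumed to behave like the
    boto3 "timestream-query" client.
    """
    rows = []
    page_seconds = []
    next_token = None

    while True:
        kwargs = {"QueryString": query_string}
        if next_token:
            kwargs["NextToken"] = next_token

        started = time.perf_counter()
        response = client.query(**kwargs)
        page_seconds.append(time.perf_counter() - started)

        # A page may carry zero rows and still have a NextToken.
        rows.extend(response.get("Rows", []))
        next_token = response.get("NextToken")

        if not next_token:
            break
        if max_pages is not None and len(page_seconds) >= max_pages:
            break

    return rows, page_seconds
```

With boto3 installed, `client = boto3.client("timestream-query")` can be passed in directly. If the first page is slow and later pages are fast, the scan is the likely bottleneck; if every page takes a similar amount of time, the cost is probably in transferring a large result, and aggregating or downsampling in the query may help more than repartitioning.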

Please state the urgency on the support ticket to get a faster response.

AWS
answered 4 months ago
EXPERT
reviewed 22 days ago
