Disappointing speed querying Timestream with billions of records


When we started out with Timestream we thought it would be a good choice to hold the billions of sensor data records we gather each year. The choice to use Timestream was largely based on the marketing texts on the website, which state that you can "analyze trillions of events per day" and that "when your data grows, the performance stays the same". We have tried different approaches and followed Timestream best practices stated here:

https://docs.aws.amazon.com/timestream/latest/developerguide/best-practices.html
https://docs.aws.amazon.com/timestream/latest/developerguide/queries-bp.html

When we started with Timestream last year, user-defined partitioning wasn't available yet, so partitioning took place on measure_name. We have since tried our own partitioning scheme, which improved scanning speed, but returning the data still takes a very long time.

We are reaching a point where we don't want to spend any more time and resources on Timestream. Timestream doesn't seem mature enough as a product. We have tried to contact AWS support, but getting help has taken more than 2 weeks so far.

At the moment, we are looking for alternatives to Timestream: a database that can hold billions of records and still be queried with good performance. It does not have to be a time series database per se, as long as it can handle time series data. We are currently looking at InfluxDB and TimescaleDB, and are considering MemoryDB with a Redis backend or even DynamoDB. Does anyone have experience with these or other databases at this scale? We would also like to hear from anyone who has used Timestream at our scale and can share their experience.

Thanks in advance!

Yours sincerely,

Luis

Asked 5 months ago, 151 views
1 Answer

Hi Luis,

The basis for the marketing claims is the service architecture as explained here - https://www.allthingsdistributed.com/2021/06/amazon-timestream-time-series-is-the-new-black.html. Timestream stores the data in partitions called 'tiles'. Irrespective of the number of partitions in a table, the query engine, if provided with the right pruning parameters, can scan only the right partitions to provide the query result.
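
To make "pruning parameters" concrete, here is a minimal sketch of a query builder. It assumes that pruning means predicates on time and on measure_name (the partitioning key mentioned in the question); the helper name `build_pruned_query`, the database and table names, and the default column list are hypothetical and would need adapting to the actual schema.

```python
import re

# Timestream interval literals such as 15m, 6h or 7d (only a subset of units is accepted here).
_LOOKBACK = re.compile(r"^[1-9][0-9]*(s|m|h|d)$")


def build_pruned_query(database, table, measure_name, lookback="1h",
                       columns=("time", "measure_value::double")):
    """Build a query that restricts both the time range and measure_name.

    Hypothetical helper: the column names assume single-measure records.
    """
    if not _LOOKBACK.match(lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")
    # Double any single quotes so the value stays inside the string literal.
    safe_measure = measure_name.replace("'", "''")
    select_list = ", ".join(columns)  # select only the columns that are needed
    return (
        f"SELECT {select_list} "
        f'FROM "{database}"."{table}" '
        f"WHERE time BETWEEN ago({lookback}) AND now() "
        f"AND measure_name = '{safe_measure}'"
    )


if __name__ == "__main__":
    # "sensors" and "readings" are placeholder names.
    print(build_pruned_query("sensors", "readings", "temperature", "6h"))
```

Whether such predicates actually reduce the data scanned depends on how the table is partitioned, so it is worth comparing the bytes scanned with and without them.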

If the query result is large in size, you have to paginate through the result. Is that the time consuming part of your execution?
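
One way to check whether pagination is the slow part is to time each page separately. The sketch below assumes a boto3 `timestream-query` client, whose `query` call accepts `QueryString` and `NextToken` and returns `Rows` plus an optional `NextToken`; the function name `time_query_pages` is hypothetical.

```python
import time


def time_query_pages(client, query_string):
    """Run a query page by page and return (seconds, row_count) for each page."""
    timings = []
    kwargs = {"QueryString": query_string}
    while True:
        started = time.perf_counter()
        response = client.query(**kwargs)
        elapsed = time.perf_counter() - started
        timings.append((elapsed, len(response.get("Rows", []))))
        token = response.get("NextToken")
        if not token:
            break  # no further pages
        kwargs["NextToken"] = token
    return timings


# Usage (requires AWS credentials and boto3, so it is left as a comment):
#   import boto3
#   client = boto3.client("timestream-query")
#   for seconds, rows in time_query_pages(client, "SELECT ..."):
#       print(f"{rows} rows in {seconds:.2f}s")
```

If the first page dominates the total, the time is probably going into scanning; if the total grows with the number of pages, fetching the result set is the likelier bottleneck.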

Kindly share your urgency on the support ticket for a faster response.

AWS
Answered 5 months ago
Reviewed by an expert a month ago
