Need help understanding Quicksight direct query to timestream with filters

0

Do filters in direct mode mean that the raw underlying SQL query is adding or modifying a where clause? OR does it sorta scan everything then filter it after the underlying data has been scanned?

I have multiple quicksight reports and analyses all using datasets that are cached in spice and from what I understand this means that all the queries are running off a cached disconnected dataset thats a copy of the actual underlying data and that copy is refreshed periodically. this works and its awesome.

I also have a timestream table thats large in the region of terabytes of data. This dataset is obviously too large to import into spice, which means I must directly query the dataset. I've imported the dataset and have set everything up, but when viewing data my timestream costs over the last few days have shot up immensely. I normally sit around 2-3 usd a day, and on the day I released my report it jumped to 90$.

If I look in the costing breakdown this is due to the amount of "Scanned bytes" in timestream. All of my reports use filters to segment and breakdown the data but this obviously isn't working the way it should. if my datasource is essentially "select * from table", and then i add a filter on the dataset does that mean the query sent to the datasource is "select * from table where column = filter" OR does it mean it loads all the rows and then does some other filtering after that? based off the speed of reports I assume its the first one, but if so then I need to figure out how to constrain the filters even more to load less data.

i have disabled the report for now and my timestream costs have once again dropped down to normal levels, and at this point i'm too scared to re-enable it, but people are clamoring for their data :/

1 Answer
1
Accepted Answer

In QuickSight, filters in Direct Query mode are generally implemented as WHERE clauses in the SQL query that is sent to the underlying data source. This means that the filters are applied to the data source before the data is returned to QuickSight, which can result in improved query performance and reduced data transfer costs. However, the exact implementation may depend on the specific data source and query engine being used.

In your case, it sounds like the filters may not be reducing the amount of data that is being scanned in Timestream, which is leading to higher query costs. To optimize your query performance and reduce costs, you may want to consider applying filters that are more selective or using partitioning to reduce the amount of data that needs to be scanned. You could also consider using aggregations or precomputing summary data to reduce the amount of data that needs to be scanned and transferred.

Additionally, it's worth noting that using Direct Query mode with large data sources like Timestream can be challenging due to the high cost of data transfer and query execution. You may want to consider using SPICE for pre-aggregation and caching of frequently accessed data, or exploring other data storage options like Amazon Redshift or Amazon Athena that are better suited for large-scale data analysis

AWS_Guy
answered a year ago
profile picture
EXPERT
reviewed 3 months ago
  • thanks so much for your response. if the filters are applied to the datasource before data is returned then this is exactly what i would expect.

    the implementation and query engine is the built in timestream data source.

    in general we do use precalculated aggregations - this is something im looking at implementing here. i would love to find some subset to cache inside of spice. thanks for your answer

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions