1 Answer
- Newest
- Most votes
- Most comments
1
- CREATE TABLE AS SELECT (CTAS) queries are considered DML queries and do incur costs based on the amount of data scanned by the SELECT query.
- The table created as a result of running CTAS query is not a pointer to the original table but a separate copy of it and hence does not involve any recurring costs.
Relevant content
- asked a year ago
- AWS OFFICIALUpdated 7 months ago
- AWS OFFICIALUpdated a month ago
- AWS OFFICIALUpdated 8 months ago
- AWS OFFICIALUpdated 11 days ago
okay thanks, about point 2, does that me it does not get updated after it's been created, and remains static from the time it's first created?
AWS-User-5296216, it's not precise to call the table "static". It might change. It might change a lot it you make lots of updates! But it will not be affected in any way at all by things that happen to the original table, which is I think what you're asking about.
Yes, after you create a table as select, it is a separate, unrelated, table to the original. The data for the table is stored separately, in whichever location you chose when creating it (likely an s3 bucket, with a different prefix). Any change to the original table will not propagate to the new table. You're basically creating a brand new data set that's pre-seeded with data. If you want changes to the original data to be seen in the derived table, you could use Glue to load it periodically into the new table.
The answer is correct, it is important to notice that the CTAS is a one time operation that will create a copy of the data in the original table at the time of execution.
If you update or add data to the original table and you NEED to keep the 2 tables in sync , then you will need to schedule an insert statement that reads from the original table ONLY the new data (if possible).
@Fabrizio@AWS Thanks for the info. Inserting at regular intervals would be perfect. However, as the original logs that the primary table reads are txt files, not columnar data, generating a regular insert statement (which would need to read datetime column data) would mean that all of the data has to once again be read, which means much of the cost saving advantage of using a CTAS statement to generate a columnar table is lost. Unless I've misunderstood. I know I could generate a SELECT/INSERT statement using a partition column which includes a date, but if that partition is created from a non-hive style format like that of alb logs (e.g 'OBJECTDIR/2022/01/01'), the regular insert can only be done accurately at most once a day, which is not ideal for monitoring applications. Again, unless I've missed something?