By using AWS re:Post, you agree to the Terms of Use

Athena how to query different table versions.

0

I have a workflow that retrieves data and stores it on S3 bucket (Database snapshot). For the first time that the workflow ran, it created one folder for each table. So for example let's say I had table products, then it created a folder for that table. Now inside of this folder there will be a subfolder called version_0. Every time the workflow runs again, it will fetch the data, and create a version_1, version_2, so forth.

On Athena, when I query this table, it will give me data from the latest version created. How can I access for example version_0 through the query? Following this documentation , I tried to run the following:

select count(*) from products FOR SYSTEM_VERSION AS OF 0

But I'm getting the following error: Enter image description here

To be honest don't even know if following this documentation that I sent above is the way to go or if there is some other way.

I also looked into the Table properties going into AWS Glue -> Tables -> products table and then I see that there is a field TableVersion. I can click on the "Compare Versions" button and it will give me a side by side between version 0 and version 1 (the current versions I have for now).

Thanks

  • The version parameter is the bigint snapshot ID associated with a governed table version.

1 Answer
0

Hello,  

Firstly, Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. If you are seeing only the data from the latest version in Athen that means you only have the latest version of data in the underlying S3 folder. As per your description it seems that the new versions is overwriting the old version in S3.

Secondly, if you are able to compare versions in Glue that means that a new version of table is getting added to glue catalog each time data is unloaded into S3.

Lastly, if you are looking for an ability to query all the versions of your table snapshot. You have to unload and store the data of each new version as a new folder under products (example version_1, version_2 etc ) and add the same to glue catalog. You can consider using glue crawler [1] for adding the new partitions to the table.

————————

Reference:

[1] https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html

================  

Have a nice day!

SUPPORT ENGINEER
answered 2 months ago
  • Hi Arun,

    Thanks for answering. I did what you mentioned, (create a crawler to add the new partitions to the table). But the result was just 2 different tables (1 table for each version I guess), and each table starts with a name "part_00000_c5e9..." . Also the link you sent is quite general, could you please specify the documentation that actually aggregates the different partitions/versions into one table?

    Thank you

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions