I have a setup where I am using Athena to query data exported from DynamoDB table. I am using DDB export to S3 capability as mentioned in this blog post.
The table that I am querying contains some rows which are first Gzipped and then base64 encoded before writing to DDB. I am looking for some best practices on how can I make this data available for query in Athena?
Two options that I can think of are
- Transforming file in S3 once they are exported to base64 decode and uncompress the data.
- Write an Athena UDF to do the same during query execution.
Any recommendation on what might be the best option for such use-case? Is there something in built in Glue transformation that I can take advantage of?
from_base64() will give me the zipped binary. I also need ability to unzip the data. Any built-in function to do that?
I see, apologies, i thought you have data in gz file and one of columns is base64 encoded. there's no built-in function in Athena that would extract the data in this case. If you plan to query that data multiple times i suggest to process that data before you query it with Athena.