Querying rows containing GZipped and Base64 encoded string in Athena


I have a setup where I am using Athena to query data exported from DynamoDB table. I am using DDB export to S3 capability as mentioned in this blog post.

The table that I am querying contains some rows which are first Gzipped and then base64 encoded before writing to DDB. I am looking for some best practices on how can I make this data available for query in Athena?

Two options that I can think of are

  1. Transforming file in S3 once they are exported to base64 decode and uncompress the data.
  2. Write an Athena UDF to do the same during query execution.

Any recommendation on what might be the best option for such use-case? Is there something in built in Glue transformation that I can take advantage of?

asked 2 years ago828 views
1 Answer

Probably makes sense to try from_base64() built-in function that Athena supports

answered 2 years ago
  • from_base64() will give me the zipped binary. I also need ability to unzip the data. Any built-in function to do that?

  • I see, apologies, i thought you have data in gz file and one of columns is base64 encoded. there's no built-in function in Athena that would extract the data in this case. If you plan to query that data multiple times i suggest to process that data before you query it with Athena.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions