By using AWS re:Post, you agree to the AWS re:Post Terms of Use

Athena federated query with a Glue table fails to process timestamp/date formats

0

I am using Athena federated query with a Glue table to query IOT Data stored in Document DB. The time format reported by IOT sensors is ISO 8601 (here the entry from DocumentDB "time" : "2024-11-20T00:17:23",). When setting up the Glue table for a "time" column, either with data type "date" or "timestamp" the Athena federated query fails with below error: GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[arn:aws:lambda:eu-central-1:746952355359:function:eu-central-mongo-cat] executed in context[S3SpillLocation{bucket='aws-athena-query-results-eu-central-1-746952355359', key='athena-spill/783ea4a5-78b3-49f1-adc8-ef50ded8f0ea/e55ac47c-09e2-4e90-921f-f0ef60938214', directory=true}] with message[Error while processing field time]

Data type "date": Data type "timestamp":

Only in case I define the "time" column in Glue table as a string, the time data can be read as text. Data type "string":

Since I need to process data with Quicksight, the time/date field is recognized as text only and it is useless for displaying any time-granularity related analytics. The default Glue time format is yyyy-mm-dd HH:MM:SS. IoT sensors report time data as "time" : "2024-11-20T00:17:23". Would there be a quick way to overcome this issue? For the Quicksight analysis I need explicitly either "date" or "timestamp" format

AWS
asked 17 days ago35 views
1 Answer
0

The issue you're experiencing with Athena federated query failing to process timestamp/date formats from your IoT data stored in DocumentDB is likely due to a mismatch between the format of the timestamp in your data and the expected format in Athena.

To overcome this issue, you have a few options:

  1. Use a custom SerDe (Serializer/Deserializer): You can create a custom SerDe that can handle the ISO 8601 format used by your IoT sensors. This would allow Athena to correctly interpret the timestamp format.

  2. Transform the data: You could use AWS Glue ETL jobs to transform the data before querying it with Athena. In the Glue job, you can convert the ISO 8601 timestamp to a format that Athena can understand natively.

  3. Use Athena's built-in functions: If you keep the "time" column as a string in your Glue table, you can use Athena's date and time functions to parse and convert the string to a timestamp or date format during your query. For example:

    SELECT parse_datetime(time, 'yyyy-MM-dd''T''HH:mm:ss') AS parsed_time
    FROM your_table

    This approach allows you to work with the data as a proper timestamp in your queries and should be compatible with QuickSight's time-based analytics.

  4. Adjust the Glue table definition: In your Glue table definition, you might be able to specify the exact format of your timestamp. Try setting the "time" column to type "timestamp" with format "yyyy-MM-dd'T'HH:mm:ss".

For the QuickSight analysis, option 3 or 4 would likely be the most straightforward. You can create a view in Athena that includes the parsed timestamp, and then use this view as the data source for QuickSight. This way, QuickSight will see the "time" field as a proper timestamp and enable all time-granularity related analytics.

Remember to test your solution thoroughly, as the exact behavior may depend on the specific versions of Athena, Glue, and the federated query connector you're using.

Sources
Community | What happens when you run a query in Amazon Athena?
Community | What happens when you run a query in Amazon Athena?
Use Amazon Athena Federated Query - Amazon Athena
Community | Unleashing Data Analytics on S3 Data lake with AWS Glue Crawler and Amazon Athena

profile picture
answered 17 days ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions