Getting error "DataFormatConversion.SchemaValidationTimeout"


I have a Kinesis stream which feeds into a Firehose stream which outputs ORC to S3.

All this has been working fine for about 2 years until about 2 days ago, when a new Athena table appeared named

"format_conversion_failed (Partitioned)"

And my primary Athena table that I use for my production data STOPPED being populated!

In the "format_conversion_failed (Partitioned)" table I see a bunch of rows with all the same messages:

DataFormatConversion.SchemaValidationTimeout Timed out while retrieving table from Glue. If you have a large number of Glue table versions, please add 'glue:GetTableVersion' permission (recommended) or delete unused table versions. If you do not have a large number of tables in Glue, please contact AWS Support.

I do not have a large number of Glue tables (14, of which 2 were newly created by this error).

Can somebody point me to how I can resolve this problem?

asked 4 years ago · 318 views
1 Answer

I was able to identify the problem. Leaving it here in case others hit this same error message. It had to do with us running a Glue crawler. This Glue crawler creates a new "version" of the table, even if nothing changed. Eventually you can build up so many versions (we had 21k) that Firehose times out trying to get the schema for that table (apparently the large number of versions causes a problem there).

Additionally, there is no way to check how many versions you have in the AWS Console UI. The only way to see your count is to "compare" versions: the "compare" page shows your version "id", which is just an incrementing integer, so this "id" indicates your number of versions (assuming you haven't deleted any, which you can't do through the UI either).
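For anyone who wants to check the count without the "compare" page, here is a minimal sketch that counts versions through the Glue API instead. It assumes boto3 is installed and that your credentials allow `glue:GetTableVersions`; the function name `count_table_versions` and the database/table names in the note below are hypothetical placeholders, not something from our setup.

```python
def count_table_versions(glue, database, table):
    """Count the versions of one Glue table by paging through GetTableVersions.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    paginator = glue.get_paginator("get_table_versions")
    count = 0
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        # Each page carries a list of versions under "TableVersions".
        count += len(page.get("TableVersions", []))
    return count
```

You would call it as `count_table_versions(boto3.client("glue"), "my_database", "my_table")`. With tens of thousands of versions this makes many API calls, so expect it to take a while.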

Our fix was to delete them manually, one by one, through the AWS CLI, treating the version "id" as an integer between 1 and 21k (in our case).
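This is not what we ran (we used the CLI, one version at a time), but the same cleanup can probably be scripted with boto3 using the batch delete call. The sketch below is written under a few assumptions: version ids are numeric strings, the highest ids are the newest, and you want to keep the most recent few. The helper names and the `keep` parameter are hypothetical.

```python
def list_version_ids(glue, database, table):
    """Return every version id of a Glue table (ids come back as strings)."""
    paginator = glue.get_paginator("get_table_versions")
    ids = []
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        ids.extend(v["VersionId"] for v in page.get("TableVersions", []))
    return ids


def delete_old_versions(glue, database, table, keep=10):
    """Delete all but the `keep` newest versions; return (deleted_count, errors).

    Assumption: ids are numeric strings and a larger id means a newer version.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ids = sorted(list_version_ids(glue, database, table), key=int)
    to_delete = ids[:-keep]  # empty when there are `keep` or fewer versions
    errors = []
    # BatchDeleteTableVersion accepts at most 100 version ids per request.
    for start in range(0, len(to_delete), 100):
        batch = to_delete[start:start + 100]
        response = glue.batch_delete_table_version(
            DatabaseName=database, TableName=table, VersionIds=batch
        )
        errors.extend(response.get("Errors", []))
    return len(to_delete) - len(errors), errors
```

Called as `delete_old_versions(boto3.client("glue"), "my_database", "my_table", keep=10)`, it needs `glue:GetTableVersions` and `glue:BatchDeleteTableVersion` permissions. Try it on a table you can afford to break first, and check the returned errors list.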

So, lastly, a warning to others: don't run the Glue crawler in an automated way (via scheduling) unless you intend to manually delete old versions. The crawler will not do that for you.

Edited by: Jason Malcolm on Oct 22, 2020 1:41 PM


answered 4 years ago
