Getting error "DataFormatConversion.SchemaValidationTimeout"


I have a Kinesis stream that feeds a Firehose delivery stream, which writes ORC to S3.

All this has been working fine for about 2 years until about 2 days ago, when a new Athena table appeared named

"format_conversion_failed (Partitioned)"

And my primary Athena table that I use for my production data STOPPED being populated!

In the "format_conversion_failed (Partitioned)" table I see a bunch of rows, all with the same message:

DataFormatConversion.SchemaValidationTimeout Timed out while retrieving table from Glue. If you have a large number of Glue table versions, please add 'glue:GetTableVersion' permission (recommended) or delete unused table versions. If you do not have a large number of tables in Glue, please contact AWS Support.

I do not have a large number of Glue tables: only 14, and 2 of those were newly created by this error.

Can somebody point me to how I can resolve this problem?

Asked 4 years ago · 322 views
1 Answer

I was able to identify the problem. Leaving it here in case others hit this same error message.

It had to do with us running a Glue crawler. The crawler creates a new "version" of the table on every run, even if nothing changed. Eventually you can build up SO many versions (we had 21k) that Firehose times out trying to get the schema for that table (apparently the large number of versions causes a problem there).

Additionally, there is NO way to check how many versions you have in the AWS Console UI. The only way to see your count is to "compare" versions: the "compare" page shows each version's "id", which is just an incrementing integer, so the highest "id" indicates your number of versions (assuming you don't delete any, which you can't do through the UI either).
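For anyone who would rather not rely on the "compare" page, the version count can also be read through the Glue API. This is a minimal sketch in Python, not what we actually ran: it assumes a boto3 Glue client is passed in, and the function name `count_table_versions` plus the database and table names in the usage comment are made up for illustration.

```python
def count_table_versions(glue, database, table):
    """Count the versions of one Glue table by paging through GetTableVersions.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    count = 0
    kwargs = {"DatabaseName": database, "TableName": table, "MaxResults": 100}
    while True:
        page = glue.get_table_versions(**kwargs)
        count += len(page.get("TableVersions", []))
        token = page.get("NextToken")
        if not token:
            # No further pages: every version has been counted.
            return count
        kwargs["NextToken"] = token


# Usage (hypothetical names, needs AWS credentials with glue:GetTableVersions):
#   import boto3
#   print(count_table_versions(boto3.client("glue"), "my_database", "my_table"))
```

With tens of thousands of versions this makes a few hundred API calls, so expect it to take a while.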

Our fix was to delete them manually, one by one, through the AWS CLI, treating the version "id" as an integer running from 1 up to about 21k in our case.
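If you would rather script the cleanup than loop over ids by hand, something along these lines might work. It is a sketch in Python with boto3 rather than the CLI commands we used, and it swaps our one-by-one deletes for Glue's BatchDeleteTableVersion call, which takes up to 100 version ids per request. The function name `prune_table_versions`, the `keep` parameter, and the names in the usage comment are illustrative assumptions; it also assumes every version id is a numeric string, which matches what we saw.

```python
def prune_table_versions(glue, database, table, keep=10, batch_size=100):
    """Delete all but the newest `keep` versions of a Glue table.

    `glue` is assumed to be a boto3 Glue client. Returns a tuple of
    (number of versions deleted, list of per-version errors from Glue).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    # Collect every version id, following NextToken until the listing ends.
    ids = []
    kwargs = {"DatabaseName": database, "TableName": table, "MaxResults": 100}
    while True:
        page = glue.get_table_versions(**kwargs)
        ids.extend(v["VersionId"] for v in page.get("TableVersions", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    # Assumption: ids are numeric strings, so sorting by int puts oldest first.
    ids.sort(key=int)
    stale = ids[:-keep]

    errors = []
    for start in range(0, len(stale), batch_size):
        resp = glue.batch_delete_table_version(
            DatabaseName=database,
            TableName=table,
            VersionIds=stale[start:start + batch_size],
        )
        errors.extend(resp.get("Errors", []))
    return len(stale) - len(errors), errors


# Usage (hypothetical names, needs glue:GetTableVersions and
# glue:BatchDeleteTableVersion permissions):
#   import boto3
#   deleted, errors = prune_table_versions(boto3.client("glue"), "my_database", "my_table")
```

Keeping the newest few versions instead of deleting everything is a deliberate choice here, so the version the table currently uses is left alone.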

Lastly, a warning to others: don't run the Glue crawler on a schedule unless you intend to delete old versions yourself. The crawler will not do that for you.

Edited by: Jason Malcolm on Oct 22, 2020 1:41 PM


Answered 4 years ago
