The issue you're experiencing is related to how AWS Glue crawlers handle DynamoDB tables with dynamic map attributes. When a DynamoDB table has a map attribute with dynamic keys (like your CustomAttributes), the crawler attempts to create a schema that represents all possible keys it encounters. As the number of unique keys grows, the schema becomes increasingly complex, eventually exceeding Glue's constraints.
There are a few approaches you can take to work around this limitation:
- Use a custom JSON classifier: If you're planning to export the DynamoDB data to S3 for querying with Athena, you could create a custom JSON classifier that handles the nested structure more efficiently.
- Use the Athena DynamoDB connector: To query DynamoDB directly from Athena, you'll need to use the Athena DynamoDB connector. This connector uses Lambda functions to query data in DynamoDB and may handle complex map structures differently than the Glue crawler.
- Modify your data model: Consider restructuring your data to use a more consistent schema. Instead of dynamic keys in a map, you could use an array of key-value pairs with a consistent structure.
- Use relationalize or unnest transformations: If you're processing this data in a Glue ETL job, you can use the relationalize function to convert nested structures into separate tables, or the unnest option to convert nested fields into top-level objects.
- Export to S3 in a flattened format: You could create a Glue ETL job that exports your DynamoDB data to S3 in a more Athena-friendly format, flattening the dynamic attributes into a more queryable structure.
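To make the data-model option more concrete, here is a minimal sketch in plain Python of turning a dynamic map into a list of key-value pairs. The CustomAttributes name comes from your table; the pk attribute, the sample values, and the map_to_pairs helper are hypothetical, and casting every value to a string is an assumption made so that all entries share one type.

```python
from typing import Any


def map_to_pairs(item: dict[str, Any], map_field: str = "CustomAttributes") -> dict[str, Any]:
    """Return a copy of `item` with the dynamic map replaced by a list of pairs."""
    # Copy every attribute except the dynamic map itself.
    restructured = {k: v for k, v in item.items() if k != map_field}
    dynamic_map = item.get(map_field) or {}
    # Sort keys for deterministic output; cast values to str (an assumption)
    # so that every entry has the same {key: string, value: string} shape.
    restructured[map_field] = [
        {"key": key, "value": str(dynamic_map[key])} for key in sorted(dynamic_map)
    ]
    return restructured


# Hypothetical item with two dynamic keys.
item = {"pk": "user#1", "CustomAttributes": {"color": "red", "size": 42}}
converted = map_to_pairs(item)
print(converted)
```

With this shape, adding a new attribute name adds a list entry rather than a new field, so the inferred schema should stay the same size no matter how many distinct keys exist.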
Each of these approaches has trade-offs in terms of query performance, data freshness, and implementation complexity. The best solution depends on your specific use case and requirements.
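As a rough illustration of the flattened-export option, the sketch below emits one row per dynamic attribute and serializes the rows as JSON Lines, a format Athena can read from S3. It is plain Python rather than a Glue job, and the pk attribute, the output column names (item_id, attribute_name, attribute_value), and both helper functions are hypothetical; only CustomAttributes is taken from your table.

```python
import json
from typing import Any, Iterable, Iterator


def flatten_to_rows(
    item: dict[str, Any], key_field: str = "pk", map_field: str = "CustomAttributes"
) -> Iterator[dict[str, str]]:
    """Yield one fixed-shape row per entry of the dynamic map."""
    dynamic_map = item.get(map_field) or {}
    for name in sorted(dynamic_map):
        yield {
            "item_id": str(item[key_field]),
            "attribute_name": name,
            # Values are cast to str (an assumption) so the column has one type.
            "attribute_value": str(dynamic_map[name]),
        }


def to_json_lines(items: Iterable[dict[str, Any]]) -> str:
    """Serialize all rows as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(row) for item in items for row in flatten_to_rows(item))


# Hypothetical sample items.
items = [
    {"pk": "user#1", "CustomAttributes": {"color": "red", "size": 42}},
    {"pk": "user#2", "CustomAttributes": {"plan": "pro"}},
]
jsonl = to_json_lines(items)
print(jsonl)
```

The resulting table has three fixed columns, so a query can filter on attribute_name instead of relying on one column per key. The cost is that reassembling a full item requires grouping rows by item_id.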
Sources
Process JSON files in AWS Glue | AWS re:Post
Getting error while querying DynamoDB in Athena using Glue Crawler | AWS re:Post
HIVE_UNKNOWN_ERROR: java.net.URISyntaxException: Relative path in absolute URI: arn:aws:dynamodb:us-east-1:<account-id>:table/mytable | AWS re:Post
AWS Glue type systems - AWS Glue
