Hello, here are some general reasons why a crawler can throw the 'Internal Service Exception' error when crawling Delta tables:
- The Delta files have the column mapping feature enabled, or use other features that the crawler's Delta Lake version does not support. Once column mapping is enabled, the crawler's Delta Standalone reader version is no longer compatible with the Delta protocol version the table requires. Please note that column mapping works with Delta Lake 1.2.0 [1], but the Glue crawler runs with Glue version 3.0, which supports Delta Lake 1.0 only. Currently, the column mapping feature is not yet supported by the Glue crawler.
- When a crawler runs, it might encounter changes to your data store that result in a schema or partition that is different from a previous crawl. If you are configuring the crawler on the console, you can choose the following actions to prevent the table schema from changing when the crawler runs:
  • Ignore the change and don't update the table in the Data Catalog.
  • Update all new and existing partitions with metadata from the table.
  Please refer to [2] for information on how to prevent the crawler from changing an existing schema.
- Data format. Please ensure that your folder structure is consistent and that your data source (for example, Amazon S3) does not contain removed or empty data. I also suggest checking for inconsistencies between the table's partition definition in the Data Catalog and the Hive partition structure in Amazon S3.
- Column name lengths can cause an internal error in the crawler. Please ensure that column names do not exceed 255 characters and do not contain special characters.
- Please double-check that the S3 path does not contain special characters and that it does not hold encrypted data.
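The last two checks in the list above can be scripted before you run the crawler. The following is only a rough sketch: the 255-character limit comes from the point above, but what counts as a "special character" is my assumption (anything outside letters, digits, and underscore for column names, plus `/`, `-`, `.` and `=` for S3 keys), and the function names are made up for illustration.

```python
import re

MAX_COLUMN_NAME_LENGTH = 255  # limit mentioned in the list above

# Assumption: a "safe" column name uses only letters, digits, and underscores.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")
# Assumption: a "safe" S3 key may also contain "/", "-", "." and "="
# (the "=" allows Hive-style partitions such as year=2023).
SAFE_S3_KEY = re.compile(r"^[A-Za-z0-9_/\-.=]+$")


def find_bad_column_names(names):
    """Return the names that are too long or contain unexpected characters."""
    return [
        name
        for name in names
        if len(name) > MAX_COLUMN_NAME_LENGTH or not SAFE_NAME.match(name)
    ]


def is_plain_s3_path(path):
    """Return True if the path is s3://bucket/key with only 'plain' key characters."""
    if not path.startswith("s3://"):
        return False
    bucket, _, key = path[len("s3://"):].partition("/")
    return bool(bucket) and (key == "" or bool(SAFE_S3_KEY.match(key)))
```

For example, `find_bad_column_names(["id", "order total"])` flags `"order total"` because of the space. Adjust the two patterns if your naming rules are different.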
For more tips on resolving 'Internal Service Exception', please refer to [3].
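If you manage the crawler through the API rather than the console, the two schema-change options listed above correspond, as far as I understand the documentation in [2], to `UpdateBehavior: LOG` in the crawler's `SchemaChangePolicy` and to `AddOrUpdateBehavior: InheritFromTable` in its `Configuration` JSON. Below is a minimal sketch that only builds the request arguments. The crawler name is a placeholder and the helper function is hypothetical, so please verify the values against [2] before applying them.

```python
import json


def build_schema_freeze_settings(crawler_name):
    """Build keyword arguments for Glue's UpdateCrawler call (hypothetical helper)."""
    configuration = {
        "Version": 1.0,
        "CrawlerOutput": {
            # Partitions inherit their metadata from the table.
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return {
        "Name": crawler_name,
        # LOG: ignore schema changes and do not update the table in the Data Catalog.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG"},
        # The Configuration parameter is passed as a JSON string.
        "Configuration": json.dumps(configuration),
    }


settings = build_schema_freeze_settings("my-delta-crawler")  # placeholder name

# With boto3 installed and credentials configured, this could be applied with:
#   import boto3
#   boto3.client("glue").update_crawler(**settings)
```

The actual API call is left commented out so that the snippet runs without AWS credentials.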
To learn how to create crawlers to crawl Delta Lake tables, please refer to [4].
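Before crawling a Delta table, you can also check whether it uses column mapping (the first cause mentioned above) by reading the JSON commit files in the table's `_delta_log` folder. The sketch below assumes you have already downloaded the text of one commit file. It looks at the `protocol` action and at the `delta.columnMapping.mode` table property in the `metaData` action. Keep in mind that these actions may sit in different commit files or in a checkpoint, so one file might not give the full picture. The function names and the sample data are made up for illustration.

```python
import json


def summarize_delta_commit(commit_text):
    """Extract protocol versions and the column mapping mode from one commit file."""
    summary = {
        "minReaderVersion": None,
        "minWriterVersion": None,
        "columnMappingMode": "none",
    }
    # A commit file holds one JSON action per line.
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "protocol" in action:
            summary["minReaderVersion"] = action["protocol"].get("minReaderVersion")
            summary["minWriterVersion"] = action["protocol"].get("minWriterVersion")
        if "metaData" in action:
            config = action["metaData"].get("configuration") or {}
            summary["columnMappingMode"] = config.get("delta.columnMapping.mode", "none")
    return summary


def uses_column_mapping(summary):
    """Return True if the commit enables column mapping ('name' or 'id' mode)."""
    return summary["columnMappingMode"] != "none"


# Made-up sample resembling a commit written after column mapping was enabled.
sample = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 2, "minWriterVersion": 5}}),
    json.dumps({"metaData": {"id": "example",
                             "configuration": {"delta.columnMapping.mode": "name"}}}),
])
summary = summarize_delta_commit(sample)
```

If `uses_column_mapping(summary)` returns `True`, the table is likely affected by the column mapping limitation described above.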
If you are still encountering the issue and want to determine the exact cause, please raise a case with AWS Premium Support. This will allow us to review the crawler configuration from our end and further troubleshoot the failed run.
References:
[1] How does Delta Lake manage feature compatibility? - https://docs.delta.io/latest/versioning.html
[2] Setting crawler configuration options - How to prevent the crawler from changing an existing schema - https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html#crawler-schema-changes-prevent
[3] https://repost.aws/knowledge-center/glue-crawler-internal-service-exception
[4] https://aws.amazon.com/blogs/big-data/crawl-delta-lake-tables-using-aws-glue-crawlers/