Glue Crawler Internal Service Error (JDBC, MSSQL)

Hi there,

I have a Glue Crawler that is failing with an Internal Service Exception and no other error logs.

Some details:

  • Crawler source: JDBC connection to a SQL Server Database with the wildcard path (%). (I've tried specific schemas and tables too)
  • The Glue Connection has been tested in 2 other settings in Glue and works:
    • test connection from the Glue Connections console using the same IAM role used to run the crawler
    • a Glue job using the same IAM role as the crawler, from which I'm able to connect, query a table, and log a count of records.
  • The crawler succeeds and populates the Glue Catalog when configured with a dummy source pointing to a CSV file on S3 (instead of the JDBC source).
    • I think this rules out Lake Formation blocking writes to the catalog? Regardless I've granted broad permissions to my crawler IAM role in Lake Formation as well. The Glue Database I'm writing to is otherwise empty.
  • The crawler itself runs for about 5 mins before failing. (maybe suggests that it is actually crawling some tables but failing at a later step?)
  • When I run the crawler with a Glue Connection that I know to be misconfigured/broken, the crawl exits with an error indicating that testing the connection failed (so I don't think it's an issue with the connection?)

CloudWatch logs reveal no information (other than the Internal Service Exception). I've also combed through all CloudTrail events between the start and end of the failed crawler run (plus a bit of buffer on either side), and see nothing concerning or obvious. I've reviewed these recommendations as well: https://repost.aws/knowledge-center/glue-crawler-internal-service-exception.

3 questions (and grateful for any other advice):

  1. Is there anywhere else I can/should be looking for error messages or is there a way to increase the log level for crawler runs?
  2. Is it documented anywhere, what db permissions a SQL Server db user must have on the database for a crawl? e.g. READ access on sys.* tables? Access to the INFORMATION_SCHEMA maybe? Wondering if the user used for our db connection is lacking permissions on the DB (although nothing obvious about my current db user stands out).
  3. Anything else not mentioned here I should be trying?

Many thanks in advance.

1 Answer
-1

Based on your description, there are several potential causes for the "Internal Service Exception" in your Glue Crawler when connecting to SQL Server via JDBC.

Since your connection tests successfully and works in a Glue job, but fails during crawling, this suggests the issue might be related to specific permissions or configurations needed for crawling operations rather than basic connectivity.

Here are some troubleshooting steps and considerations:

  1. For error logs and increased verbosity:
    • Unfortunately, Glue Crawlers don't provide extensive logging options beyond what you're seeing in CloudWatch.
    • You might want to check if there are any throttling-related errors in CloudTrail that could indicate resource constraints.
  2. Regarding SQL Server permissions for crawling:
    • The database user needs SELECT permissions on the tables you want to crawl.
    • The user should have permissions to query system tables and metadata views, including:
      • INFORMATION_SCHEMA views
      • sys.tables, sys.columns, sys.types, and other system catalog views
      • Permissions to execute the sp_tables and sp_columns stored procedures
    • Consider creating a dedicated database user with appropriate read-only permissions specifically for the crawler.
  3. Additional troubleshooting steps:
    • Try limiting the crawler scope to a single table or schema instead of using the wildcard path.
    • Check if there are any special characters or unsupported data types in your SQL Server tables.
    • Verify network configuration, ensuring the security groups, subnets, and VPC settings allow proper connectivity.
    • Ensure the IAM role has all necessary permissions for both the connection and writing to the Glue Data Catalog.
    • Consider using a custom JDBC driver if you're encountering compatibility issues with the default driver.
    • Check if your SQL Server requires specific authentication methods or encryption settings that might be causing issues during the crawling process.
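
One way to test the permissions point is to run a few metadata queries as the crawler's database user, for example from the Glue job that already connects successfully. The sketch below is only an illustration: the probe list is an assumption built from the objects named above, not a documented list of what the crawler actually queries, and `cursor` stands for any Python DB-API cursor you already have for that user. As far as I know, SQL Server filters catalog views down to the objects a login is allowed to see instead of raising an error, so the sketch reports row counts as well as errors.

```python
from typing import Dict, Union

# Hypothetical probe list: one cheap query per metadata object named in the
# answer. This is NOT a documented list of what the Glue crawler reads.
METADATA_PROBES = {
    "INFORMATION_SCHEMA.TABLES": "SELECT TOP 1 TABLE_NAME FROM INFORMATION_SCHEMA.TABLES",
    "INFORMATION_SCHEMA.COLUMNS": "SELECT TOP 1 COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS",
    "sys.tables": "SELECT TOP 1 name FROM sys.tables",
    "sys.columns": "SELECT TOP 1 name FROM sys.columns",
    "sys.types": "SELECT TOP 1 name FROM sys.types",
    "sp_tables": "EXEC sp_tables",
}


def probe_metadata_access(cursor) -> Dict[str, Union[int, str]]:
    """Run each probe and return, per object, a row count or an error string.

    A count of 0 is worth investigating too: it may mean the user can run the
    query but cannot see any objects.
    """
    results: Dict[str, Union[int, str]] = {}
    for name, sql in METADATA_PROBES.items():
        try:
            cursor.execute(sql)
            results[name] = len(cursor.fetchall())
        except Exception as exc:  # driver-specific error types vary
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

If every probe returns at least one row for the crawler's user, missing metadata permissions become a less likely explanation, although that does not rule them out.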

Since the crawler runs for about 5 minutes before failing, it's possible that it encounters a specific table or schema element that causes the exception. Try crawling smaller subsets of your database to isolate the problematic area.
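
A rough sketch of that isolation approach, assuming a boto3 Glue client is passed in as `glue`: it creates one crawler per schema and reports the outcome of each run. All names and arguments here (crawler name, role ARN, catalog database, connection name) are placeholders, and the `database/schema/%` include path is the SQL Server form as I understand it from the Glue documentation. The polling loop is simplistic and assumes the crawler has left the READY state by the time of the first check.

```python
import time
from typing import Dict


def jdbc_include_path(database: str, schema: str) -> str:
    """Include path covering every table in one SQL Server schema."""
    return f"{database}/{schema}/%"


def crawl_schema(glue, *, name: str, role_arn: str, catalog_db: str,
                 connection: str, database: str, schema: str,
                 poll_seconds: float = 30) -> Dict[str, str]:
    """Create and run a single-schema crawler, then return its last-crawl status."""
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=catalog_db,
        Targets={"JdbcTargets": [{
            "ConnectionName": connection,
            "Path": jdbc_include_path(database, schema),
        }]},
    )
    glue.start_crawler(Name=name)
    while True:
        # Sleep first so the crawler has time to move out of READY.
        time.sleep(poll_seconds)
        crawler = glue.get_crawler(Name=name)["Crawler"]
        if crawler["State"] == "READY":  # run finished (or never started)
            break
    last = crawler.get("LastCrawl", {})
    return {
        "schema": schema,
        "status": last.get("Status", "UNKNOWN"),
        "error": last.get("ErrorMessage", ""),
    }
```

Calling this once per schema with `glue = boto3.client("glue")` and a distinct crawler name each time should show which schema, if any, reproduces the failure; the same idea can be narrowed to single tables by replacing `%` with a table name.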
Sources
AWS Glue Connection to RDS SQL Server Fails – "Failed status due to failed connection" | AWS re:Post
Troubleshoot AWS Glue connection to a JDBC data source | AWS re:Post
Resolve connection error in AWS Glue | AWS re:Post

answered a year ago
  • "other system catalog views": which views? Are these views, and all other required schemas, tables, views, and stored procedures, fully documented anywhere? "Check if there are any special characters or unsupported data types in your SQL Server tables": which data types are unsupported, and are these documented anywhere?
