Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler.

Hi there, I am trying to run an AWS Glue crawler against a JDBC data source: an RDS PostgreSQL instance in the same account as the crawler.

My crawler won't start - it fails with the following error reported in CloudWatch:

Crawler cannot be started. Verify the permissions in the policies attached to the IAM role defined in the crawler.

I have given the crawler the IAM role of AWSGlueServiceRoleDefault, which has the following permission policies:

AmazonRDSFullAccess
AmazonS3FullAccess
AWSGlueServiceRole
AdministratorAccess
AWSGlueConsoleFullAccess

The role's trusted entities are as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

I fully expected the crawler to work with just the AmazonS3FullAccess and AWSGlueServiceRole policies. Adding AdministratorAccess is obviously a desperate measure, but even that does not work, which makes me think the error the crawler reports in CloudWatch is not the actual problem.

I've already set up the Glue connection. The RDS instance it points to is publicly accessible and is known to work via other access methods.

Has anyone seen this problem before, or does anyone know what else I might check?

2 Answers

The fix was to configure the VPC, subnet, and security groups on the Glue connection, which I had not done.

I was misled by two things:

  1. The 'Create connection' wizard which describes Network options as optional ("If your AWS Glue job needs to run on Amazon Elastic Compute Cloud (EC2) instances in a virtual private cloud (VPC) subnet").
  2. The error message above which suggests the problem is with IAM role permissions.

The page that unlocked things for me was this one:

Testing an AWS Glue connection
https://docs.aws.amazon.com/glue/latest/dg/console-test-connections.html

To others who encounter this problem, I would suggest testing your AWS Glue connection and making sure it works before worrying about what the crawler is telling you.
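For anyone who prefers to check this from a script rather than the console, here is a minimal sketch, assuming boto3 is installed and credentials are configured. It reads the `PhysicalConnectionRequirements` block of a Glue connection and reports which network settings are empty. The function names and the connection name in the usage note are hypothetical, and this check does not replace the console's connection test.

```python
from typing import Any


def missing_network_settings(connection: dict[str, Any]) -> list[str]:
    """Return the names of network settings absent from a Glue connection.

    `connection` is the "Connection" dict from a Glue GetConnection response.
    The subnet implies the VPC, so only the subnet and security groups are checked.
    """
    reqs = connection.get("PhysicalConnectionRequirements") or {}
    missing = []
    if not reqs.get("SubnetId"):
        missing.append("SubnetId")
    if not reqs.get("SecurityGroupIdList"):
        missing.append("SecurityGroupIdList")
    return missing


def check_connection(name: str) -> list[str]:
    """Fetch a connection by name and report its missing network settings."""
    import boto3  # imported lazily so the pure check above works without it

    glue = boto3.client("glue")
    response = glue.get_connection(Name=name)
    return missing_network_settings(response["Connection"])
```

Calling `check_connection("my-postgres-connection")` (a made-up name) returns an empty list when both settings are present, and otherwise the names of the settings that still need to be filled in.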

answered 3 years ago

I just ran into this issue myself, and the error message posted here, which shows up on the crawler status page, was incredibly unhelpful. In the end it did turn out to be a networking issue. But the error persisted even after I had tested the connection, made sure the role had the iam:PassRole permission and everything else required in the policy, set the proper VPC and subnet config on the connection, and added self-referencing security group rules. The crawler would fail after a couple of seconds without much showing up in the CloudWatch log or the console (or CloudTrail for that matter, but maybe I was looking in the wrong spot?).
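The self-referencing security group rule mentioned above can be verified with a short script. This is a rough sketch, assuming boto3 and the group IDs attached to the Glue connection; the function names are hypothetical. It only checks that some inbound rule names the group itself as its source, not which ports or protocols that rule covers.

```python
from typing import Any


def is_self_referencing(group: dict[str, Any]) -> bool:
    """True if any inbound rule of the group lists the group itself as a source.

    `group` is one entry of "SecurityGroups" from EC2 DescribeSecurityGroups.
    """
    group_id = group.get("GroupId")
    if not group_id:
        return False
    for rule in group.get("IpPermissions", []):
        for pair in rule.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == group_id:
                return True
    return False


def groups_without_self_reference(group_ids: list[str]) -> list[str]:
    """Return the IDs of the given groups that lack a self-referencing inbound rule."""
    import boto3  # imported lazily so the pure check above works without it

    ec2 = boto3.client("ec2")
    response = ec2.describe_security_groups(GroupIds=group_ids)
    return [
        g["GroupId"]
        for g in response["SecurityGroups"]
        if not is_self_referencing(g)
    ]
```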

What finally led me in the right direction was using the Visual ETL designer and trying the Data preview after dropping my Source node on the canvas. Once I tried to start a preview session, it finally gave me a useful message saying that the S3 VPC endpoint could not be reached/found. I had not created one because my connection did not directly rely on S3 at all (it was a PostgreSQL connection to an RDS instance within the same account and VPC, so basically just internal JDBC, same as the OP). The visual designer gave some extra clues that Glue startup references a lot of publicly available S3 artifacts, so there is always a dependence on S3. After I created the S3 Gateway VPC endpoint, everything started to work just fine.
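To check for the S3 gateway endpoint before running the crawler, something like the following sketch may help. It assumes boto3, a known VPC ID, and the standard `com.amazonaws.<region>.s3` service name; the function names are hypothetical. It does not check that the endpoint is associated with the route table of the connection's subnet, which would also need to be in place.

```python
from typing import Any


def has_s3_gateway_endpoint(endpoints: list[dict[str, Any]], region: str) -> bool:
    """True if the list contains an available S3 gateway endpoint for the region.

    `endpoints` is the "VpcEndpoints" list from EC2 DescribeVpcEndpoints.
    Type and state are compared case-insensitively.
    """
    service = f"com.amazonaws.{region}.s3"
    for ep in endpoints:
        if (
            ep.get("ServiceName") == service
            and str(ep.get("VpcEndpointType", "")).lower() == "gateway"
            and str(ep.get("State", "")).lower() == "available"
        ):
            return True
    return False


def vpc_has_s3_gateway_endpoint(vpc_id: str, region: str) -> bool:
    """Query EC2 for the VPC's S3 endpoints and apply the check above."""
    import boto3  # imported lazily so the pure check above works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [f"com.amazonaws.{region}.s3"]},
        ]
    )
    return has_s3_gateway_endpoint(response["VpcEndpoints"], region)
```

If `vpc_has_s3_gateway_endpoint` returns False for the VPC used by the Glue connection, the missing endpoint described above is a likely candidate for the failure.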

answered a year ago
