Can multiple Glue crawlers run concurrently?


A customer reports that a Glue crawler is single-threaded and that only one can run at a time. Is that correct?

What's the best practice for setting up crawlers? The customer has tens of datasets that need to be crawled frequently. How can they run multiple crawlers concurrently so that schema detection or data change detection finishes quickly?

Thanks.

MODERATOR
asked 5 years ago, 1671 views
1 Answer
Accepted Answer

Each individual crawler is single-threaded: you can't run the same crawler more than once at a time. However, you can have up to 50 crawlers per account as the default limit, and I've had multiple distinct crawlers run at the same time.

https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_glue
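To illustrate running several distinct crawlers at once, here is a minimal Python sketch, assuming boto3 and a Glue client created with `boto3.client("glue")`. The crawler names in the usage comment are hypothetical. `start_crawler` returns right away while the crawl continues in the background, so starting the crawlers in a simple loop is enough to have them run side by side. The polling loop assumes a crawler reports the state `READY` once its run has finished.

```python
import time


def start_crawlers(glue, names):
    """Start each named crawler and return the names that were started.

    `glue` is assumed to be a boto3 Glue client. A crawler that is
    already running is skipped, because the same crawler cannot run
    more than once at a time.
    """
    started = []
    for name in names:
        try:
            glue.start_crawler(Name=name)
            started.append(name)
        except glue.exceptions.CrawlerRunningException:
            # This crawler is already mid-run; leave it alone.
            pass
    return started


def wait_for_crawlers(glue, names, poll_seconds=30):
    """Block until every named crawler reports the READY state."""
    pending = set(names)
    while pending:
        for name in list(pending):
            state = glue.get_crawler(Name=name)["Crawler"]["State"]
            if state == "READY":
                pending.discard(name)
        if pending:
            time.sleep(poll_seconds)


# Usage (needs boto3 and AWS credentials; crawler names are hypothetical):
#   import boto3
#   glue = boto3.client("glue")
#   started = start_crawlers(glue, ["sales-crawler", "events-crawler"])
#   wait_for_crawlers(glue, started)
```

One crawler per dataset (or per group of related datasets) keeps each run independent, so a slow dataset doesn't hold up the others.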

Crawler setup details are documented here: https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html. How you choose to configure your crawlers will depend on what the customer is trying to achieve.

Another thing to note: a single crawler can crawl multiple input data stores. If you have many separate data stores that all need crawling at the same time or frequency, you can have Glue either combine them into a single schema (in some circumstances) or create multiple schemas (if they're unique).
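As a rough sketch of the single-crawler, multiple-data-store setup, the Python helper below builds the arguments for boto3's `create_crawler` with several S3 paths as targets. The helper name `build_crawler_request` and the crawler name, role ARN, database, and bucket paths in the usage comment are hypothetical. The optional `CombineCompatibleSchemas` grouping policy is my assumption of how to ask Glue to merge compatible schemas into one table; check the crawler configuration page linked above before relying on it.

```python
import json


def build_crawler_request(name, role_arn, database, s3_paths,
                          combine_schemas=False):
    """Build keyword arguments for glue.create_crawler(**request).

    Every path in `s3_paths` becomes one S3 target of the same crawler,
    so all of them are crawled in a single run.
    """
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }
    if combine_schemas:
        # Assumption: this grouping policy asks Glue to combine
        # compatible schemas into one table where it can.
        request["Configuration"] = json.dumps({
            "Version": 1.0,
            "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        })
    return request


# Usage (needs boto3 and AWS credentials; all values are hypothetical):
#   import boto3
#   glue = boto3.client("glue")
#   request = build_crawler_request(
#       "daily-datasets",
#       "arn:aws:iam::123456789012:role/GlueCrawlerRole",
#       "analytics",
#       ["s3://example-bucket/sales/", "s3://example-bucket/events/"],
#   )
#   glue.create_crawler(**request)
```

Grouping data stores that share a schedule into one crawler also helps you stay under the per-account crawler limit.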

AWS
answered 5 years ago
