Can multiple glue crawlers run concurrently?


A customer reports that a crawler is single-threaded and that only one can run at a time. Is that correct?

What's the best practice for setting up crawlers? The customer has dozens of datasets that need to be crawled frequently. How can they run multiple crawlers concurrently so that schema detection or data-change detection finishes quickly?

Thanks.

Moderator
asked 5 years ago, 1698 views
1 answer
Accepted answer

Each individual crawler is single-threaded: you can't run the same crawler more than once at a time. However, the default limit is 50 crawlers per account, and I've had multiple distinct crawlers run at the same time.

https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_glue
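As a rough sketch (not an official AWS pattern), the Python snippet below starts several distinct crawlers and then waits for them to finish, assuming a boto3 Glue client. `start_crawler` returns as soon as the run is requested, so a plain loop is enough for the crawlers to run concurrently on the Glue side; starting a crawler that is already running raises `CrawlerRunningException`. The helper names `start_crawlers` and `wait_for_crawlers` are hypothetical, and they take the client as a parameter so they can be exercised without AWS credentials.

```python
import time


def start_crawlers(glue, names):
    """Request a run for each distinct crawler.

    start_crawler returns immediately, so the crawlers end up running
    concurrently in Glue even though this loop is sequential.
    """
    status = {}
    for name in names:
        try:
            glue.start_crawler(Name=name)
            status[name] = "started"
        except glue.exceptions.CrawlerRunningException:
            # The same crawler cannot run twice at once, so skip it.
            status[name] = "already running"
    return status


def wait_for_crawlers(glue, names, poll_seconds=30):
    """Poll get_crawler until every crawler is back in the READY state."""
    pending = set(names)
    while pending:
        for name in list(pending):
            state = glue.get_crawler(Name=name)["Crawler"]["State"]
            if state == "READY":
                pending.discard(name)
        if pending:
            time.sleep(poll_seconds)
```

Usage, with hypothetical crawler names: `glue = boto3.client("glue")`, then `start_crawlers(glue, ["crawler-orders", "crawler-customers"])` followed by `wait_for_crawlers(glue, [...])`.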

Crawler configuration options are documented at https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html. How you configure the crawler will depend on what the customer is trying to achieve.
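For datasets that need frequent re-crawling, one option is a scheduled crawler with an incremental recrawl policy. The sketch below only builds the keyword arguments for boto3's `create_crawler`; the helper name `frequent_crawler_config`, the role ARN, database, S3 path, and the hourly schedule are all placeholder assumptions. With `CRAWL_NEW_FOLDERS_ONLY`, runs after the first one crawl only newly added S3 folders, which shortens each run but will not revisit existing folders; in this mode Glue expects the schema change policy to be `LOG` for both updates and deletes.

```python
def frequent_crawler_config(name, role_arn, database, s3_path,
                            cron="cron(0 * * * ? *)"):
    """Build create_crawler kwargs for an hourly, incremental S3 crawler.

    All argument values are placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue cron has six fields; this runs at minute 0 of every hour.
        "Schedule": cron,
        # Incremental crawl: after the first run, only new folders are crawled.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        # Incremental crawls use LOG for both update and delete behavior.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG", "DeleteBehavior": "LOG"},
    }
```

Usage: `boto3.client("glue").create_crawler(**frequent_crawler_config(...))`. Giving each dataset its own crawler built this way lets them run in parallel, within the per-account crawler limit.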

Also note that a single crawler can crawl multiple data stores. If you have many separate data stores that all need crawling at the same time and frequency, Glue can either combine them into a single schema (in some circumstances) or create multiple schemas (if they're unique).
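To illustrate one crawler covering several data stores, the sketch below builds `create_crawler` arguments with one `S3Targets` entry per path; the helper name `multi_target_crawler_config` and all paths and names are hypothetical. The optional `Configuration` JSON sets `TableGroupingPolicy` to `CombineCompatibleSchemas`, which asks the crawler to merge compatible schemas into a single table. As far as I know, this grouping is applied within each include path rather than across separate targets, so distinct stores may still yield separate tables.

```python
import json


def multi_target_crawler_config(name, role_arn, database, s3_paths,
                                combine=True):
    """Build create_crawler kwargs for one crawler with several S3 targets."""
    config = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # One crawler, many data stores: one target entry per include path.
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
    }
    if combine:
        # Optional: merge compatible schemas into one table per include path.
        config["Configuration"] = json.dumps({
            "Version": 1.0,
            "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
        })
    return config
```

Usage: `boto3.client("glue").create_crawler(**multi_target_crawler_config("shared", role_arn, "analytics", ["s3://bucket/a/", "s3://bucket/b/"]))`.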

AWS
answered 5 years ago
