Can multiple Glue crawlers run concurrently?


A customer reports that a Glue crawler is single-threaded and that only one can run at a time. Is that correct?

What's the best practice for setting up crawlers? The customer has tens of datasets that need to be crawled frequently. How can they run multiple crawlers concurrently so that schema detection and data-change detection finish quickly?

Thanks.

Moderator
asked 5 years ago, 1698 views
1 Answer
Accepted Answer

Each individual crawler is single-threaded: you can't run the same crawler more than once at a time. The default limit is 50 crawlers per account, and I've had multiple distinct crawlers run at the same time.
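To illustrate running several distinct crawlers at once, here is a minimal sketch using the boto3 Glue client calls `start_crawler` and `get_crawler`. The helper names (`start_crawlers`, `wait_until_ready`, `run_all`), the polling interval, and the crawler names are assumptions for illustration, not something from the Glue documentation. The helpers take the client as an argument so they can be tried against a stub.

```python
import time


def start_crawlers(glue, names):
    """Start each named crawler and return the names that were started."""
    started = []
    for name in names:
        try:
            glue.start_crawler(Name=name)
            started.append(name)
        except glue.exceptions.CrawlerRunningException:
            # This crawler is already mid-run; the same crawler
            # cannot be started a second time until it finishes.
            pass
    return started


def wait_until_ready(glue, names, poll_seconds=30):
    """Block until every named crawler reports the READY state."""
    pending = set(names)
    while pending:
        for name in list(pending):
            state = glue.get_crawler(Name=name)["Crawler"]["State"]
            if state == "READY":
                pending.discard(name)
        if pending:
            time.sleep(poll_seconds)


def run_all(crawler_names):
    """Hypothetical entry point: assumes boto3 and AWS credentials are set up."""
    import boto3

    glue = boto3.client("glue")
    start_crawlers(glue, crawler_names)
    wait_until_ready(glue, crawler_names)
```

For example, `run_all(["sales-crawler", "events-crawler"])` (hypothetical crawler names) would start both crawlers and return once both are idle again.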

https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_glue

Crawler setup details are documented here: https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html. How you configure the crawler depends on what the customer is trying to achieve.

Also note that a single crawler can crawl multiple input data stores. If you have many separate data stores that all need crawling at the same time and frequency, Glue can either combine them into a single schema (in some circumstances) or create multiple schemas (if they're unique).
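As a rough sketch of the single-crawler, multiple-data-store setup, the helper below builds the keyword arguments for the boto3 Glue `create_crawler` call with several S3 targets. The function name `build_crawler_request`, the role ARN, the database name, the bucket paths, and the schedule are all hypothetical. The optional `CombineCompatibleSchemas` grouping policy is one of the crawler configuration options, and whether it suits a given dataset is an assumption to verify against the configuration page linked above.

```python
import json


def build_crawler_request(name, role_arn, database, s3_paths,
                          schedule=None, combine_schemas=False):
    """Build kwargs for glue.create_crawler with one S3 target per path."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }
    if schedule:
        # Glue schedules use cron syntax, e.g. "cron(0 * * * ? *)" for hourly.
        request["Schedule"] = schedule
    if combine_schemas:
        # Ask the crawler to merge compatible schemas into a single table.
        request["Configuration"] = json.dumps(
            {"Version": 1.0,
             "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"}}
        )
    return request


# Hypothetical values for illustration only.
request = build_crawler_request(
    name="all-datasets-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_db",
    s3_paths=["s3://example-bucket/sales/", "s3://example-bucket/events/"],
    schedule="cron(0 * * * ? *)",
)
```

With boto3 installed and credentials configured, the request could then be passed as `boto3.client("glue").create_crawler(**request)`.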

AWS
answered 5 years ago
