Testing Crawlers with Glue 3.0 Docker image

I have an ETL pipeline that reads Parquet files from S3, transforms them, and inserts the results into DynamoDB. A crawler reads the Parquet files and populates a Glue Data Catalog table; the PySpark job then queries the catalog table and populates DynamoDB. I would like to test this locally using the aws-glue-libs Docker image. Unfortunately, I cannot seem to query a local Data Catalog or interact with crawlers from the Docker container. The Docker image has virtually no documentation apart from a two-year-old blog post. Can I develop and test a full Glue workflow in this container, or is its purpose just to run a PySpark job? Thanks!

TLDR: Is it possible to use the boto3 Glue API against a local Docker endpoint to perform crawler/catalog operations, or is the Docker image just a glorified PySpark install?

Asked 2 years ago · 615 views

1 Answer

Hi,

As you said, the Glue Docker image includes the PySpark libraries along with the Glue ones (DynamicFrames, GlueContext), but nothing more. There is currently no support for crawlers or the Data Catalog.

If you are working in a Docker environment, you can try LocalStack.
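As a rough illustration of what "boto3 against a local endpoint" could look like, here is a minimal sketch. It assumes a local emulator is listening on `http://localhost:4566` (LocalStack's usual default) and that it emulates the Glue API, which may depend on the LocalStack edition you run; this thread does not confirm either. The database, table, and bucket names are hypothetical. Instead of running a crawler, the sketch registers the Parquet table by hand with `create_table`, which is roughly what a crawler would produce.

```python
from typing import Any, Dict

# Assumption: LocalStack's default edge endpoint; adjust to your setup.
LOCAL_ENDPOINT = "http://localhost:4566"


def local_client_kwargs(endpoint_url: str = LOCAL_ENDPOINT) -> Dict[str, Any]:
    """Keyword arguments that point a boto3 client at a local endpoint."""
    return {
        "endpoint_url": endpoint_url,
        "region_name": "us-east-1",
        # Dummy credentials: assumed to be accepted by the local emulator.
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
    }


def parquet_table_input(
    name: str, s3_location: str, columns: Dict[str, str]
) -> Dict[str, Any]:
    """Build a Glue TableInput describing an external Parquet table."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns.items()],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


def register_table(
    database: str, table_input: Dict[str, Any], endpoint_url: str = LOCAL_ENDPOINT
) -> None:
    """Create the database and table on the local Glue endpoint."""
    # Imported lazily so the pure helpers above work without boto3 installed.
    import boto3

    glue = boto3.client("glue", **local_client_kwargs(endpoint_url))
    # Note: create_database raises AlreadyExistsException on a second run.
    glue.create_database(DatabaseInput={"Name": database})
    glue.create_table(DatabaseName=database, TableInput=table_input)


# Example call (needs a running local endpoint; all names are hypothetical):
# register_table(
#     "local_db",
#     parquet_table_input("events", "s3://example-bucket/events/", {"id": "string"}),
# )
```

Even if this works against the emulator, the PySpark job inside the Glue container would still need to be configured to resolve catalog tables from that endpoint, which is not covered here.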

Best regards.

Answered 2 years ago by AWS
Reviewed 2 years ago by AWS Expert
