Testing Crawlers with the Glue 3.0 Docker image


I have an ETL pipeline that reads Parquet files from S3, transforms them, and inserts the results into DynamoDB. A crawler reads the Parquet files and populates a Glue Data Catalog table; the PySpark job then queries that catalog table and populates DynamoDB. I would like to test this locally using the aws-glue-libs Docker image. Unfortunately, I cannot seem to query a local Data Catalog or interact with crawlers from the Docker container. The image has virtually no documentation apart from a two-year-old blog post. Can I develop and test a full Glue workflow in this container, or is its purpose just to run a PySpark job? Thanks!

TLDR: Is it possible to use the boto3 Glue API against a local Docker endpoint to perform crawler/catalog operations, or is the Docker image just a glorified PySpark install?

Asked 2 years ago · 615 views
1 Answer

Hi,

As you said, the Glue Docker image includes the PySpark libraries along with the Glue ones (DynamicFrames, GlueContext), but nothing more. There is currently no support for crawlers or the Data Catalog.

If you are working in a Docker environment, you can try LocalStack.
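As a rough sketch of what that could look like, the snippet below points the boto3 Glue client at a LocalStack container instead of AWS and then creates a database, defines a crawler, and starts it. It assumes LocalStack is listening on its default edge endpoint `http://localhost:4566` and that Glue emulation is available in the LocalStack edition you run. The helper names, the dummy credentials, the role ARN, and the bucket path in the usage note are illustrative placeholders, not values from this thread.

```python
LOCALSTACK_ENDPOINT = "http://localhost:4566"  # assumed default LocalStack endpoint


def make_glue_client(endpoint_url=LOCALSTACK_ENDPOINT):
    """Build a boto3 Glue client that talks to LocalStack rather than AWS."""
    import boto3  # imported lazily so the helpers below work with any client object

    return boto3.client(
        "glue",
        endpoint_url=endpoint_url,
        region_name="us-east-1",
        aws_access_key_id="test",        # dummy credentials for local emulation
        aws_secret_access_key="test",
    )


def crawler_request(name, database, s3_path,
                    role="arn:aws:iam::000000000000:role/glue-test"):
    """Return the keyword arguments for glue.create_crawler (role is a placeholder)."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def run_crawler(glue, name, database, s3_path):
    """Create the catalog database and crawler, then start the crawler."""
    glue.create_database(DatabaseInput={"Name": database})
    glue.create_crawler(**crawler_request(name, database, s3_path))
    glue.start_crawler(Name=name)  # asynchronous: returns before the crawl finishes


def list_tables(glue, database):
    """Return the names of the tables currently registered in the database."""
    return [t["Name"] for t in glue.get_tables(DatabaseName=database)["TableList"]]
```

A possible usage would be `glue = make_glue_client()`, then `run_crawler(glue, "parquet-crawler", "local_db", "s3://my-test-bucket/data/")`, followed by `list_tables(glue, "local_db")` once the crawler has finished. Because `start_crawler` is asynchronous, a real test would poll `glue.get_crawler(Name=...)` until the crawler is idle before reading the tables.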

Best.

Answered 2 years ago by AWS
Reviewed by an AWS Expert 2 years ago
