Testing Crawlers with Glue 3.0 Docker image


So I have an ETL pipeline that reads Parquet files from S3, transforms them, and inserts the results into DynamoDB. A crawler reads the Parquet files and populates a Glue Data Catalog table; the PySpark job then queries that catalog table and writes to DynamoDB. I would like to test this locally using the aws-glue-libs Docker image. Unfortunately, I cannot seem to query a local Data Catalog or interact with crawlers from the Docker container. The image has virtually no documentation apart from a two-year-old blog post. Can I develop and test a full Glue workflow with this container, or is its purpose only to run PySpark jobs? Thanks!

TLDR: Is it possible to use the boto3 Glue API against a local Docker endpoint to perform crawler/catalog operations, or is the Docker image just a glorified PySpark install?

Asked 2 years ago · 615 views
1 answer

Hi,

As you said, the Glue Docker image includes the PySpark libraries alongside the Glue ones (DynamicFrames, GlueContext), but nothing more. Crawlers and the Data Catalog are not supported right now.
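Since the container has no catalog, one way to keep the job testable is to let it read the Parquet files directly when running locally and use the catalog table only in AWS. A minimal sketch of that idea, assuming the container can still reach the files on S3 (for example with AWS credentials mounted into it); the database and table names are hypothetical placeholders, and `glue_context` is the job's `GlueContext`:

```python
from typing import Any, Optional

# Hypothetical catalog names; replace with the database/table your crawler populates.
CATALOG_DATABASE = "my_database"
CATALOG_TABLE = "my_parquet_table"


def load_source(glue_context: Any, s3_path: Optional[str] = None):
    """Return a DynamicFrame for the job's input.

    If s3_path is given (local runs, where the Docker image has no Data
    Catalog), read the Parquet files directly from that path. Otherwise read
    through the catalog table that the crawler populated.
    """
    if s3_path is not None:
        return glue_context.create_dynamic_frame.from_options(
            connection_type="s3",
            connection_options={"paths": [s3_path]},
            format="parquet",
        )
    return glue_context.create_dynamic_frame.from_catalog(
        database=CATALOG_DATABASE,
        table_name=CATALOG_TABLE,
    )
```

Passing the context in as a parameter also means the branching logic can be unit-tested with a stub object, without the Glue libraries installed.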

If you are working in a Docker environment, you can try LocalStack.
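A minimal sketch of the LocalStack route, assuming a LocalStack container listening on its default edge port 4566 and a LocalStack edition that emulates Glue (Glue emulation may require a paid tier, so check LocalStack's documentation). The `GLUE_LOCAL` environment variable and the helper names are hypothetical. The idea is that the same boto3 code targets either AWS or LocalStack depending only on `endpoint_url`:

```python
import os
from typing import Any, Dict, Mapping, Optional

# Default LocalStack edge endpoint (assumption: stock container on port 4566).
LOCALSTACK_ENDPOINT = "http://localhost:4566"


def glue_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for boto3.client("glue", ...).

    If the hypothetical GLUE_LOCAL variable is "1", point the client at
    LocalStack with dummy credentials; otherwise use the normal AWS endpoint
    and the default credential chain.
    """
    env = os.environ if env is None else env
    region = env.get("AWS_REGION", "us-east-1")
    if env.get("GLUE_LOCAL") == "1":
        return {
            "region_name": region,
            "endpoint_url": env.get("LOCALSTACK_ENDPOINT", LOCALSTACK_ENDPOINT),
            # LocalStack does not validate credentials; "test"/"test" is a common convention.
            "aws_access_key_id": "test",
            "aws_secret_access_key": "test",
        }
    return {"region_name": region}


def make_glue_client(env: Optional[Mapping[str, str]] = None):
    # boto3 is imported lazily so the kwargs helper can be tested without it.
    import boto3

    return boto3.client("glue", **glue_client_kwargs(env))
```

With `GLUE_LOCAL=1`, calls such as `start_crawler(Name=...)` or `get_table(DatabaseName=..., Name=...)` on the returned client go to LocalStack instead of AWS; whether they behave like the real service depends on LocalStack's Glue emulation.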

Best.

AWS
answered 2 years ago
AWS EXPERT
reviewed 2 years ago
