Testing Crawlers with Glue 3.0 Docker image


So I have an ETL pipeline that reads Parquet files from S3, transforms them and inserts the results into DynamoDB. A crawler reads the Parquet files and populates a Glue Data Catalog table, and the PySpark job then queries that catalog table and populates DynamoDB. I would like to test this locally using the aws-glue-libs Docker image. Unfortunately, I cannot seem to query a local Data Catalog or interact with crawlers from the Docker container, and the image has virtually no documentation beyond a two-year-old blog post. Can I develop and test a full Glue workflow in this container, or is its purpose just to run PySpark jobs? Thanks!
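Whatever the container ends up supporting, the transform step can be tested without any catalog or crawler if it is kept as a pure function, separate from the catalog read and the DynamoDB write. A minimal sketch, assuming rows arrive as plain dicts (for example via `DynamicFrame.toDF()` followed by `Row.asDict(recursive=True)`) and are later written with boto3's DynamoDB resource API (such as `Table.batch_writer()`), which rejects Python `float` values and expects `Decimal` instead. The function names and the `id` key field are hypothetical.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, Iterator


def to_dynamo_value(value: Any) -> Any:
    """Convert a value into a type boto3's DynamoDB resource layer accepts (hypothetical helper)."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int; keep it unchanged
    if isinstance(value, float):
        # boto3 rejects float; going through str avoids binary-float noise in the Decimal
        return Decimal(str(value))
    if isinstance(value, dict):
        return {k: to_dynamo_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_dynamo_value(v) for v in value]
    return value


def rows_to_items(rows: Iterable[Dict[str, Any]], key_field: str) -> Iterator[Dict[str, Any]]:
    """Turn Parquet-derived rows into DynamoDB items (hypothetical helper).

    Rows without a key are skipped, and top-level None attributes are dropped
    rather than stored as NULL (a design choice, not a DynamoDB requirement).
    """
    for row in rows:
        if row.get(key_field) is None:
            continue
        yield {k: to_dynamo_value(v) for k, v in row.items() if v is not None}
```

Because nothing here imports `awsglue` or `boto3`, it can run under a plain test runner inside or outside the Glue image.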

TLDR: Is it possible to use the boto3 Glue API against a local Docker endpoint to perform crawler/catalog operations, or is the Docker image just a glorified PySpark install?
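On the client side, boto3 can point any service client at a non-AWS endpoint through the `endpoint_url` argument of `boto3.client`, so the job code itself does not need to change; whether anything listening at that endpoint actually implements the Glue API is a separate question. A minimal sketch, assuming a hypothetical `GLUE_LOCAL_ENDPOINT` environment variable that is set only for local runs, plus the kind of dummy credentials local emulators usually accept:

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical variable name: set it only for local runs against an emulator.
LOCAL_ENDPOINT_ENV = "GLUE_LOCAL_ENDPOINT"


def glue_client_kwargs(region_name: str = "us-east-1",
                       environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for boto3.client("glue", ...)."""
    env = os.environ if environ is None else environ
    kwargs = {"region_name": region_name}
    endpoint = env.get(LOCAL_ENDPOINT_ENV)
    if endpoint:
        # Send every Glue API call to the local endpoint instead of AWS.
        kwargs["endpoint_url"] = endpoint
        # Dummy credentials so a local run never signs requests with real keys.
        kwargs["aws_access_key_id"] = "test"
        kwargs["aws_secret_access_key"] = "test"
    return kwargs


def make_glue_client(region_name: str = "us-east-1") -> Any:
    """Create the client; boto3 is imported here so the helper above works without it."""
    import boto3
    return boto3.client("glue", **glue_client_kwargs(region_name))
```

With the variable unset, `make_glue_client()` is an ordinary Glue client that uses the standard credential chain.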

Asked 2 years ago, 615 views
1 Answer

Hi,

As you said, the Glue Docker image includes the PySpark libraries along with the Glue ones (DynamicFrames, GlueContext), but nothing more. Crawlers and the Data Catalog are not supported right now.
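Since `GlueContext` and DynamicFrames do work in the image, one workaround is to let the job read the Parquet files directly when running locally and go through the catalog when deployed. A sketch under that assumption: `from_catalog` and `from_options` are methods of `glueContext.create_dynamic_frame`, while `source_spec`, `read_source` and the `use_catalog` switch are hypothetical names for illustration. Reading `s3://` paths from inside the container also assumes AWS credentials are made available to it.

```python
from typing import Any, Dict, Tuple


def source_spec(use_catalog: bool, database: str, table_name: str,
                s3_path: str) -> Tuple[str, Dict[str, Any]]:
    """Return the create_dynamic_frame method name and its keyword arguments (hypothetical helper)."""
    if use_catalog:
        # Deployed job: read through the Data Catalog table the crawler maintains.
        return "from_catalog", {"database": database, "table_name": table_name}
    # Local run: read the same Parquet files directly, bypassing the catalog.
    return "from_options", {
        "connection_type": "s3",
        "connection_options": {"paths": [s3_path]},
        "format": "parquet",
    }


def read_source(glue_context: Any, use_catalog: bool, database: str,
                table_name: str, s3_path: str) -> Any:
    """Build a DynamicFrame from an awsglue.context.GlueContext (hypothetical helper)."""
    method, kwargs = source_spec(use_catalog, database, table_name, s3_path)
    return getattr(glue_context.create_dynamic_frame, method)(**kwargs)
```

In the deployed job, `use_catalog` could come from a job argument parsed with `awsglue.utils.getResolvedOptions`; the argument name is up to you.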

If you are working in a Docker environment, you can try LocalStack.
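With LocalStack, the usual boto3 Glue calls are sent to its endpoint instead of AWS (by default `http://localhost:4566`, e.g. `boto3.client("glue", endpoint_url="http://localhost:4566", region_name="us-east-1")`). Which Glue operations it emulates, and whether its crawlers infer schemas from Parquet the way AWS does, depends on the LocalStack edition and version, so treat this as an untested sketch. It starts a crawler with `start_crawler` and polls `get_crawler` until the crawler is back in the `READY` state; `run_crawler_and_wait` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Optional


def run_crawler_and_wait(client: Any, name: str, poll_seconds: float = 5.0,
                         timeout_seconds: float = 600.0,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Start a Glue crawler and wait until it returns to READY (hypothetical helper).

    Returns the LastCrawl status (e.g. "SUCCEEDED" or "FAILED") reported by
    get_crawler, or None if none is present; raises TimeoutError if the
    crawler is still busy after timeout_seconds.
    """
    client.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while True:
        crawler = client.get_crawler(Name=name)["Crawler"]
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawler {name!r} still {crawler['State']} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Once it returns `"SUCCEEDED"`, `client.get_tables(DatabaseName=...)` should list the tables the crawler created, which the job can then query through the catalog.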

Best regards.

Answered 2 years ago by AWS
Reviewed 2 years ago by an AWS EXPERT
