Testing Crawlers with Glue 3.0 Docker image

0

So I have an ETL pipeline that read S3 parquet files, transforms and inserts into DynamoDB. I have a crawler defined which reads the parquet and populates a Glue Data Catalog table. The PySpark job then queries the catalog table and populates Dynamo. I would like to test this locally using the aws-glue-libs Docker image. Unfortunately I cannot seem to query a local Data Catalog or interact with crawlers via the Docker container. The Docker image has virtually no documentation outside of a 2 year old blog. Can I develop and test a full Glue workflow via this container or is the purpose just to run a PySpark job? Thanks!

TLDR: Is it possible to use the boto3 Glue API against a local Docker endpoint to perform crawler/catalog operations, or is the Docker image just a glorified PySpark install?

feita há 2 anos615 visualizações
1 Resposta
1

Hi,

As you said the Glue Docker image includes Pyspark libraries along Glue ones (DynamicFrames, GlueContext) but no more. There is no support for crawlers or Catalog right now.

If you are in a Docker environment you can try Local Stack.

Bests.

profile pictureAWS
respondido há 2 anos
AWS
ESPECIALISTA
avaliado há 2 anos

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas