Running PySpark Jobs Locally with the AWS Glue ETL Library on Windows

I followed the steps in "Developing using the AWS Glue ETL library - Python" to install the library on Windows, as described here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html

After following the installation instructions, it is still unclear to me how to run a Spark job locally. I am working in PowerShell.

I have a simple PySpark job called my_script.py:

from pyspark.context import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext()
spark = SparkSession(sc)

I have:

  1. navigated to the aws-glue-libs directory
  2. in PowerShell, run .\bin\gluesparksubmit F:\programming\my_script.py
  3. observed that the command appears to produce no output

Can you please provide a correct example of how to run an AWS Glue job locally?

The ultimate goal is to develop my Glue jobs locally in PyCharm before deploying them to the AWS Glue service.

Adriano
asked 10 months ago, 447 views
1 Answer

Those scripts are written for Linux Bash, not PowerShell.
It might be possible to get them to work in a Cygwin shell, but it will probably be easier to use the Docker option.
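
As a rough illustration of the Docker option, here is a minimal Python sketch that assembles a docker run command for the script from the question. The image tag, the container path /home/glue_user/workspace, and the helper names build_docker_command and run_glue_script are assumptions made for this example, not something confirmed in this thread. Check the AWS Glue documentation for the image that matches your Glue version. A job that reads from AWS would likely need extra options, such as mounting credentials.

```python
import subprocess

# Assumed image tag; pick the one that matches your Glue version.
GLUE_IMAGE = "amazon/aws-glue-libs:glue_libs_4.0.0_image_01"
# Assumed directory inside the container where the scripts are mounted.
CONTAINER_WORKSPACE = "/home/glue_user/workspace"


def build_docker_command(workspace_dir, script_name, image=GLUE_IMAGE):
    """Return the argument list that runs script_name with spark-submit in the container."""
    return [
        "docker", "run", "--rm",
        # Mount the local folder holding the script into the container.
        "-v", f"{workspace_dir}:{CONTAINER_WORKSPACE}",
        image,
        "spark-submit", f"{CONTAINER_WORKSPACE}/{script_name}",
    ]


def run_glue_script(workspace_dir, script_name):
    """Run the script in the container and return the exit code (requires Docker)."""
    return subprocess.run(build_docker_command(workspace_dir, script_name)).returncode


# Only builds and prints the command; call run_glue_script to actually execute it.
cmd = build_docker_command("F:/programming", "my_script.py")
print(" ".join(cmd))
```

Note that my_script.py as posted only creates a SparkContext and a SparkSession and never prints anything, so even a successful run would show little beyond Spark's own log messages.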

AWS
EXPERT
answered 10 months ago
