Can I use Glue interactive sessions with Python shell?

The docs hint at choosing a %job_type, or are interactive sessions only available for glueetl? I am looking for a simple way to develop without having to use a SageMaker notebook, because that is overkill for my ETL most of the time. Can I just get the 0.0625 DPU by default with a notebook, without the extras I get with SageMaker?

1 Answer
Hi, Glue Interactive Sessions lets you install the Glue PySpark and Glue Scala kernels in any Jupyter notebook, and Glue Studio Notebook is a new Glue Studio experience powered by Interactive Sessions. As such, both are aimed at interactive Glue ETL (Spark) development.

Your question is about developing Glue Python Shell jobs, which are the right choice for small data sets. You can develop these in the Glue Studio Python Shell editor (you pay for no DPUs until you run the job) or in any Python IDE, using a compatible Python version.

With the Glue Studio Python Shell editor you can develop the job and submit it to run. To see the results, open the logs directly from the Runs tab.

Alternatively, you could use the legacy script editor: when you run the Python Shell job, the bottom part of the window shows the logs while the job is running.
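For anyone who prefers to script this rather than click through the editor, here is a minimal sketch of creating and starting a Python Shell job at 0.0625 DPU with boto3. The job name, role ARN, script location, and Python version are placeholders and assumptions, not values from this thread; adjust them to your account.

```python
from typing import Any, Dict


def python_shell_job_definition(name: str, role_arn: str, script_s3_uri: str) -> Dict[str, Any]:
    """Build the keyword arguments for the Glue create_job call.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",        # Python Shell job, not glueetl (Spark)
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3.9",       # assumption: pick the version your script targets
        },
        "MaxCapacity": 0.0625,            # the smallest Python Shell capacity
    }


def create_and_start(definition: Dict[str, Any]) -> str:
    """Create the job and start one run; returns the run id."""
    import boto3  # imported lazily so the definition can be built without AWS access

    glue = boto3.client("glue")
    glue.create_job(**definition)
    run = glue.start_job_run(JobName=definition["Name"])
    return run["JobRunId"]
```

You are only billed while a run is executing, so building the definition and editing the script costs nothing until `create_and_start` (or the Run button) is used.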

Hope this helps.

EXPERT
answered 8 months ago
  • I understand that first part. My question is about a pretty small ETL, for example 500 MB a day. I connect to the source via the Data Catalog and to the destination, an RDS database, via a Glue connection, and run simple reporting on the data. Wouldn't Spark be overkill, so that I could just use a Python shell job with aws-data-wrangler, calculate the reports with pandas, and load the data again with aws-data-wrangler? Or is Glue the wrong tool for this small job? Or should I just continue to throw small data at Spark and let Spark handle it, so that it is able to scale?

    And if I use a Python shell job for this, what is the best way to develop it? I need to be able to test pushing the data to the destination. Doing things locally in my IDE is fine, but I would still have to test loading to the destination, which would be nice to run the way interactive sessions do, I think.

    I am pretty new, and the tools in AWS are developing along with me. I am just trying to learn what the most common and recommended way to develop is.

  • Please see my revised answer and let me know if it is clearer. Glue Python Shell is the right choice in your case. You can use the script editor (legacy or Glue Studio) for development.
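As a rough illustration of the small ETL discussed in the comments above, here is a sketch of a Python Shell script that keeps the pandas report logic separate from the AWS calls, so the report can be tested locally in an IDE and only the load step needs a real run. It assumes the source is a Parquet table registered in the Data Catalog and the RDS destination is PostgreSQL (awswrangler has an equivalent `wr.mysql` module); the column names `event_date` and `amount` and all function names are hypothetical.

```python
import pandas as pd


def build_report(df: pd.DataFrame) -> pd.DataFrame:
    """Pure pandas step: one row per day with a row count and a total.

    The columns event_date and amount are hypothetical examples.
    """
    return df.groupby("event_date", as_index=False).agg(
        row_count=("amount", "size"),
        total_amount=("amount", "sum"),
    )


def read_source(database: str, table: str) -> pd.DataFrame:
    """Read a Data Catalog table (assumed to be Parquet on S3) into pandas."""
    import awswrangler as wr  # lazy import: only needed when talking to AWS

    return wr.s3.read_parquet_table(database=database, table=table)


def write_destination(report: pd.DataFrame, connection_name: str, schema: str, table: str) -> None:
    """Append the report to RDS through a Glue connection (PostgreSQL assumed)."""
    import awswrangler as wr

    con = wr.postgresql.connect(connection=connection_name)
    try:
        wr.postgresql.to_sql(df=report, con=con, schema=schema, table=table, mode="append")
    finally:
        con.close()
```

With this split, `build_report` can be exercised against a small hand-made DataFrame on a laptop, while `write_destination` is best tested by pointing the job at a scratch table in the target database.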
