call a stored procedure from within glue job

0

Hi team,

I want to call my stored procedure at the end of my glue job (inside the glue job itself),

can I connect to MySQL RDS DB and call this stored procedure from my pyspark glue job without using py4j?

I don't find examples on the net (this one speak about using py4j: https://stackoverflow.com/questions/64775753/how-to-run-arbitrary-ddl-sql-statements-or-stored-procedures-using-aws-glue)

any code example would be helpful. Thanks a lot

  • if your glue ETL jobs needs multiple workers and the stored procedure it is not immediate, you might be wasting resources. It might be better to run the Stored procedure from a Glue Python Shell job that gets executed as soon as your Glue Spark job is completed in a workflow.

1 Antwort
0
Akzeptierte Antwort

I used pymysql and worked for me.

this is an example of the code I used :

import sys
import pymysql

........

rds_host  = "database-id.rds.amazonaws.com"
name = "userName"
password = "secretPassword"
db_name = "myDb"
conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)

with conn.cursor() as cur:
   query="CALL mystoredProcedure()"
   cur.execute(query)
   conn.commit()
   cur.close()
Jess
beantwortet vor 2 Jahren
AWS
EXPERTE
überprüft vor 2 Jahren
  • @Jess , Is there any security group or VPC setting required for this ? I am doing the same thing but the connection is failing. My DB is under a VPC

  • Hi, as pointed in the above comment, I am also facing RDS instance connectivity issues when using the above solution. My RDS instance is in a VPC and I have configured inbound rules for the security group but still cannot connect. Below is the exact error message:

    OperationalError: (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (gluetest.cgj1ict7xclx.ap-southeast-1.rds.amazonaws.com)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (gluetest.cgj1ict7xclx.ap-southeast-1.rds.amazonaws.com)\nNet-Lib error during Connection timed out (110)\n')

  • you need to put jdbc:mysql://

    jdbc:mysql://you_db_cluster_endpoint:3306/yourDb_name?verifyServerCertificate=false&useSSL=true

    maybe try this :

    import sys from awsglue.transforms import * from awsglue.utils import getResolvedOptions from pyspark.context import SparkContext from awsglue.context import GlueContext from awsglue.job import Job from botocore.exceptions import ClientError import boto3 from awsglue.dynamicframe import DynamicFrame import pyspark.sql.functions as f

    glue_connection_name = args['GLUE_CONNECTION_NAME'] // your glue connection database_name = args['DB_NAME']

    print('Getting details for connection ' + glue_connection_name) source_jdbc_conf = glueContext.extract_jdbc_conf(glue_connection_name)

    from py4j.java_gateway import java_import java_import(sc._gateway.jvm,"java.sql.Connection") java_import(sc._gateway.jvm,"java.sql.DatabaseMetaData") java_import(sc._gateway.jvm,"java.sql.DriverManager") java_import(sc._gateway.jvm,"java.sql.SQLException")

    conn = sc._gateway.jvm.DriverManager.getConnection(source_jdbc_conf.get('url') + '/' + database_name, source_jdbc_conf.get('user'), source_jdbc_conf.get('password')) print('Connected to ' + conn.getMetaData().getDatabaseProductName() + ', ' + source_jdbc_conf.get('url'))

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen