Is there a way to execute SQL on HANA using AWS Glue?

0

I want to execute SQL(INSERT, UPDATE or DELETE) statement on SAP HANA using AWS Glue job. I have tried by using JayDeBeApi, but an error(below) occurred.

"ModuleNotFoundError: No module named 'jaydebeapi'"

Settings:

  • Job: AWS Glue Studio script-only mode
  • Job details:
    • Connections: JDBC connection to SAP HANA (test connection is success)
    • Libraries: Dependent JARs path: URI of SAP HANA driver on S3
    • Job parameters: "--additional-python-modules": "JayDeBeApi==1.2.3, JPype1==1.4.0"

I have created jobs that successfully connected to Oracle database using JayDeBeApi or that successfully connected to SAP HANA database using Apache Spark Load function, from AWS Glue job.

How can I execute INSERT, UPDATE or DELETE statement on SAP HANA database using AWS Glue job?

Thank you.

  • The problem was resolved by using SAP HANA Python interface instead of JayDeBeApi.

asked 2 years ago770 views
2 Answers
0

Hello rePost-User-2300507

To my understanding it seems you are having issues regarding executing SQL queries on SAP HANA. If I have misunderstood the issue please feel free to correct me.

Glue Dynamic frames do not include SQL functionality however Spark Data frames do support this functionality. I have provided steps that should allow you to run SQL queries on the SAP HANA data and write to a location following the information in [1].

  1. Below is a sample script that uses the CData JDBC driver with the PySpark and AWSGlue modules to extract SAP HANA data into a spark Data Frame. This step I believe was already done, "successfully connected to SAP HANA database using Apache Spark Load function from AWS Glue job." If your SAP HANA data is successfully in a dataframe ignore this step and proceed to step 2.

#Use the CData JDBC driver to read SAP HANA data from the example_table table into a DataFrame and connect to the source#

source_df = glueContext.read .format("jdbc").option("driver", jdbc_driver_name).option("url", db_url).option("dbtable", example_table).option("user", db_username).option("password", db_password).load()
  1. Register a tempview using the spark dataframe in the above command. Then execute the sql query on the dataframe using the "glueContext.sql" function.
source_df.createOrReplaceTempView("source_df")
queried_source_df = glueContext.sql('{SQL query}')

#example query#

queried_source_df = glueContext.sql("""SELECT * FROM source_df LIMIT 10""")
  1. From here you can write the queried table to a location or convert to a Glue Dynamic Frame

Write to location

queried_source_df.write.format('{orc/parquet/whatever}').partitionBy("{columns}").save('path to s3 location')

Convert to Dynamic Frame

source_dynamicFrame = DynamicFrame.fromDF(queried_source_df, glueContext, "name")

If there are any more issues please do not hesitate to reach out.

References: [1] https://aws.amazon.com/blogs/awsforsap/extracting-data-from-sap-hana-using-aws-glue-and-jdbc/

AWS
answered 2 years ago
  • Thank you for your reply. I need to update database table directly.(by INSERT, UPDATE or DELETE statement) Unfortunately, this is not the solution I need.

0

It seems like you have JayDeBeApi==1.2.3 correctly mentioned, your import jaydebeapi statement is either missing or the import statement needs a folder name where the libraries are installed in the Glue cluster.

import jaydebeapi
conn = jaydebeapi.connect("Driver", ...

Instead of trying to JayDeBeApi, Spark connections to the database would help with using the underlyng Spark power to use parallel computing.

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

#initializing Glue and spark context
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

#fomring the jdbc url with db credentials from secret manager
jdbc_url = "jdbc:sap://xXXXXXX:30015/?databaseName=mydb&user=SYSTEM&password=xxxxx"


#setting options for connecting to HANA via JDBC and SSL
hana_table = spark.read.format("jdbc") \
    .option("driver", "com.sap.db.jdbc.Driver") \
    .option("url", jdbc_url) \
    .option("dbtable", "SYSTEM.TEST1") \
    .load()

Reference:

  1. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html
  2. https://pypi.org/project/JayDeBeApi/
profile pictureAWS
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions