Hello rePost-User-2300507
My understanding is that you are having issues executing SQL queries on SAP HANA from AWS Glue. If I have misunderstood the issue, please feel free to correct me.
Glue DynamicFrames do not include SQL functionality; however, Spark DataFrames do support it. I have provided steps below that should allow you to run SQL queries on the SAP HANA data and write the results to a location, following the information in [1].
- Below is a sample command that uses the CData JDBC driver with the PySpark and AWS Glue modules to extract SAP HANA data into a Spark DataFrame. I believe this step was already done, since you mentioned you "successfully connected to SAP HANA database using Apache Spark Load function from AWS Glue job." If your SAP HANA data is already in a DataFrame, ignore this step and proceed to step 2.
#Use the CData JDBC driver to read SAP HANA data from the example_table table into a DataFrame#
source_df = glueContext.read.format("jdbc").option("driver", jdbc_driver_name).option("url", db_url).option("dbtable", example_table).option("user", db_username).option("password", db_password).load()
- Register a temp view using the Spark DataFrame from the command above, then execute the SQL query against it using the "glueContext.sql" function.
source_df.createOrReplaceTempView("source_df")
queried_source_df = glueContext.sql('{SQL query}')
#example query#
queried_source_df = glueContext.sql("""SELECT * FROM source_df LIMIT 10""")
- From here you can write the queried table to a location or convert it to a Glue DynamicFrame.
Write to a location:
queried_source_df.write.format('{orc/parquet/whatever}').partitionBy("{columns}").save('path to s3 location')
Convert to a DynamicFrame:
source_dynamicFrame = DynamicFrame.fromDF(queried_source_df, glueContext, "name")
If there are any more issues please do not hesitate to reach out.
References: [1] https://aws.amazon.com/blogs/awsforsap/extracting-data-from-sap-hana-using-aws-glue-and-jdbc/
Thank you for your reply. I need to update database tables directly (with INSERT, UPDATE, or DELETE statements). Unfortunately, this is not the solution I need.
It seems like you have JayDeBeApi==1.2.3 installed, as you correctly mentioned. Your import jaydebeapi statement is either missing, or the import needs to reference the folder where the libraries are installed on the Glue cluster.
import jaydebeapi
conn = jaydebeapi.connect("Driver", ...
Instead of trying JayDeBeApi, a Spark connection to the database would let you use the underlying Spark engine for parallel computing.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
#initializing Glue and spark context
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
#forming the JDBC URL with DB credentials from Secrets Manager
jdbc_url = "jdbc:sap://xXXXXXX:30015/?databaseName=mydb&user=SYSTEM&password=xxxxx"
#setting options for connecting to HANA via JDBC
hana_table = spark.read.format("jdbc") \
.option("driver", "com.sap.db.jdbc.Driver") \
.option("url", jdbc_url) \
.option("dbtable", "SYSTEM.TEST1") \
.load()
The problem was resolved by using the SAP HANA Python interface instead of JayDeBeApi.
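For anyone landing here with the same need: the SAP HANA Python client follows the standard Python DB-API, so INSERT, UPDATE, and DELETE statements are executed through a cursor and then committed on the connection. Below is a minimal sketch of that pattern using Python's built-in sqlite3 module as a stand-in driver (the test1 table and its columns are made up for illustration); with the HANA client you would only swap in its own connect() call.

```python
import sqlite3

# Stand-in connection; the SAP HANA Python client creates the
# connection with its own connect() call instead.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table purely for illustration
cur.execute("CREATE TABLE test1 (id INTEGER PRIMARY KEY, name TEXT)")

# INSERT with bound parameters
cur.execute("INSERT INTO test1 (id, name) VALUES (?, ?)", (1, "alpha"))

# UPDATE
cur.execute("UPDATE test1 SET name = ? WHERE id = ?", ("beta", 1))

# DELETE
cur.execute("DELETE FROM test1 WHERE id = ?", (1,))

# Persist the changes
conn.commit()

cur.execute("SELECT COUNT(*) FROM test1")
row_count = cur.fetchone()[0]
conn.close()
```

The same cursor.execute() / connection.commit() sequence applies with JayDeBeApi or the HANA client, since both expose the DB-API cursor interface.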