Hi team,
I'm creating an AWS Glue job to load data from a CSV file on S3 into Aurora MySQL 8. I'm using a custom JDBC driver because, as I understand it, Glue connections don't support MySQL 8.
Is there an example script showing how to load data from S3 to RDS (NOT RDS to RDS)?
I found this helpful link: https://aws.amazon.com/blogs/big-data/building-aws-glue-spark-etl-jobs-by-bringing-your-own-jdbc-drivers-for-amazon-rds/
but it loads from RDS to RDS, and I'm not sure how to apply the same logic to load from S3 to RDS.
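My rough guess is that the blog's source-side read (the create_dynamic_frame.from_options call with connection_type="mysql") would simply be replaced with a read straight from S3, something like this sketch (the bucket, prefix, and CSV options are placeholders for my actual data):

# Read the CSV directly from S3 into a DynamicFrame, instead of reading
# from a source RDS instance as in the blog post.
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://myBucket/input/"]},  # placeholder path
    format="csv",
    format_options={"withHeader": True, "separator": ","},
    transformation_ctx="datasource0",
)

Is that the right direction?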
I'm currently using the code below, but the Glue job stops with this error:
An error occurred while calling o96.pyWriteDynamicFrame. The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: HJsdDCNsdP96DsdT; S3 Extended Request ID: Fvv72asdLoYsdKCUT9UndlsdRosdfgddup+niZem3RP3sXo4Gp0Fsd5H6sd8TrKMysdanEk=; Proxy: null)
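I'm guessing one of the S3 paths referenced in the job (maybe the customJdbcDriverS3Path) points at a bucket that doesn't exist, but I can't tell which. To rule out a typo, I was planning to sanity-check each bucket the job references with a quick boto3 script like this (the bucket name is a placeholder):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
# Every bucket the Glue job references; "myBucket" is a placeholder.
for bucket in ["myBucket"]:
    try:
        s3.head_bucket(Bucket=bucket)  # raises ClientError if missing or inaccessible
        print(bucket + ": OK")
    except ClientError as e:
        print(bucket + ": " + e.response["Error"]["Code"])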
Code used:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Note: this line has no effect on its own; the DataFrameReader it creates is never used.
spark.read.option("escapeChar", "®")
connection_mysql8_options = {
    "url": "jdbc:mysql://databhddfd8bb-180ewlrdhdfhi3ew.cluster-cqdhdfhddvbc.region.rds.amazonaws.com:3306/mydb",
    "dbtable": "mydbTable",
    "user": "root",
    "password": "WsdtbasdLjZasdVrsadtgGHDNJasd",
    "customJdbcDriverS3Path": "s3://myBucket/mysql-connector-java-8.0.28.jar",
    "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
}
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "csv_db", table_name = "mytable_csv", transformation_ctx = "datasource0")
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("id", "string", "id", "string"), ("created", "string", "created", "timestamp"), .....], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["created", "id", .....], transformation_ctx = "selectfields2")
##datasink5 = glueContext.write_dynamic_frame.from_options(frame = selectfields2, connection_type="mysql", connection_options=connection_mysql8_options, transformation_ctx = "datasink5")
datasink5 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=selectfields2,
    catalog_connection="myaccount-rds-edwextract-connection",
    connection_options={
        "customJdbcDriverS3Path": "s3://myBucket/mysql-connector-java-8.0.21.jar",
        "customJdbcDriverClassName": "com.mysql.cj.jdbc.Driver",
        "user": "root",
        "password": "GJDMTRasdasdassdasd1AtsdasdLasdadasd",
        "url": "jdbc:mysql://daafb269d8bb-1asd0ewasfasfew.cluster-cqtsafasf.region.rds.amazonaws.com:3306/mydb",
        "connectionType": "mysql",
        "dbtable": "mydbTable",
        "database": "mydb",
    },
    transformation_ctx="datasink5",
)
job.commit()
I'd like to know the correct syntax to load a CSV from S3 into Amazon Aurora MySQL 8 via a custom JDBC driver.
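In other words, should I drop the from_jdbc_conf / catalog-connection approach and instead use the commented-out from_options line, reusing the connection_mysql8_options dict defined above? My best guess at the write, based on the blog post (the endpoints and credentials above are placeholders):

# Write the mapped DynamicFrame to Aurora MySQL 8 via the custom JDBC driver,
# reusing the options dict from above (no Glue catalog connection involved).
datasink5 = glueContext.write_dynamic_frame.from_options(
    frame=selectfields2,
    connection_type="mysql",
    connection_options=connection_mysql8_options,
    transformation_ctx="datasink5",
)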
Thank you!!!