I have created a Glue job that copies a table from an RDS database (MySQL) into S3. I crawled the RDS database, and the job references the resulting table in the Glue Data Catalog.
When writing to S3 with glueContext.write_dynamic_frame.from_options I receive the error above. The error logs show the following exceptions: java.sql.SQLException, com.mysql.cj.exceptions.WrongArgumentException, and java.lang.IllegalArgumentException.
Searching the web, I found that the cause is a zero-year date (0000-00-00) in one of the rows in the database (see: https://bugs.mysql.com/bug.php?id=94872).
After removing these records, the error was no longer raised and the job completed successfully. Since the MySQL database allows these records, I did not expect this issue when writing to S3. Should this be considered a bug, or should we update the database records? Which values are allowed in the database?
See the Python script below:
import datetime
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
args = getResolvedOptions(sys.argv, ["JOB_NAME", "table_name", "s3_bucket"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
date = datetime.datetime.now()
# Script generated for node JDBC Connection
JDBCConnection_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="glue_rds",
    table_name=args["table_name"],
    transformation_ctx="JDBCConnection_node1",
)
# Script generated for node S3 bucket
S3bucket_node2 = glueContext.write_dynamic_frame.from_options(
    frame=JDBCConnection_node1,
    connection_type="s3",
    format="parquet",
    connection_options={
        "path": f"s3://{args['s3_bucket']}/data/platform_table_extract/{args['table_name']}/{date.strftime('year=%Y/month=%m/day=%d/hour=%H/minute=%M')}",
        "partitionKeys": [],
    },
    transformation_ctx="S3bucket_node2",
)
job.commit()
An actual Date wouldn't trigger that error (because a Date value cannot hold an invalid date). That likely means some column is typed as a date, but the data in it is not a valid date.
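The same point can be illustrated outside Java: a strict client-side date type rejects the components of MySQL's zero date, just as java.sql.Date does in the JDBC driver. A small sketch, purely for illustration:

```python
import datetime

# MySQL's zero date 0000-00-00 is a sentinel, not a real calendar date:
# strict client-side date types (java.sql.Date in the JDBC driver,
# datetime.date here) reject its components, which is why the failure
# surfaces when the driver materializes the value, not inside MySQL itself.
try:
    zero_date = datetime.date(0, 0, 0)  # the components of 0000-00-00
except ValueError as error:
    zero_date = None
    print(f"rejected: {error}")
```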
Hi @Gonzalo Herreros, thank you for all the support you bring to the community, especially us newbies. I don't understand what you mean, though. To clarify: I have verified that there are 0000-00-00 values in date columns in the database (MySQL 5.7). I tried converting the columns to text and dropping them, but if, as you suggest, it is the read from the MySQL JDBC driver that is failing, what can I do?
I'm just trying to copy the source data to a staging area in S3 in raw format; then I plan to clean and transform it.
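If the raw extract lands with the dates as strings, the cleaning step could map the MySQL sentinel to a proper null before casting the column to a real date type. A minimal sketch (the helper name and sentinel set are illustrative, not part of the original job):

```python
# Hypothetical cleanup helper for the staging layer: MySQL zero-date
# sentinels carry no calendar information, so map them to None before
# casting the column to a date type downstream.
ZERO_DATE_SENTINELS = {"0000-00-00", "0000-00-00 00:00:00"}

def clean_mysql_date(value):
    """Return None for a MySQL zero-date sentinel, else the value unchanged."""
    if value in ZERO_DATE_SENTINELS:
        return None
    return value
```

In Spark this could be registered as a UDF and applied to the raw string column before the write to the cleaned layer.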
I vaguely remember there is an option on the MySQL JDBC driver that tells it how to handle such invalid dates.
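The option being recalled is most likely zeroDateTimeBehavior on MySQL Connector/J: convertToNull (spelled CONVERT_TO_NULL in the 8.x driver) makes the driver return NULL for zero dates instead of throwing. A sketch of how it could be passed, assuming the table is read over a direct JDBC connection instead of the catalog connection (host, port, database, and credentials below are placeholders, and this has not been run against a live cluster):

```python
# MySQL Connector/J can be told how to treat 0000-00-00 values via the
# zeroDateTimeBehavior JDBC URL parameter: "convertToNull" returns NULL
# instead of raising WrongArgumentException.
def mysql_jdbc_url(host, port, database, zero_date_behavior="convertToNull"):
    """Build a JDBC URL that maps zero dates to NULL (placeholder host/db)."""
    return (
        f"jdbc:mysql://{host}:{port}/{database}"
        f"?zeroDateTimeBehavior={zero_date_behavior}"
    )

def read_table_with_zero_date_handling(glue_context, args):
    """Sketch: read directly over JDBC so the driver option takes effect."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="mysql",
        connection_options={
            "url": mysql_jdbc_url("my-rds-host", 3306, "mydb"),
            "dbtable": args["table_name"],
            "user": "my_user",          # placeholder credentials; prefer a
            "password": "my_password",  # Glue connection or Secrets Manager
        },
    )
```

Note the exact spelling of the value depends on the Connector/J version bundled with your Glue version (convertToNull in 5.1.x, CONVERT_TO_NULL in 8.x), so it is worth checking the driver docs for your runtime.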