An error occurred while calling o103.pyWriteDynamicFrame. YEAR

I have created a Glue job that copies a table from an RDS database (MySQL) into S3. I crawled the RDS database and reference the resulting table from the Glue Data Catalog in the job.

When writing to S3 using glueContext.write_dynamic_frame.from_options, I receive the above error. Looking into the error logs, I see the following exceptions: java.sql.SQLException, com.mysql.cj.exceptions.WrongArgumentException, and java.lang.IllegalArgumentException.

Searching the web, I found that the issue is a zero-year date in one of the rows in the database (see: https://bugs.mysql.com/bug.php?id=94872).

After removing these records, the error was no longer raised and the job completed successfully. Since the MySQL database allows these records, I did not expect this issue when writing to S3. Should this be considered a bug, or should we update the database records? Which values are allowed in the database?
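
For reference, the offending rows can be located directly in MySQL, since it accepts the zero date as a comparison literal. A minimal sketch of such a check (pymysql, with placeholder endpoint, credentials, and table/column names; not part of the Glue job itself):

import pymysql

# Connect directly to the RDS instance (placeholder values).
conn = pymysql.connect(
    host="your-rds-endpoint",
    user="your_user",
    password="your_password",
    database="your_database",
)
try:
    with conn.cursor() as cur:
        # MySQL allows comparing against the literal zero date.
        cur.execute(
            "SELECT COUNT(*) FROM your_table WHERE your_date_column = '0000-00-00'"
        )
        print(cur.fetchone()[0])  # number of zero-date rows
finally:
    conn.close()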

See the Python script below:

import datetime
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "table_name", "s3_bucket"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Timestamp captured once so all components of the S3 output path are consistent
date = datetime.datetime.now()

# Script generated for node JDBC Connection
JDBCConnection_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="glue_rds",
    table_name=args["table_name"],
    transformation_ctx="JDBCConnection_node1",
)

# Script generated for node S3 bucket
S3bucket_node2 = glueContext.write_dynamic_frame.from_options(
    frame=JDBCConnection_node1,
    connection_type="s3",
    format="parquet",
    connection_options={
        "path": f"s3://{args['s3_bucket']}/data/platform_table_extract/{args['table_name']}/{date.strftime('year=%Y/month=%m/day=%d/hour=%H/minute=%M')}",
        "partitionKeys": [],
    },
    transformation_ctx="S3bucket_node2",
)

job.commit()
asked 3 years ago · 1.1K views
2 Answers
To be specific, the issue is not writing to S3; it doesn't matter where you write. It's the read through the MySQL JDBC driver that is failing.
This is not a bug; it's done intentionally by MySQL.
I believe you can relax the data validation by specifying sql_mode on the JDBC URL. By default I think it uses STRICT_TRANS_TABLES (which covers the NO_ZERO_DATE behavior): https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_strict_trans_tables
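
A minimal sketch of that idea, assuming Connector/J's sessionVariables URL parameter and reading over JDBC with from_options instead of from the catalog (endpoint, credentials, and table name are placeholders, not from the original job):

# Read the table over JDBC so the URL can carry a relaxed sql_mode.
# ALLOW_INVALID_DATES is one possible relaxed mode; adjust to your needs.
relaxed_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={
        "url": (
            "jdbc:mysql://your-rds-endpoint:3306/your_database"
            "?sessionVariables=sql_mode='ALLOW_INVALID_DATES'"
        ),
        "dbtable": "your_table",
        "user": "your_user",
        "password": "your_password",
    },
    transformation_ctx="relaxed_frame",
)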

AWS EXPERT · answered 3 years ago
I'm getting a similar error, except instead of YEAR it is oxyz.pyWriteDynamicFrame.DAY_OF_MONTH. Funny thing is, I've eliminated all date columns from the source, and yet I can't get rid of it. Any suggestions?

answered 2 years ago
  • An actual Date value wouldn't trigger that error (a valid Date cannot hold an invalid day of month); it likely means some column in the database has a date type, even if it doesn't appear as one in your data

  • Hi @Gonzalo Herreros, thank you for all the support you bring to the community, especially us newbies. I don't understand what you mean. To clarify, I have verified that there are 0000-00-00 values in date columns in the db (MySQL 5.7). I tried converting to text and dropping the columns, but if, as you suggested, it's the read from the MySQL JDBC driver that is failing, what can I do?

    I'm just trying to copy the source data to a staging area in S3 in raw format. Then I plan to clean and transform it.

  • I vaguely remember there is an option on the MySQL JDBC driver that tells it how to handle such invalid dates; see the sketch below
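
The option being recalled is most likely Connector/J's zeroDateTimeBehavior. A minimal sketch (placeholder endpoint, credentials, and table name), where convertToNull makes the driver return NULL for '0000-00-00' values instead of raising an exception:

# Append zeroDateTimeBehavior to the JDBC URL so zero dates become NULL.
# Older Connector/J versions spell the value convertToNull; Connector/J 8
# documents it as CONVERT_TO_NULL, so match the driver your Glue version ships.
frame = glueContext.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={
        "url": (
            "jdbc:mysql://your-rds-endpoint:3306/your_database"
            "?zeroDateTimeBehavior=convertToNull"
        ),
        "dbtable": "your_table",
        "user": "your_user",
        "password": "your_password",
    },
    transformation_ctx="frame",
)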
