跳至內容

An error occurred while calling o103.pyWriteDynamicFrame. YEAR

0

I have created a Glue job where I copy a table from a RDS database (MySQL) into S3. I have crawled the RDS database and I reference the table defined in the Glue Data Catalog in the job.

When writing to s3 by using glueContext.write_dynamic_frame.from_options I receive above error. When looking into the error logs I see that I got the following exceptions: java.sql.SQLException, com.mysql.cj.exceptions.WrongArgumentException and java.lang.IllegalArgumentException.

When searching the web I found that the issue is that a 0-year date exists in one of the entries in the database (see: https://bugs.mysql.com/bug.php?id=94872).

When removing these records the error was not raised and the job completed successfully. Since the MySQL database allows these records I did not expect this issue when writing to S3. Can this be considered as a bug or should we update the database records? What values in the database are allowed?

See below the Python script:

import datetime
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "table_name", "s3_bucket"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

date = datetime.datetime.now()

# Script generated for node JDBC Connection
JDBCConnection_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="glue_rds",
    table_name=args["table_name"],
    transformation_ctx="JDBCConnection_node1",
)

# Script generated for node S3 bucket
S3bucket_node2 = glueContext.write_dynamic_frame.from_options(
    frame=JDBCConnection_node1,
    connection_type="s3",
    format="parquet",
    connection_options={
        "path": f"s3://{args['s3_bucket']}/data/platform_table_extract/{args['table_name']}/{date.strftime('year=%Y/month=%m/day=%d/hour=%H/minute=%M')}",
        "partitionKeys": [],
    },
    transformation_ctx="S3bucket_node2",
)

job.commit()
已提問 3 年前檢視次數 1107 次
2 個答案
1

To be specific, the issue is not when writing to s3, it doesn't matter where you write, it's reading from the MySQL JDBC driver that is failing.
This is not a bug, it's done intentionally by MySQL.
I believe you can relax the data validation specifying the sql_mode on the JDBC url, by default I think it uses this: https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_strict_trans_tables (which includes the NO_ZERO_DATES option)

AWS
專家
已回答 3 年前
0

I'm getting a similar error, except instead of YEAR is oxyz.pyWriteDynamicFrame.DAY_OF_MONTH. Funny things is, I've eliminated all date columns from the source, and yet I can't get rid of it. Any suggestions?

已回答 2 年前
  • An actual Date wouldn't trigger that error (because it cannot have an invalid date), that likely means some column in the database is a date but not in your data

  • Hi @Gonzalo Herreros, thank you for all the support you bring to the community, especially us newbies. I don't understand what you mean. To clarify, I can add that I've verified that there are 0000-00-00 values in date columns in the db (MySQL 5.7). I tried converting to text, dropping the columns, but if it is like you suggested, that is's reading from the MySQL JDBC driver that is failing, what can I do?

    I'm just trying to copy source data to a staging area in S3 in raw format. Then I plan to clean and transform.

  • I remember vaguely there is an option on the MySQL JDBC driver to tell it how to handle such invalid dates

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。