Is the data already present in the MySQL table?
AWS Glue is based on Spark: one node is the Spark driver and the other nodes host the executors (which do the real work). In your case, with just 2 workers, you have only 1 executor in your cluster.
Even with multiple executors, Spark partitions the data without replicating it. This means that even parallel writes will never try to insert the same row twice, unless, as already mentioned, your code has introduced the duplicates.
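If the source data itself can contain repeated keys, one option is to deduplicate on the primary key before writing (in PySpark that would be `df.dropDuplicates(["id"])`; the column name `id` is an assumption here, not from your job). A minimal plain-Python sketch of the same idea:

```python
# Sketch: keep only the first row seen for each primary-key value.
# "id" as the key column is an assumption; substitute your table's actual PK.

def dedupe_by_key(rows, key="id"):
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},  # duplicate key that would trip the PK constraint
]
print(len(dedupe_by_key(rows)))  # prints 2
```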
Glue writes in append mode only, with no updates. Have you checked whether that key is, by any chance, already present in the target database?
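Because Glue only appends, a quick sanity check is to query the target for the key reported in the error before re-running the job. A sketch using sqlite3 as a stand-in for MySQL (the table name `orders`, column `id`, and key value 42 are all assumptions; against RDS you would run the same query through a MySQL connector such as PyMySQL):

```python
import sqlite3

# sqlite3 stands in for the MySQL target here; the query is the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO orders VALUES (42, 9.99)")  # pre-existing row

key = 42  # the key value reported in the duplicate-entry error (assumed)
exists = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE id = ?", (key,)
).fetchone()[0] > 0
print(exists)  # prints True: appending this key again would violate the PK
```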
Hope this helps,
Are you doing some transformation, such as a join, that could produce duplicate data? Is it happening for all keys or just a single key? I would suggest removing the PK or unique constraint and letting the job complete once. Then you can evaluate in RDS whether it is a true duplicate or whether some transformation is generating duplicates.
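To check whether a transformation is fanning out rows, you can count occurrences per key in the job's output before loading (in PySpark, something like `df.groupBy("id").count().filter("count > 1")`; the column name `id` is an assumption). The same check in plain Python, showing how a join on a non-unique key produces duplicates:

```python
from collections import Counter

# A join on a non-unique key fans out rows: here key 1 matches twice on the
# right side, so the joined output contains id=1 two times.
left = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}]
right = [{"id": 1, "w": "p"}, {"id": 1, "w": "q"}]

joined = [
    {**l, **r} for l in left for r in right if l["id"] == r["id"]
]

counts = Counter(row["id"] for row in joined)
dupes = {k: n for k, n in counts.items() if n > 1}
print(dupes)  # {1: 2}: key 1 would hit the PK constraint on the second insert
```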