How do I troubleshoot and resolve issues with AWS Glue job bookmarks?

My AWS Glue jobs that use job bookmarks don't run successfully, or they don't process the expected data.

Resolution

Correctly configure your bookmark

When you configure your bookmark, take the following actions:

  • Turn on the Enable Bookmark option for the job.
  • Set the maximum number of concurrent runs for the job to 1.
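
If you manage the job through the AWS Glue API instead of the console, the two settings above map to a default argument and an execution property. The following is a minimal sketch that assumes you create the job with boto3. The helper name, job name, role ARN, and script location are hypothetical placeholders, not values from this article.

```python
def bookmark_job_settings(name, role_arn, script_location):
    """Build keyword arguments for glue.create_job with job bookmarks turned on."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        # Same effect as turning on the bookmark option for the job.
        "DefaultArguments": {"--job-bookmark-option": "job-bookmark-enable"},
        # Matches the "maximum number of concurrent runs" setting above.
        "ExecutionProperty": {"MaxConcurrentRuns": 1},
    }


# Hypothetical values, for illustration only.
settings = bookmark_job_settings(
    "example-bookmark-job",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/job.py",
)

# To create the job (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_job(**settings)
```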

Correctly implement your bookmark

In your extract, transform, and load (ETL) job, use the AWS Glue DynamicFrame API to read data from the data source.

Note: Don't use the DataFrame API or Apache Spark SQL to read data from the data source. These methods don't support the AWS Glue job bookmark feature.

Include the following in your script:

`job.init(args['JOB_NAME'], args)`  
`datasource0 = glueContext.create_dynamic_frame.from_catalog(database="db_name", table_name="table_name", transformation_ctx="datasource0")`  
`job.commit()`

When you create the DynamicFrame, you must add the transformation_ctx parameter as a unique identifier for the ETL operator instance. AWS Glue uses this value as the key to look up the bookmark state for that source.

Note: Don't change the value of the transformation_ctx parameter when you modify the script.
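
To show where these three calls sit in a complete script, the following is a minimal sketch of a PySpark job for the AWS Glue runtime. It assumes the standard awsglue library that AWS Glue provides. The run_job wrapper is a hypothetical name used here so that the AWS Glue imports load only inside the job, and "db_name" and "table_name" are the same placeholders as above.

```python
import sys


def run_job(argv):
    # awsglue and pyspark are available in the AWS Glue runtime,
    # so they are imported here rather than at module level.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())

    # init() must run before any bookmarked read.
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read through the DynamicFrame API and keep transformation_ctx stable
    # across script revisions.
    datasource0 = glue_context.create_dynamic_frame.from_catalog(
        database="db_name",
        table_name="table_name",
        transformation_ctx="datasource0",
    )

    # ... transforms and writes that use datasource0 go here ...

    # commit() saves the bookmark state at the end of a successful run.
    job.commit()


# In the AWS Glue script, finish with: run_job(sys.argv)
```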

Troubleshoot issues with bookmarks for JDBC sources

If you experience issues with a bookmark for a Java Database Connectivity (JDBC) source, then check the following:

  • If your AWS Glue script doesn't specify columns to use as bookmark keys, then AWS Glue uses the table's primary key as the bookmark key. Make sure that the primary key is sequentially increasing or decreasing, without gaps.
  • If the script specifies user-defined bookmark keys, then make sure that the keys are strictly increasing or decreasing. Gaps are allowed.
  • Don't use columns with case-sensitive names as bookmark keys.
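
The key requirements above can be checked on a sample of key values before you rely on the bookmark. The following is a minimal sketch: jobBookmarkKeys and jobBookmarkKeysSortOrder are the connection options that AWS Glue documents for user-defined bookmark keys, while the helper functions and the order_id column are hypothetical. The checks assume integer keys listed in the order that the rows were inserted.

```python
def jdbc_bookmark_options(keys, sort_order="asc"):
    """Build additional_options for a bookmarked JDBC read."""
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {"jobBookmarkKeys": list(keys), "jobBookmarkKeysSortOrder": sort_order}


def is_strictly_monotonic(values):
    """True when values only increase or only decrease (gaps allowed)."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def is_sequential(values):
    """True when integer values step by exactly 1 in one direction (no gaps)."""
    steps = {b - a for a, b in zip(values, values[1:])}
    return steps <= {1} or steps <= {-1}


# Hypothetical key column, for illustration only.
options = jdbc_bookmark_options(["order_id"])
```

In the AWS Glue script, you would pass options as the additional_options argument of create_dynamic_frame.from_catalog, next to transformation_ctx. Use is_sequential to check a default primary key and is_strictly_monotonic to check user-defined keys.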

Related information

Tracking processed data using job bookmarks

AWS OFFICIAL
Updated 2 months ago