My AWS Glue job bookmarks aren't working as expected, and my jobs don't process the required data.
Resolution
Correctly configure your bookmark
When you configure your bookmark, take the following actions (see the example sketch after this list):
- Turn on the Enable Bookmark option for the job.
- Set the maximum number of concurrent runs for the job to 1.
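The following is a minimal sketch that applies both settings when you create a job with the AWS SDK for Python (Boto3). The job name, role ARN, and script location are placeholders for illustration; substitute your own values.

```python
import boto3

glue = boto3.client("glue")

# Placeholder job name, role ARN, and script location.
glue.create_job(
    Name="my-bookmark-job",
    Role="arn:aws:iam::111122223333:role/MyGlueJobRole",
    Command={
        "Name": "glueetl",  # Spark ETL job
        "ScriptLocation": "s3://amzn-s3-demo-bucket/scripts/my_job.py",
        "PythonVersion": "3",
    },
    # Turn on job bookmarks for the job.
    DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
    # Set the maximum number of concurrent runs to 1.
    ExecutionProperty={"MaxConcurrentRuns": 1},
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=2,
)
```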
Correctly implement your bookmark
In your extract, transform, and load (ETL) job, use the AWS Glue DynamicFrame API to read data from the data source.
Note: Don't use the DataFrame API or Apache Spark SQL to read data from the data source. These methods don't support the AWS Glue job bookmark feature.
Include the following in your script:
```python
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database="db_name", table_name="table_name", transformation_ctx="datasource0")

job.commit()
```
When you create the DynamicFrame, you must add the transformation_ctx parameter as a unique identifier for the ETL operator instance.
Note: Don't change the transformation_ctx parameter when you update or modify the script.
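For reference, the following sketch puts these pieces together in a minimal end-to-end script. The database, table, and S3 output path are placeholder names; the key points are the job.init() call, the unique transformation_ctx on each bookmarked operation, and the final job.commit().

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)

# Initialize the job so that the bookmark state is loaded for this run.
job.init(args["JOB_NAME"], args)

# Read through the DynamicFrame API with a unique transformation_ctx
# so that the bookmark can track what this source already processed.
# "db_name" and "table_name" are placeholders.
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="db_name",
    table_name="table_name",
    transformation_ctx="datasource0",
)

# Write the data out; the sink also gets its own transformation_ctx.
# The S3 path is a placeholder.
glueContext.write_dynamic_frame.from_options(
    frame=datasource0,
    connection_type="s3",
    connection_options={"path": "s3://amzn-s3-demo-bucket/output/"},
    format="parquet",
    transformation_ctx="datasink0",
)

# Commit the job so that the bookmark state is saved for the next run.
job.commit()
```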
Troubleshoot issues with bookmarks for JDBC sources
If you experience issues with a bookmark for a Java Database Connectivity (JDBC) data source, then take the following actions:
- If your AWS Glue script doesn't specify columns to use as bookmark keys, then AWS Glue uses the table's primary key as the bookmark key. Make sure that the primary key values are sequentially increasing or decreasing, with no gaps.
- If the script specifies user-defined bookmark keys, then make sure that the key values are sequentially increasing or decreasing. Gaps are allowed. For an example that sets bookmark keys, see the sketch after this list.
- Don't use columns with case-sensitive names as bookmark keys.
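The following read is a sketch that specifies user-defined bookmark keys for a JDBC source through additional_options. The database, table, and column names are placeholders.

```python
# Specify user-defined bookmark keys and their sort order for a JDBC source.
# "jdbc_db", "orders", and "order_id" are placeholder names.
jdbc_source = glueContext.create_dynamic_frame.from_catalog(
    database="jdbc_db",
    table_name="orders",
    transformation_ctx="jdbc_source",
    additional_options={
        "jobBookmarkKeys": ["order_id"],
        "jobBookmarkKeysSortOrder": "asc",
    },
)
```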
Related information
Tracking processed data using job bookmarks