The issue you're experiencing with your AWS Glue job and job bookmarks is likely related to how the job detects new data in your S3 bucket. Here are a few potential causes and solutions to consider:
- Job Bookmark Implementation: Ensure that you're reading your source with the AWS Glue DynamicFrame API (for example, `glueContext.create_dynamic_frame.from_catalog` or `from_options`), not the DataFrame API or Spark SQL. The DynamicFrame API is required for job bookmarks to function correctly. Also confirm that the script calls `job.init()` at the start and `job.commit()` at the end, because bookmark state is saved only when `job.commit()` runs.
- Transformation Context: You mentioned having a transformation context in place, but double-check that each ETL operator instance has its own unique `transformation_ctx` value. Job bookmarks store their state per transformation context, so this is crucial for tracking processed data correctly.
- S3 Input Source: For S3 input sources, job bookmarks use the last modified time of objects to determine which ones need processing. If new data is being added to existing files rather than written as new files, the job bookmark might not detect the changes.
- Data Source Property: Don't change the data source property (such as the S3 input path) while the bookmark is enabled. Changing the input path without also changing the `transformation_ctx` makes the job reuse the old bookmark state, which can lead to missing or skipped files.
- Bookmark Reset: If you recently deleted and recreated the table using a crawler, this could have reset the job bookmark, causing the job to process all data again.
- Concurrent Runs: Set the maximum number of concurrent runs for the job to 1. Concurrent runs of the same job can conflict over the bookmark state.
- JDBC Source Considerations: If you're using a JDBC source, ensure that the columns used as bookmark keys are strictly monotonically increasing or decreasing. Gaps in the key values are permitted.
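To make the interplay of `transformation_ctx`, last modified times, and committing concrete, here is a toy, pure-Python model of the S3 behavior described above. It is not Glue's implementation: `S3Object`, `ToyBookmarkStore`, and its `read`/`commit` methods are hypothetical names, and the model keeps only one high-water-mark timestamp per `transformation_ctx`, whereas real bookmark state is richer. The `commit()` method stands in for `job.commit()` in a real Glue script.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one S3 listing entry."""
    key: str
    last_modified: datetime


@dataclass
class ToyBookmarkStore:
    """Toy model of bookmark state: one high-water-mark timestamp per
    transformation_ctx. Real Glue bookmark state is richer than this."""
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def read(self, transformation_ctx, objects):
        # Only objects modified after the committed mark for this ctx count as new.
        mark = self.committed.get(transformation_ctx)
        new = [o for o in objects if mark is None or o.last_modified > mark]
        if new:
            # Stage the new mark; it is not persisted until commit().
            self.pending[transformation_ctx] = max(o.last_modified for o in new)
        return new

    def commit(self):
        # Stands in for job.commit(): persist the staged marks.
        self.committed.update(self.pending)
        self.pending.clear()


def ts(day):
    return datetime(2024, 1, day, tzinfo=timezone.utc)


store = ToyBookmarkStore()
path_a = [S3Object("a/1.json", ts(1)), S3Object("a/2.json", ts(2))]

first = store.read("datasource0", path_a)    # both objects are new
store.commit()
second = store.read("datasource0", path_a)   # nothing newer than the mark

# Same transformation_ctx, different input path: b/old.json is older than
# the mark carried over from path A, so it is silently skipped.
path_b = [S3Object("b/old.json", ts(1)), S3Object("b/new.json", ts(3))]
switched = store.read("datasource0", path_b)

# A fresh transformation_ctx starts with no mark and sees everything.
fresh_ctx = store.read("datasource_b", path_b)
```

In this model, skipping `store.commit()` makes every run see the same objects again, which mirrors why a Glue script without `job.commit()` keeps reprocessing its input. The `switched` case shows how reusing a `transformation_ctx` after changing the input path can skip files.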
To troubleshoot further, you could try the following:
- Review your job logs for error messages or other indications of why no data is being processed.
- Temporarily enable more verbose logging to get more insight into the job's behavior.
- Verify that the new data in your S3 bucket is in the location and format that your Glue job expects.
- Consider implementing a custom bookmark solution if the built-in job bookmark feature doesn't suit your use case.
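One way to act on the suggestion to verify the location and format of the new data is to take a listing of the input prefix (for example, the `Key` and `LastModified` fields that boto3's `list_objects_v2` returns) and sort each object by the reason a bookmarked job might ignore it. The sketch below works on plain `(key, last_modified)` tuples so it runs without AWS access. `classify_objects`, the file-suffix check, and comparing against the previous run's start time are illustrative assumptions, not how Glue decides internally.

```python
from datetime import datetime, timezone


def classify_objects(listing, expected_prefix, expected_suffix, last_run_start):
    """Group (key, last_modified) pairs by why a bookmarked job might skip them.

    Hypothetical helper: the suffix check is a rough proxy for "expected
    format", and last_run_start is an assumed cutoff, not Glue's own state.
    """
    report = {"outside_prefix": [], "unexpected_format": [],
              "not_newer": [], "candidates": []}
    for key, last_modified in listing:
        if not key.startswith(expected_prefix):
            report["outside_prefix"].append(key)      # job never lists it
        elif not key.endswith(expected_suffix):
            report["unexpected_format"].append(key)   # may not parse as expected
        elif last_modified <= last_run_start:
            report["not_newer"].append(key)           # looks already processed
        else:
            report["candidates"].append(key)          # should be picked up
    return report


def day(d):
    return datetime(2024, 1, d, tzinfo=timezone.utc)


listing = [
    ("raw/events/2024/01/03/part-0.json", day(3)),
    ("raw/events/2024/01/01/part-0.json", day(1)),
    ("raw/events/2024/01/03/part-1.csv", day(3)),
    ("raw/event/2024/01/03/part-0.json", day(3)),  # note the typo in the prefix
]
report = classify_objects(listing, "raw/events/", ".json", day(2))
```

If `candidates` is empty while you expect new data, the other buckets usually point at the cause: a prefix mismatch, an unexpected file format, or objects that are not newer than the previous run.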
By carefully reviewing these aspects of your Glue job configuration and data pipeline, you should be able to identify and resolve the issue causing your job to process no data while bookmarks are enabled.
I checked all of these and everything looks fine, so I'm not sure where this error is coming from.