Hello,
When job bookmarks are enabled, the job keeps track of which rows have been processed using a column (or columns) specified as the job bookmark key. If no bookmark key is specified, AWS Glue uses the primary key as the bookmark key by default[1].
When the primary key is used as the default bookmark key, it must be sequentially increasing or decreasing with no gaps. If the bookmark keys are user-defined, they must be strictly monotonically increasing or decreasing, with gaps permitted[1].
Please verify that the bookmark key column in the source table meets these criteria. If the column designated as the bookmark key does not, not all of the data may be read, particularly in subsequent runs.
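As a toy illustration (not Glue's actual implementation) of why a non-monotonic bookmark key can silently drop rows, think of the bookmark filter as keeping only rows whose key value exceeds the last committed value:

```python
def rows_since_bookmark(rows, key, last_value):
    """Toy model of a bookmark filter: on each run, only rows whose
    bookmark-key value exceeds the last committed value are read."""
    if last_value is None:
        return list(rows)
    return [r for r in rows if r[key] > last_value]

# Run 1: reads everything; the bookmark advances to the max id seen (4).
run1 = [{"id": 1}, {"id": 3}, {"id": 4}]
assert rows_since_bookmark(run1, "id", None) == run1

# Run 2: a row with id=2 arrives late, so the key is no longer increasing
# in insertion order; it falls below the bookmark and is silently skipped.
run2 = run1 + [{"id": 2}, {"id": 5}]
assert rows_since_bookmark(run2, "id", 4) == [{"id": 5}]
```

This is why a strictly monotonic key matters: any row whose key value lands at or below the committed bookmark is never read again.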
For JDBC sources, the following rules apply:
1. For each table, AWS Glue uses one or more columns as bookmark keys to determine new and processed data. The bookmark keys combine to form a single compound key.
2. AWS Glue by default uses the primary key as the bookmark key, provided that it is sequentially increasing or decreasing (with no gaps).
3. You can specify the columns to use as bookmark keys in your AWS Glue script. For more information about using job bookmarks in AWS Glue scripts, see Using job bookmarks.
4. AWS Glue doesn't support using columns with case-sensitive names as job bookmark keys.
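As a sketch of rule 3 (the database, table, and column names below are placeholders, not taken from your job), the bookmark keys are chosen in the script by passing `jobBookmarkKeys` through the read's `additional_options`:

```python
# Placeholder column name "updated_at": substitute a strictly monotonic
# column from your own table. This dict is what a Glue script passes
# through additional_options to choose the bookmark keys.
bookmark_options = {
    "jobBookmarkKeys": ["updated_at"],
    "jobBookmarkKeysSortOrder": "asc",  # use "desc" for a decreasing key
}

# Inside an actual Glue job (requires the AWS Glue runtime), this would
# be wired up roughly as:
#
#   frame = glue_context.create_dynamic_frame.from_catalog(
#       database="my_db",              # placeholder database name
#       table_name="my_table",         # placeholder table name
#       transformation_ctx="read_src", # required for bookmark state
#       additional_options=bookmark_options,
#   )
```

Note that a `transformation_ctx` is required on the read for the bookmark state to be tracked at all.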
You can find an example for JDBC sources in the documentation below[2].
[1]https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
[2]https://docs.aws.amazon.com/glue/latest/dg/programming-etl-connect-bookmarks.html
Thank you!
Hi,
For me, job bookmarking didn't work at all, and to get an upsert working for S3 I ended up building a script using the Delta Lake package from this location: https://mvnrepository.com/artifact/io.delta/delta-storage
More info on that: https://dev.to/awscommunity-asean/making-your-data-lake-acid-compliant-using-aws-glue-and-delta-lake-gk9 and here: https://dev.to/awscommunity-asean/sql-based-inserts-deletes-and-upserts-in-s3-using-aws-glue-3-0-and-delta-lake-42f0
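Conceptually, the upsert that a Delta Lake MERGE gives you can be sketched in plain Python (this is just the idea, not the delta-spark API):

```python
def upsert(target, updates, key="id"):
    """Conceptual upsert: rows in updates whose key matches a target row
    replace it; rows with a new key are inserted. This mimics what a
    Delta Lake MERGE does over data in S3."""
    by_key = {row[key]: row for row in target}
    for row in updates:
        by_key[row[key]] = row  # replace on match, insert on miss
    return list(by_key.values())

target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
updates = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
upsert(target, updates)  # id 2 updated in place, id 3 appended
```

The Delta Lake transaction log is what makes this safe to do directly on S3 objects, which plain Parquet files cannot offer.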
But now we are moving away from it, because we are going to store our data in Redshift instead of S3, and I found out (yesterday) that it supports upsert functionality out of the box.
René
Thanks for the answer. I'm using an ID column as the bookmark key, but when bookmarks are enabled, the job doesn't pick up any new or modified data. If I disable them, the job does pick up new data, but every run reprocesses all of the data, so I end up with everything duplicated. How can I deal with this?