Please be advised that, as per the Job Bookmarks documentation [1], this feature only supports Amazon S3 and JDBC sources. The way Job bookmarks are implemented is specific to JDBC data stores; a non-relational database such as MongoDB is not supported as the source, and this is why you see your data being reprocessed.
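For comparison, a bookmark-tracked read needs a transformation_ctx on a source that bookmarks actually support. A minimal sketch, assuming a hypothetical S3-backed catalog database "my_db" and table "my_table", would look like:

# Bookmarks only advance for S3 and JDBC sources; the transformation_ctx
# ties this read to the job's bookmark state.
s3_source = glueContext.create_dynamic_frame.from_catalog(
    database="my_db",          # hypothetical S3-backed catalog database
    table_name="my_table",     # hypothetical catalog table
    transformation_ctx="s3_source")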
Workaround: I would also like you to consider overwriting the S3 output location so you do not get duplicated data in the output files every time you run the job. You can do that with the commands below:
# Convert the DynamicFrame to a Spark DataFrame and overwrite the output path
df = dynamic_frame2.toDF()
df.write.mode('overwrite').parquet("s3://<bucket>/<folder>")
Experiment: You can also try enabling the Job bookmark for your job, use the following code to create your DynamicFrame, and then run the job:
datasource0 = glueContext.create_dynamic_frame_from_catalog(database=catalogDB,
    table_name=catalogTable, connection_type="mongodb",
    connection_options=read_mongo_options, transformation_ctx="datasource0")
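Putting both suggestions together, a rough sketch of the job (reusing the names from the snippets above and assuming catalogDB, catalogTable, and read_mongo_options are already defined for your environment, with the Job bookmark option enabled on the job) could look like:

# Read from the MongoDB-backed catalog table (not tracked by bookmarks)
datasource0 = glueContext.create_dynamic_frame_from_catalog(database=catalogDB,
    table_name=catalogTable, connection_type="mongodb",
    connection_options=read_mongo_options, transformation_ctx="datasource0")

# Overwrite the S3 output so reruns do not accumulate duplicate records
df = datasource0.toDF()
df.write.mode("overwrite").parquet("s3://<bucket>/<folder>")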
References: [1] See the second code example at https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html#monitor-continuations-script
Any news for MongoDB about Job bookmarks?