Please be advised that, per the Job Bookmarks documentation [1], this feature only supports Amazon S3 and JDBC sources. In addition, the way you have tried to implement job bookmarks is specific to JDBC data stores. Non-relational databases such as MongoDB are not supported as a source, which is why you see your data being reprocessed.
Workaround: consider overwriting the S3 output location so you won't get duplicated data in the output file every time you run the job. You can do that with the commands below:
# Convert the DynamicFrame to a Spark DataFrame and overwrite the target path
df = dynamic_frame2.toDF()
df.write.mode("overwrite").parquet("s3://<bucket>/<folder>")
Experiment: you can try enabling the bookmark for your job, using the code below to create your DynamicFrame, and then running the job:
datasource0 = glueContext.create_dynamic_frame_from_catalog(
    database=catalogDB,
    table_name=catalogTable,
    connection_type="mongodb",
    connection_options=read_mongo_options,
    transformation_ctx="datasource0",
)
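The snippet above references a read_mongo_options dictionary without defining it. As a minimal sketch, such a dictionary might look like the following; the key names follow the Glue "mongodb" connection type, while the URI, database, collection, and credentials are placeholders you would replace with your own values:

```python
# Hypothetical MongoDB connection options for the Glue read above.
# All values below are placeholders; substitute your own endpoint,
# database, collection, and credentials.
read_mongo_options = {
    "uri": "mongodb://<host>:27017",
    "database": "<database>",
    "collection": "<collection>",
    "username": "<username>",
    "password": "<password>",
}
```

Depending on your setup, you may also pass partitioning options here to control how Glue splits the collection across readers.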
References: [1] See the second code example at https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html#monitor-continuations-script
Any news on job bookmark support for MongoDB?