aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key

0

Hi AWS expert, I have a code read data from AWS aurora PostgreSQL, I want to bookmark the table with custom column named 'ceres_mono_index'. But it seems like the bookmark is still uses the primary key as the bookmark key instead of column 'ceres_mono_index'. Here is the code


cb_ceres = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "url": f"jdbc:postgresql://{ENDPOINT}:5432/{DBNAME}",
        "dbtable": "xxxxx_raw_ceres",
        "user": username,
        "password": password,
    },
    additional_options={"jobBookmarkKeys": ["ceres_mono_index"], "jobBookmarkKeysSortOrder": "asc"},
    transformation_ctx="cb_ceres_bookmark",
)

How could I fix the issue? Thank you

已提問 1 年前檢視次數 1516 次
1 個回答
0

I believe PostgreSQL column names are case-sensitive(when double quoted as given in the code). AWS Glue does not support using case-sensitive columns as job bookmark keys. I think that is why glue is still using primary key as the job bookmark key. Check out the documentation for the same when using with JDBC sources https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

已回答 1 年前
  • Thank you so much for your reply. I am confuse whether it is possible to change it to non-case-sensitive (with single quoted? ), and make it work for PostgreSQL or it is impossible or it is not possible to work with PostgreSQL?

  • yes of course it is possible to work with PostgreSQL. As per your code you are using 'additional_options' param in 'create_dynamic_frame.from_options' function which is not correct way. If you look into the documentation link, you would notice this - You can specify jobBookmarkKeys and jobBookmarkKeysSortOrder in the following ways: create_dynamic_frame.from_catalog — Use additional_options. create_dynamic_frame.from_options — Use connection_options. you can try giving the same in "connection_options" as mentioned and check if it works for you.

    Alternatively, you can also try another way - create a glue crawler to crawl your PostgreSQL datastore and run it to create glue catalog metadata table. you can pass in classifiers in crawler config or even map columns to different name/type using applyMapping feature. Then create the dynamic frame using 'glueContext.create_dynamic_frame.from_catalog' function and pass in bookmark keys in 'additional_options' param.

  • Thank you so much for the help. Move the arguments to 'connection_options' works

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南