Update/Create Governed Table from Glue job


Currently, it's possible to create or update Data Catalog tables from a Glue job: https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html

Two questions:

  1. Is it possible to create or update a Governed table from a Glue job?
  2. How can a crawler create a Governed table?

Thanks!

Asked 2 years ago, 637 views
2 answers

Hello,

Here are the answers to your questions:

  1. Is it possible to create or update a Governed table from a Glue job?

Yes, you can create and update a Governed table from a Glue job by calling the CreateTable and UpdateTable API operations. However, UpdateTable is restricted for governed tables: you can't change the table type, the partition keys, or the table location.
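A minimal sketch of the CreateTable route in Python, assuming a governed table is marked by setting TableType to "GOVERNED" in the TableInput, and that a boto3 Glue client (boto3.client("glue")) is passed in. The helper names build_governed_table_input and create_governed_table are hypothetical, and the JSON SerDe settings are assumptions to adapt to your data format.

```python
from typing import Any, Dict, List, Optional


def build_governed_table_input(
    name: str,
    location: str,
    columns: List[Dict[str, str]],
    partition_keys: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Build a Glue TableInput dict for a JSON-backed governed table (sketch)."""
    return {
        "Name": name,
        # Assumption: TableType "GOVERNED" marks a Lake Formation governed table.
        "TableType": "GOVERNED",
        "PartitionKeys": partition_keys or [],
        "StorageDescriptor": {
            "Columns": columns,  # e.g. [{"Name": "id", "Type": "bigint"}]
            "Location": location,
            # Assumed SerDe/format settings for JSON text files; adjust as needed.
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
        },
        "Parameters": {"classification": "json"},
    }


def create_governed_table(glue_client: Any, database: str, table_input: Dict[str, Any]) -> None:
    """Call Glue CreateTable; glue_client is expected to be boto3.client("glue")."""
    glue_client.create_table(DatabaseName=database, TableInput=table_input)
```

In a job this might be called as create_governed_table(boto3.client("glue"), "<target_db_name>", table_input); passing the client in keeps the table definition testable without AWS access.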

  2. How can a crawler create a Governed table?

It can't: crawlers do not support Governed tables.

See https://docs.aws.amazon.com/lake-formation/latest/dg/governed-table-restrictions.html for complete details on the restrictions. That page also answers both of your questions and lists which API operations are not allowed on governed tables. I hope this helps.
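Because UpdateTable rejects changes to a governed table's type, partition keys, or location, a job could check a proposed TableInput against the current definition before calling it. A sketch under that assumption; disallowed_governed_changes is a hypothetical helper, and the current definition would come from something like glue.get_table(DatabaseName=..., Name=...)["Table"]. A field missing from the proposed input is conservatively treated as a change.

```python
from typing import Any, Dict, List


def disallowed_governed_changes(current: Dict[str, Any], proposed: Dict[str, Any]) -> List[str]:
    """Return the restricted fields that a proposed UpdateTable input would change."""
    problems: List[str] = []
    # Table type must stay the same (e.g. GOVERNED -> EXTERNAL_TABLE is not allowed).
    if proposed.get("TableType") != current.get("TableType"):
        problems.append("TableType")
    # Partition keys cannot be added, removed, or altered.
    if proposed.get("PartitionKeys", []) != current.get("PartitionKeys", []):
        problems.append("PartitionKeys")
    # The S3 location of the table cannot be moved.
    cur_loc = current.get("StorageDescriptor", {}).get("Location")
    new_loc = proposed.get("StorageDescriptor", {}).get("Location")
    if new_loc != cur_loc:
        problems.append("StorageDescriptor.Location")
    return problems
```

Adding or changing ordinary columns is not flagged, since only these three fields are restricted.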

Thank you!

AWS
Support Engineer
Answered 2 years ago
  • Hello Manish, thanks for your answers.

    I can create a non-governed table as follows:

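    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glueContext = GlueContext(SparkContext.getOrCreate())

    # Write to S3 and create/update the (non-governed) catalog table in one step
    sink = glueContext.getSink(
        connection_type="s3",
        path="<S3_output_path>",
        enableUpdateCatalog=True,
        partitionKeys=["region", "year", "month", "day"])
    sink.setFormat("json")
    sink.setCatalogInfo(catalogDatabase="<target_db_name>", catalogTableName="<target_table_name>")
    # last_transform is the DynamicFrame produced by the earlier steps of the job
    sink.writeFrame(last_transform)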
    

    Is there any way to add a TableType option here so that it creates a governed table automatically? I ask because CreateTable seems to require the schema as well, and I don't know how to get the schema from the DynamicFrame in the Glue job.


You can't pass a TableType option to the setCatalogInfo() method. If you know the column names at this stage, you could use the CreateTable or UpdateTable API calls suggested in the other answer.
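If the schema is only known at run time, the column list for CreateTable can probably be derived from the DynamicFrame itself: DynamicFrame.toDF() returns a Spark DataFrame, whose dtypes attribute lists (name, type) pairs, and Spark type strings such as bigint, string, or struct<a:int> mostly match the Hive-style types the Glue catalog uses. A sketch assuming that mapping is good enough for your data; glue_columns_from_dtypes is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def glue_columns_from_dtypes(
    dtypes: Iterable[Tuple[str, str]], partition_keys: Iterable[str] = ()
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split Spark (name, type) pairs into Glue Columns and PartitionKeys lists."""
    part = list(partition_keys)
    by_name = dict(dtypes)  # keeps the DataFrame's column order (Python 3.7+)
    missing = [k for k in part if k not in by_name]
    if missing:
        raise ValueError(f"partition keys not in schema: {missing}")
    # Regular columns exclude the partition keys, as in a Glue TableInput.
    columns = [{"Name": n, "Type": t} for n, t in by_name.items() if n not in part]
    partitions = [{"Name": k, "Type": by_name[k]} for k in part]
    return columns, partitions
```

For the sink above, this might be used as columns, partitions = glue_columns_from_dtypes(last_transform.toDF().dtypes, ["region", "year", "month", "day"]).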

Since you mentioned that you don't know the column names, I suggest using the AWS Data Wrangler (awswrangler) library, which infers the table schema from the DataFrame when it writes the data.

awswrangler.s3.to_json

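import awswrangler as wr

# bucket, database and table are your own names; df is a pandas DataFrame,
# e.g. last_transform.toDF().toPandas() from the Glue job above
transaction_id = wr.lakeformation.start_transaction(read_only=False)
wr.s3.to_json(
    df=df,
    path=f"s3://{bucket}/{database}/{table}/",
    dataset=True,
    database=database,
    table=table,
    table_type="GOVERNED",
    transaction_id=transaction_id)
# Writes to the governed table become visible once the transaction is committed
wr.lakeformation.commit_transaction(transaction_id=transaction_id)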
AWS
Answered 2 years ago
