Glue Jobs have no access to current schemas (Glue Catalog)


Hi,

Context: I would like to extract a table from an Oracle database and write the data to S3 in Parquet format. I use a Glue connection, a Glue database, and a Glue crawler. All of that works fine!

Issue: Glue sets decimal(38,0) as the column type in the Data Catalog rather than string. I updated the Data Catalog with the new column type. Nevertheless, in my Glue ETL job the column type is still "decimal" (I can see this because I print the schema). I extract the data using create_dynamic_frame.from_catalog(database="", table_name="", transformation_ctx=""). The job's role has full access to Glue, S3, CloudWatch, and EC2. When I print the schema, there is no difference after changing the column type in the Data Catalog.

Could you help me?

MehdiE
asked 2 years ago · 347 views
1 Answer

Glue DynamicFrames are self-describing, so no schema is required when they are created. Instead, the schema is computed on the fly, with inconsistent types encoded as choice types.

In your case, you can use the ApplyMapping class to specify the column type after reading the catalog table.
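As a minimal sketch of that suggestion, the snippet below builds the mapping list for ApplyMapping and applies it right after reading the catalog table. The helper names (`build_mappings`, `read_with_casts`), the column names (`order_id`, `label`), and the `transformation_ctx` values are hypothetical and only for illustration; the source type string "decimal(38,0)" is taken from the question and assumed to match what the job prints for that column.

```python
def build_mappings(schema, overrides):
    """Build ApplyMapping 4-tuples: (source, source_type, target, target_type).

    schema    -- list of (column_name, current_type) pairs, as printed by the job
    overrides -- dict of column_name -> desired type (e.g. "string")
    """
    return [
        (name, src_type, name, overrides.get(name, src_type))
        for name, src_type in schema
    ]


def read_with_casts(glue_context, database, table_name, schema, overrides):
    """Read a catalog table and cast the overridden columns (hypothetical helper)."""
    # Imported lazily: awsglue is only available inside the Glue job runtime.
    from awsglue.transforms import ApplyMapping

    dyf = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        transformation_ctx="source",  # assumed name
    )
    return ApplyMapping.apply(
        frame=dyf,
        mappings=build_mappings(schema, overrides),
        transformation_ctx="cast_columns",  # assumed name
    )


# Hypothetical columns: cast order_id from decimal(38,0) to string, keep label as is.
example_mappings = build_mappings(
    [("order_id", "decimal(38,0)"), ("label", "string")],
    {"order_id": "string"},
)
```

Inside the job you could then call `read_with_casts(glueContext, ...)` and print the schema of the returned frame to check the result. Note that ApplyMapping keeps only the columns listed in `mappings`, so every column you want in the output has to appear in the list, including those whose type does not change.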

AWS
answered 2 years ago
