Glue Jobs have no access to current schemas (Glue Catalog)


Hi,

Context: I would like to extract a table from an Oracle database and write the data to S3 in Parquet format. I use a Glue connection, a Glue database and a Glue crawler. All of this works fine.

Issue: Glue sets decimal(38,0) as the column type in the Data Catalog rather than string. I updated the Data Catalog with the new column type. Nevertheless, in my Glue ETL job the column type is still "decimal" (I can see this because I print the schema). I extract the data using create_dynamic_frame.from_catalog(database="", table_name="", transformation_ctx=""). I gave the job role full access to Glue/S3/CloudWatch/EC2. When I print the schema, there is no difference after changing the column type in the Data Catalog.

Could you help me?

MehdiE
asked a year ago, 334 views
1 Answer

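Glue DynamicFrames are self-describing, so no schema is required when one is created. Instead, the schema is computed on the fly from the data as it is read, with inconsistent types encoded as choice types. This is why the schema you print in the job can differ from the column type you edited in the Data Catalog.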

In your case, you can use the ApplyMapping class to specify the column type after reading the catalog table.
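
As a rough sketch of that suggestion, the snippet below builds the mapping list that ApplyMapping expects, one (source name, source type, target name, target type) tuple per column, and casts only the columns you override. The helper names build_mappings and read_and_cast, the column names "id" and "name", and the transformation_ctx values are hypothetical and only for illustration; the source type strings are assumed to match what printSchema() reports in your job. Note that ApplyMapping drops any column that is not listed, so every column you want to keep needs an entry.

```python
from typing import Dict, List, Tuple

# (source name, source type, target name, target type), as used by ApplyMapping
Mapping = Tuple[str, str, str, str]


def build_mappings(columns: Dict[str, str], overrides: Dict[str, str]) -> List[Mapping]:
    """Keep every column, changing the type only for columns named in overrides."""
    unknown = set(overrides) - set(columns)
    if unknown:
        raise KeyError(f"overrides refer to unknown columns: {sorted(unknown)}")
    return [
        (name, src_type, name, overrides.get(name, src_type))
        for name, src_type in columns.items()
    ]


def read_and_cast(glue_context, database: str, table_name: str,
                  columns: Dict[str, str], overrides: Dict[str, str]):
    """Read the catalog table, then cast columns with ApplyMapping (Glue job only)."""
    # Imported here so build_mappings can be tried outside a Glue environment.
    from awsglue.transforms import ApplyMapping

    dyf = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        transformation_ctx="source",  # hypothetical context name
    )
    return ApplyMapping.apply(
        frame=dyf,
        mappings=build_mappings(columns, overrides),
        transformation_ctx="cast_columns",  # hypothetical context name
    )


# Hypothetical two-column table: cast "id" from decimal(38,0) to string.
mappings = build_mappings(
    {"id": "decimal(38,0)", "name": "string"},
    {"id": "string"},
)
print(mappings)
```

Inside the job you would call read_and_cast(glueContext, database, table_name, columns, overrides) and print the schema of the returned DynamicFrame to confirm the cast before writing to S3.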

AWS
answered a year ago
