Partition schema mismatch in Glue Table

Hi Team,

We receive a dataset from the source team whose schema changes on a daily basis. For example, one day the dataset has 100 columns and another day it has 92. The files are in CSV format (tab-separated). We created a Glue crawler to crawl all the files at once. The crawler loads the dataset into a Glue schema/table, but the data within the table is misaligned: the records do not line up with the column names as they should.
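To make the symptom concrete, here is a minimal sketch of how this kind of misalignment can arise when delimited files are mapped to table columns by position rather than by header name. The sample data, column names, and helper functions are hypothetical, and the sketch assumes each file carries its own header row, which the description above does not confirm.

```python
import csv
import io

# Hypothetical sample: a narrower daily file than the table schema expects.
day2 = "id\tcity\n2\tDelhi\n"

# Hypothetical table schema, as inferred from the widest file.
table_columns = ["id", "name", "city"]

def read_positional(text, columns):
    """Map values to table columns by position, ignoring the file's own header."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))[1:]
    return [dict(zip(columns, row)) for row in rows]

def read_by_name(text, columns):
    """Map values to table columns by header name; absent columns become None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{col: row.get(col) for col in columns} for row in reader]

positional = read_positional(day2, table_columns)
by_name = read_by_name(day2, table_columns)

print(positional)  # the city value lands under "name"
print(by_name)     # the city value stays under "city"
```

With positional mapping, the value "Delhi" ends up under `name` because the file has no `name` column; mapping by header name keeps it under `city`.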

While creating the Glue crawler, I used the following configuration:

Under Grouping behavior for S3 data (optional), I checked **Create a single schema for each S3 path**. Under Configuration options (optional), where all schema changes are logged during the crawler run, I selected the following:

When the crawler detects schema changes in the data store, how should AWS Glue handle table updates in the data catalog? -> Add new columns only. -> Update all new and existing partitions with metadata from the table.

How should AWS Glue handle deleted objects in the data store? -> Mark the table as deprecated in the data catalog.
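For anyone trying to reproduce this setup outside the console, the choices above can be sketched as the crawler's `Configuration` JSON and `SchemaChangePolicy`. The mapping from console labels to these values is my assumption and should be checked against the crawler's actual definition; the variable names are illustrative only.

```python
import json

# Assumed mapping of the console choices to the crawler Configuration JSON.
crawler_configuration = {
    "Version": 1.0,
    # "Create a single schema for each S3 path"
    "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    "CrawlerOutput": {
        # "Add new columns only"
        "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"},
        # "Update all new and existing partitions with metadata from the table"
        "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"},
    },
}

# "Mark the table as deprecated in the data catalog"
schema_change_policy = {"DeleteBehavior": "DEPRECATE_IN_DATABASE"}

# The Glue API expects Configuration as a JSON string, for example:
#   boto3.client("glue").create_crawler(
#       ..., Configuration=configuration_json,
#       SchemaChangePolicy=schema_change_policy)
configuration_json = json.dumps(crawler_configuration)
print(configuration_json)
```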

Could you please help me fix this misalignment problem? Thank you in advance.

Regards, Apurva

No Answers
