AWS Glue Crawler has options that support schema evolution scenarios. To have new columns added to the table whenever new columns appear in the files, follow these steps:
- Make sure you choose "Crawl all folders".
- Under the advanced options in "Set output and scheduling", choose "Add new columns only" or "Update the table definition in the data catalog".
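If you manage the crawler through the API rather than the console, the same settings can be sketched with boto3. This is a minimal sketch: the crawler name is a placeholder, and the mapping of console options to `SchemaChangePolicy`, `RecrawlPolicy`, and the `Configuration` JSON reflects the Glue `UpdateCrawler` API.

```python
import json


# "Update the table definition in the data catalog":
# pick up new/changed columns, and only log (don't drop) removed ones.
schema_change_policy = {
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "LOG",
}

# "Crawl all folders" corresponds to CRAWL_EVERYTHING.
recrawl_policy = {"RecrawlBehavior": "CRAWL_EVERYTHING"}

# "Add new columns only" corresponds to MergeNewColumns
# in the crawler's Configuration JSON.
configuration = json.dumps({
    "Version": 1.0,
    "CrawlerOutput": {
        "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"},
    },
})


def apply_schema_evolution_settings(crawler_name: str) -> None:
    """Apply the schema-evolution settings to an existing crawler."""
    import boto3  # AWS SDK for Python; requires credentials to call Glue

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,
        SchemaChangePolicy=schema_change_policy,
        RecrawlPolicy=recrawl_policy,
        Configuration=configuration,
    )
```

Calling `apply_schema_evolution_settings("my-crawler")` then re-running the crawler should add newly appearing columns to the catalog table instead of leaving the definition stale.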
If your column names are still not updated correctly, you may also need to create a custom classifier and attach it to this crawler. Here are the options you need to choose for the classifier:
Thanks for bringing this scenario up. Could you try running a few tests using Parquet files and see if that works for your use case?
Is this a change I can make in the crawler?
Thanks for the follow-up question. You can use Glue or your own transform to convert from CSV to Parquet. For testing, you might try https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html and see how it works for your case.
I've tried with these exact settings, deleting the table and letting the crawler recreate it, but when queried in Athena the columns still contain the wrong data.