Create multiple Glue crawlers for specific file names in the same folder pattern


Hi,

I have the following:

  • an S3 bucket with a folder called parse_output
  • inside parse_output are dated folders (2023-11-06, 2023-11-04, etc.)
  • each dated folder contains the same set of files (e.g. companies.csv.gz, officers.csv.gz, etc.)

I'm trying to create a crawler for each of the specific file types. I originally tried a crawler with a path like s3://bucket_name/parse_output/**/companies.csv.gz, but this always returned 0 results.

Reading up, it seems that the data source path is fixed to a bucket or subfolder, and that you then use exclusions. Is that correct? In which case, how do you filter out all file names except the one you want to keep via the exclusions?

Asked 6 months ago · 272 views
1 answer

You can use negated exclusion patterns using [! ], but what you are doing will only allow single-file tables (since the tables cannot implement the exclusions).
https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
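
For illustration, a minimal boto3 sketch of a crawler set up this way. The crawler name, role ARN, and database name are placeholders, and the exclusion pattern is one possible way to keep only companies.csv.gz given that the files sit one folder level below parse_output; verify it against your actual layout.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical crawler that keeps only companies.csv.gz by excluding
# every file (one folder level down) whose name does not start with "c".
# Name, role, and database are placeholders, not from this thread.
glue.create_crawler(
    Name="parse-output-companies",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="parse_output_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://bucket_name/parse_output/",
                # Negated character class: any file name after a "/"
                # that does not begin with "c" is excluded. The dated
                # folders themselves (no trailing "/") are untouched.
                "Exclusions": ["**/[!c]*"],
            }
        ]
    },
)
```

If several file types share a leading letter, you end up chaining more negated patterns, which gets unwieldy quickly; that is part of why reorganizing the files (as suggested below) is simpler.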

AWS
Expert
Answered 6 months ago
  • Is there any solution to get all the CSVs with matching names into one table each? (I believe from there you can filter by file in Athena.)

  • My advice is to not fight how tables work; organize the files better, even if it means keeping an extra copy (see the sketch below).
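
As a sketch of that reorganization (the new prefix names here are assumptions, not from the thread): copy each file into a per-type prefix such as parse_output_by_type/companies/2023-11-06/companies.csv.gz, then point one crawler at each type's prefix with no exclusions at all.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "bucket_name"  # from the question; adjust as needed

# Copy every parse_output/<date>/<file>.csv.gz into a per-type layout,
# parse_output_by_type/<type>/<date>/<file>.csv.gz, so each file type
# gets its own crawler path (target prefix name is assumed).
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="parse_output/"):
    for obj in page.get("Contents", []):
        parts = obj["Key"].split("/")  # parse_output/<date>/<file>
        if len(parts) != 3 or not parts[2]:
            continue  # skip folder markers and unexpected keys
        _, date_folder, file_name = parts
        file_type = file_name.split(".")[0]  # e.g. "companies"
        new_key = f"parse_output_by_type/{file_type}/{date_folder}/{file_name}"
        s3.copy_object(
            Bucket=BUCKET,
            Key=new_key,
            CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
        )
```

Once matching files land in a single table, Athena's "$path" pseudo-column also lets you filter rows by their source object, which covers the per-file filtering mentioned in the first comment.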
