Create multiple Glue crawlers for specific file names in the same folder pattern


Hi,

I have the following:

  • s3 bucket with a folder called parse_output
  • inside parse_output are dated folders (2023-11-06, 2023-11-04, etc.)
  • in each folder is the same set of files (e.g. companies.csv.gz, officers.csv.gz, etc.)

I'm trying to create a crawler for each of the specific file types. I originally tried a crawler with a path like s3://bucket_name/parse_output/**/companies.csv.gz, but this always returned 0 results.

Reading up, it seems that the data source path is fixed to a bucket or subfolder, and that you then use exclusions. Is that correct? If so, how do you filter out all file names except the one you want to keep using the exclusions?

Asked 6 months ago · 272 views
1 Answer

You can use negated exclusion patterns using [! ], but what you are doing will only allow single-file tables (since the tables cannot implement the exclusions).
https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
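
For context, exclusions are attached per S3 target when the crawler is defined. Here is a minimal boto3 sketch, assuming placeholder names for the role and database; the exclusion list simply enumerates the unwanted file names, and a negated bracket pattern is shown as the alternative the docs describe:

```python
import boto3

glue = boto3.client("glue")

# Sketch: one crawler per file type over the shared prefix.
# Role ARN and database name are placeholders.
glue.create_crawler(
    Name="companies-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="parse_output_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://bucket_name/parse_output/",
                # Exclude every other known file name; alternatively, a
                # negated bracket pattern like "**/[!c]*" skips any file
                # whose name does not start with "c".
                "Exclusions": ["**/officers.csv.gz"],
            }
        ]
    },
)
```

As noted above, with this layout the crawler will still tend to produce one table per file rather than one table per type.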

AWS
Expert
answered 6 months ago
  • Is there any solution to get all the CSVs with matching names into one table each? (I believe from there you can filter by file in Athena.)

  • My advice is to not fight how tables work; organize the files better, even if it means having an extra copy. See the sketch below.
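
As a sketch of that reorganization (the bucket name and by_type target prefix are assumptions), copying each file type into its own prefix lets a crawler point at one clean path per type, e.g. s3://bucket_name/by_type/companies/:

```python
import boto3

s3 = boto3.client("s3")
bucket = "bucket_name"  # placeholder

# Copy each dated file into a per-type prefix, keeping the date in the
# key, e.g. parse_output/2023-11-06/companies.csv.gz
#        -> by_type/companies/2023-11-06/companies.csv.gz
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="parse_output/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        parts = key.split("/")  # ["parse_output", "<date>", "<file>.csv.gz"]
        if len(parts) != 3 or not parts[2].endswith(".csv.gz"):
            continue
        _, date_folder, filename = parts
        file_type = filename.split(".")[0]  # "companies", "officers", ...
        s3.copy_object(
            Bucket=bucket,
            CopySource={"Bucket": bucket, "Key": key},
            Key=f"by_type/{file_type}/{date_folder}/{filename}",
        )
```

Each crawler can then target s3://bucket_name/by_type/<file_type>/ and produce one table per type, with the dated folders still available for filtering in Athena.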
