Create multiple Glue crawlers for specific file names in the same folder pattern

Hi,

I have the following:

  • An S3 bucket with a folder called parse_output
  • Inside parse_output are dated folders (2023-11-06, 2023-11-04, etc.)
  • In each dated folder is the same set of files (e.g. companies.csv.gz, officers.csv.gz, etc.)

I'm trying to create a crawler for each of the specific file types. I originally tried a crawler with a path like s3://bucket_name/parse_output/**/companies.csv.gz, but this always returned 0 results.

Reading up, it seems that the data source path is fixed to a bucket or subfolder, and that you then use exclusions. Is that correct? If so, how do you filter out all file names except the one you want to keep in the exclusions?

asked 6 months ago · 252 views
1 Answer

You can use negated exclusion patterns with [! ] brackets, but what you are doing will only allow single-file tables, since the exclusions apply to what the crawler scans and cannot be carried into the table definitions themselves.
https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
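For illustration, here is a minimal boto3 sketch of one such crawler. The crawler, role, and database names are placeholders, and it assumes officers.csv.gz stands in for the other file names you want excluded (one pattern per unwanted name; the negated bracket form from the docs above is an alternative if the names you want to keep share a distinct prefix):

```python
import boto3

glue = boto3.client("glue")

# All names below (crawler, role, database) are placeholders; adjust
# them to your environment. The include path stays at the folder level
# and the exclusions carve away every file name except companies.csv.gz.
glue.create_crawler(
    Name="companies-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="parse_output_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://bucket_name/parse_output/",
                # One exclusion per unwanted file name; "**" crosses
                # folder boundaries, so this skips officers.csv.gz in
                # every dated folder.
                "Exclusions": [
                    "**/officers.csv.gz",
                    # ...add a pattern for each other file name
                ],
            }
        ]
    },
)
```

Note that the exclusions only control what the crawler scans; they are not stored on the resulting table, which is why the approach above effectively ties each table to a single file.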

AWS
EXPERT
answered 6 months ago
  • Is there any solution to get all the CSVs with matching names into one table each? (I believe from there you can filter by file in Athena, as sketched below.)

  • My advice is not to fight how tables work; organize the files better, even if it means keeping an extra copy.
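On the Athena point in the first comment: Athena exposes a "$path" pseudo-column holding the source file of each row, so if a single table does cover every file, you can filter rows per file. A minimal sketch, assuming hypothetical database/table names and result location:

```python
import boto3

athena = boto3.client("athena")

# Database, table, and output location are placeholders. "$path" is
# Athena's built-in pseudo-column containing the S3 object each row
# came from, so the LIKE filter keeps only rows from companies.csv.gz.
athena.start_query_execution(
    QueryString=(
        "SELECT * FROM parse_output_table "
        "WHERE \"$path\" LIKE '%/companies.csv.gz'"
    ),
    QueryExecutionContext={"Database": "parse_output_db"},
    ResultConfiguration={"OutputLocation": "s3://bucket_name/athena-results/"},
)
```

That said, queries typically still scan all the files under the table location before filtering, which is part of why reorganizing the files (as advised above) is usually the better long-term fix.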
