Create multiple Glue crawlers for specific file names in the same folder pattern

Hi,

I have the following:

  • an S3 bucket with a folder called parse_output
  • inside parse_output are dated folders (2023-11-06, 2023-11-04, etc.)
  • each dated folder contains the same set of files (e.g. companies.csv.gz, officers.csv.gz, etc.)

I'm trying to create a crawler for each of the specific file types. I originally tried a crawler with a path like s3://bucket_name/parse_output/**/companies.csv.gz, but this always returned 0 results.

Reading up, it seems that the data source path must point to a bucket or sub-folder, and that you then use exclusions. Is that correct? In which case, how do you exclude all file names except the one you want to keep?

posted 6 months ago · 272 views
1 Answer

You can use negated exclusion patterns with [! ], but what you are doing will only allow single-file tables (since the tables themselves cannot implement the exclusions).
https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
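For illustration, here is a minimal boto3 sketch of that idea. The crawler name, database, role ARN, and bucket are placeholders, and the [!c] glob negates only a single character, so this keeps every file whose name starts with "c" (brittle if several file names share that first letter; listing each unwanted file name explicitly in Exclusions is the more robust alternative):

```python
import boto3

glue = boto3.client("glue")

# Hypothetical names and role ARN -- substitute your own.
glue.create_crawler(
    Name="parse_output_companies",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="parse_output_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://bucket_name/parse_output/",
                # Glob exclusion: [!c] matches one character that is NOT "c",
                # so every file whose name does not start with "c" is skipped.
                # An explicit list like "**/officers.csv.gz" is less fragile.
                "Exclusions": ["**/[!c]*.csv.gz"],
            }
        ]
    },
)
```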

AWS
EXPERT
answered 6 months ago
  • Is there any solution to get all the CSVs with matching names into one table each? (I believe from there you can filter by file in Athena.)

  • My advice is to not fight how tables work: organize the files better, even if it means keeping an extra copy (see the sketch below).
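
As a sketch of that reorganization (the bucket name and the by_table destination prefix are assumptions), you could copy each file type into its own prefix, e.g. s3://bucket_name/by_table/companies/2023-11-06/companies.csv.gz, and then point one crawler at each by_table/<name>/ prefix with no exclusions at all:

```python
import boto3

s3 = boto3.client("s3")
bucket = "bucket_name"  # assumed bucket from the question

# Walk parse_output/<date>/<file> and copy each file into a
# per-table prefix such as by_table/companies/<date>/companies.csv.gz.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="parse_output/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]          # e.g. parse_output/2023-11-06/companies.csv.gz
        parts = key.split("/")
        if len(parts) != 3 or not parts[2]:
            continue              # skip anything that isn't <prefix>/<date>/<file>
        _, date_folder, file_name = parts
        table = file_name.split(".")[0]   # "companies", "officers", ...
        dest = f"by_table/{table}/{date_folder}/{file_name}"
        s3.copy_object(
            Bucket=bucket,
            Key=dest,
            CopySource={"Bucket": bucket, "Key": key},
        )
```

Each crawler then gets exactly one table per file type, and within a table you can still filter rows by their source object in Athena via the built-in "$path" pseudo-column.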
