Create multiple Glue crawlers for specific file names in the same folder pattern


Hi,

I have the following:

  • an S3 bucket with a folder called parse_output
  • inside parse_output are dated folders (2023-11-06, 2023-11-04, etc.)
  • in each folder is the same set of files (e.g. companiese.csv.gz / offiers.csv.gz, etc.)

I'm trying to create a crawler for each of the specific file types. I originally tried a crawler with a path like s3://bucket_name/parse_output/**/companies.csv.gz, but this always returned 0 results.

Reading up, it seems that the data source path is fixed to a bucket or subfolder, and that you then use exclusions. Is that correct? If so, how do you filter out all file names except the one you want to keep in the exclusions?

Asked 6 months ago · 272 views
1 Answer

You can use negated exclusion patterns using [! ], but what you are doing will only allow single-file tables (since the tables cannot implement the exclusions).
https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
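For illustration, a crawler target using such a negated exclusion pattern might be sketched as below. The bucket and folder names come from the question; the exact pattern is an assumption and should be checked against the linked docs, since Glue's exclude-pattern syntax (bracket negation, `**` depth matching) has its own rules:

```python
# Hedged sketch, not a tested configuration. Glue exclude patterns support
# bracket expressions with negation, e.g. [!c] matches any single character
# except "c". The idea: crawl parse_output/ but exclude every file whose
# name starts with anything other than "c", leaving companies.csv.gz.
target = {
    "Path": "s3://bucket_name/parse_output/",   # crawler path from the question
    "Exclusions": ["**/[!c]*"],                 # assumed pattern; verify depth/syntax in the docs
}
# This dict would go into boto3's glue.create_crawler(...,
# Targets={"S3Targets": [target]}) -- not executed here.
```

Note that if several file types share a first letter, the pattern has to be narrowed further, which is part of why the answer calls this approach a fight against how tables work.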

AWS
EXPERT
Answered 6 months ago
  • Is there any solution to get all the CSVs with matching names into one table each? (I believe from there you can filter by file in Athena.)

  • My advice is to not fight how tables work: organize the files better, even if it means having an extra copy.
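The reorganization suggested above could be sketched as a key-rewriting step: copy each file into a per-type prefix (one crawler path per type), with the date kept as a Hive-style partition so Athena can filter on it. The helper name, prefix layout, and `dt=` partition key below are all assumptions for illustration:

```python
# Hypothetical helper: map a source key like
#   "parse_output/2023-11-06/companies.csv.gz"
# to a per-file-type destination prefix such as
#   "by_type/companies/dt=2023-11-06/companies.csv.gz"
# so each file type can get its own crawler path and table.
def dest_key(src_key: str) -> str:
    prefix, date, filename = src_key.split("/")  # assumes exactly bucket-relative "folder/date/file"
    file_type = filename.split(".")[0]           # "companies" from "companies.csv.gz"
    return f"by_type/{file_type}/dt={date}/{filename}"

# The actual copy could then use boto3 (not run here), e.g.:
# s3 = boto3.client("s3")
# s3.copy_object(Bucket=bucket, Key=dest_key(key),
#                CopySource={"Bucket": bucket, "Key": key})
```

Using a `dt=...` partition folder means the crawler can register the date as a partition column, so Athena queries can filter by date without scanning every folder.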
