Create multiple Glue crawlers for specific file names in the same folder pattern


Hi,

I have the following:

  • s3 bucket with a folder called parse_output
  • in the parse_output are dated folders (2023-11-06, 2023-11-04 etc)
  • in each folder is the same set of files (for example, companies.csv.gz, officers.csv.gz, etc.)

I'm trying to create a crawler for each of the specific file types. I originally tried a crawler with a path like s3://bucket_name/parse_output/**/companies.csv.gz, but this always returned 0 results.

Reading up, it seems that the data source path is fixed to a bucket or subfolder, and that you then use exclusions. Is that correct? If so, how do you write an exclusion that filters out every file name except the one you want to keep?

asked 6 months ago · 272 views
1 Answer

You can use negated exclude patterns with the [! ] bracket syntax, but what you are doing will only allow single-file tables (since the tables themselves cannot apply the exclusions).
https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
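To make the exclusion approach concrete, here is a minimal sketch that builds one crawler definition per file name. Instead of a negated bracket expression, it simply lists every other known file name as an exclude pattern, which assumes the set of file names is fixed and known up front. The crawler names, table prefix, role ARN, and database name are hypothetical, and the `**/name.csv.gz` patterns are assumed to be evaluated relative to the include path.

```python
def build_crawler_requests(bucket, file_stems, role_arn, database):
    """Return one set of create_crawler keyword arguments per file stem."""
    requests = []
    for stem in file_stems:
        # Exclude every *other* known file name so only `stem` is crawled.
        exclusions = [
            f"**/{other}.csv.gz" for other in file_stems if other != stem
        ]
        requests.append({
            "Name": f"parse-output-{stem}",   # hypothetical crawler name
            "Role": role_arn,                 # hypothetical IAM role
            "DatabaseName": database,         # hypothetical Glue database
            "TablePrefix": f"{stem}_",
            "Targets": {
                "S3Targets": [{
                    # The include path stays fixed at the folder level.
                    "Path": f"s3://{bucket}/parse_output/",
                    "Exclusions": exclusions,
                }]
            },
        })
    return requests


def create_crawlers(requests):
    """Send each request to Glue. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without it

    glue = boto3.client("glue")
    for kwargs in requests:
        glue.create_crawler(**kwargs)
```

Calling `build_crawler_requests("bucket_name", ["companies", "officers"], role_arn, database)` yields two definitions that can be inspected before anything is created. As noted in the answer, this does not change how the crawler groups files into tables.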

AWS
EXPERT
answered 6 months ago
  • Is there any solution to get all the CSVs with matching names into one table each? (I believe from there you can filter by file in Athena.)

  • My advice is not to fight how tables work. Organize the files better, even if it means keeping an extra copy.
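One way to read the "organize the files better" advice is to copy each file into a per-type prefix, with the date folder turned into a partition-style folder name. The sketch below only illustrates that idea: the `by_type/` target prefix and the `dt=` partition key are assumptions, not something the thread specifies.

```python
def reorganized_key(key, source_prefix="parse_output/", target_prefix="by_type/"):
    """Map parse_output/<date>/<name>.csv.gz to by_type/<name>/dt=<date>/<name>.csv.gz."""
    if not key.startswith(source_prefix):
        raise ValueError(f"unexpected key: {key}")
    relative = key[len(source_prefix):]
    # First path component is the dated folder, the rest is the file name.
    date_folder, file_name = relative.split("/", 1)
    # Everything before the first dot is treated as the file type.
    stem = file_name.split(".", 1)[0]
    return f"{target_prefix}{stem}/dt={date_folder}/{file_name}"


def copy_to_layout(bucket, keys):
    """Copy each object to its reorganized key. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the key mapping works without it

    s3 = boto3.client("s3")
    for key in keys:
        s3.copy_object(
            Bucket=bucket,
            Key=reorganized_key(key),
            CopySource={"Bucket": bucket, "Key": key},
        )
```

With this layout, each file type gets its own folder, so a crawler could be pointed at a path such as s3://bucket_name/by_type/companies/ without any exclusions, and the date would typically surface as a partition column.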
