1 Answer
You can configure an AWS Glue crawler to crawl only specific files in your S3 bucket by combining the crawler's include path with exclude patterns. For example, with a pattern that matches files containing regionA in their names, the crawler only considers those files. This focuses the crawl on the data subset you need, improving efficiency and reducing processing time. Alternatively, you can create a table in the AWS Glue Data Catalog for the specific files you're interested in and configure the crawler to update that table. You can also automate this process with the AWS CLI or Boto3 for greater control and customization.
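Here is a minimal Boto3 sketch of that approach. The bucket, role, and database names are placeholders; Glue crawlers scope their input by the S3 path plus glob-style `Exclusions` on each S3 target, so the sketch keeps regionA files by excluding the other regions:

```python
# Sketch: a Glue crawler that effectively crawls only the regionA files.
# Glue scopes input via the S3 path prefix plus "Exclusions" glob patterns
# on each S3 target. All names below are placeholders.

def build_s3_target(path, exclusions):
    """Build one S3 target entry for the crawler's Targets config."""
    return {"Path": path, "Exclusions": list(exclusions)}

def build_crawler_config(name, role_arn, database, path, exclusions):
    """Assemble the keyword arguments for glue.create_crawler()."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [build_s3_target(path, exclusions)]},
    }

def create_crawler(config):
    """Create the crawler in your AWS account (requires credentials)."""
    import boto3  # imported here so the config helpers stay dependency-free
    boto3.client("glue").create_crawler(**config)

config = build_crawler_config(
    name="regionA-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
    database="sales_db",                                        # placeholder
    path="s3://my-sales-bucket/raw/",                           # placeholder
    exclusions=["**/regionB*.csv", "**/regionC*.csv"],          # skip other regions
)
# create_crawler(config)  # uncomment to actually call AWS
```

Exclusions are evaluated relative to the include path, so listing the patterns for every file set you want skipped leaves only the regionA files in scope.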

Is there a way for the crawler to generate multiple metadata tables? For example, can a crawler generate separate tables for regionA, regionB, regionC, etc., or does each region need its own crawler?
In AWS Glue, a single crawler can generate metadata for multiple regions by using a combination of custom classifiers, filters, and partitioning strategies. If it is not too urgent, I can come up with something before tomorrow.
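One common layout-based sketch, assuming the data is organized as `s3://bucket/regionA/`, `s3://bucket/regionB/`, and so on (bucket, role, and database names are placeholders): give one crawler a separate S3 target per region folder, and it will catalog each path as its own table, provided the single-schema grouping option is left off.

```python
# Sketch: one crawler, multiple S3 targets -- one per region prefix.
# With data laid out as s3://bucket/regionA/, s3://bucket/regionB/, ...,
# each target path becomes its own table in the Data Catalog.
# Bucket/role/database names are placeholders.

REGIONS = ["regionA", "regionB", "regionC"]

def region_targets(bucket, regions):
    """One S3 target per region folder => one catalog table per region."""
    return {"S3Targets": [{"Path": f"s3://{bucket}/{region}/"} for region in regions]}

def crawler_args(bucket, regions):
    """Keyword arguments for a single glue.create_crawler() call."""
    return {
        "Name": "per-region-crawler",                              # placeholder
        "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
        "DatabaseName": "sales_db",                                # placeholder
        "Targets": region_targets(bucket, regions),
    }

args = crawler_args("my-sales-bucket", REGIONS)
# import boto3
# boto3.client("glue").create_crawler(**args)  # uncomment to call AWS
```

The alternative is a partitioning strategy: keep a single table and let the region folders become a partition column, which is usually better if you query across regions with Athena.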
That would be awesome! Also, where do I apply the 'patterns' to specify which file names to crawl?
Kindly check this article; you will find the steps on how to do it there: https://medium.com/@debolek4dem/how-to-create-aws-glue-crawler-for-specific-files-based-on-a-pattern-2926c5df65c6