If you'd like to create tables for each region, you should structure your S3 bucket, folders, and files in the following manner:
- s3-bucket-name
- north_america
- north_america_date.csv
- south_america
- south_america_date.csv
- region_3
- region_3_date.csv
When you create a crawler, you can specify multiple data sources to crawl. Point the crawler to each top-level folder, not to the files inside it: one data source would be the "north_america" folder in the S3 bucket. Make sure "Crawl all sub-folders" is checked. If you also want to partition by year, month, and day, include those as folders in S3 as well, and place the file underneath them.
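As a sketch of the layout above, the object keys can be built like this (the bucket name, file name, and dates here are hypothetical placeholders; the Hive-style `year=.../month=.../day=...` folder names are one common convention the crawler recognizes as named partitions, but plain folders work too):

```python
from datetime import date
from typing import Optional

def build_key(region: str, filename: str, dt: Optional[date] = None) -> str:
    """Build an S3 object key for a region's file, optionally nested under
    year/month/day partition folders between the region folder and the file."""
    if dt is None:
        return f"{region}/{filename}"
    return f"{region}/year={dt.year}/month={dt.month:02d}/day={dt.day:02d}/{filename}"

# Hypothetical bucket and file name for illustration:
bucket = "s3-bucket-name"
key = build_key("north_america", "north_america_2024-01-15.csv", date(2024, 1, 15))
print(f"s3://{bucket}/{key}")
```

The same helper is used for every region, so each region's files always land under its own top-level folder, which is what lets one crawler produce one table per region.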
An example s3 path would look like this:
point crawler to: s3://bucket-name/north_america
The crawler will create a table named "north_america". If you added year, month, and day folders, the crawler will pick those up and add them as partitions as well.
Repeat these steps for each of the 5 regions, all within the same crawler.
Also, make sure "Create a single schema for each S3 path" is not checked; otherwise the crawler may try to combine everything into one table.
When you run the crawler after defining the 5 data sources, it should create the 5 tables for you, even though you only created one crawler.
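If you prefer to define the crawler programmatically, a minimal boto3 sketch might look like the following (the crawler name, IAM role, database name, and bucket are hypothetical placeholders, not values from this answer):

```python
def build_s3_targets(bucket: str, regions: list) -> dict:
    """One S3 target per top-level region folder; Glue creates one table per target."""
    return {"S3Targets": [{"Path": f"s3://{bucket}/{region}/"} for region in regions]}

def create_regional_crawler(bucket: str, regions: list) -> None:
    import boto3  # imported here so the helper above stays importable without boto3

    glue = boto3.client("glue")
    glue.create_crawler(
        Name="regional-crawler",        # hypothetical crawler name
        Role="GlueCrawlerRole",         # hypothetical IAM role
        DatabaseName="regional_data",   # hypothetical Glue database
        Targets=build_s3_targets(bucket, regions),
        # Leave the Configuration's TableGroupingPolicy unset so the crawler
        # does not combine the paths into a single schema -- the API
        # counterpart of leaving the "single schema" console option unchecked.
    )

regions = ["north_america", "south_america", "region_3", "region_4", "region_5"]
targets = build_s3_targets("s3-bucket-name", regions)  # hypothetical bucket name
```

Calling `create_regional_crawler(...)` requires valid AWS credentials and an IAM role the Glue service can assume; the target-building part runs standalone.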
- Single schema information: https://docs.aws.amazon.com/glue/latest/dg/crawler-grouping-policy.html
- Defining a crawler: https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html
