Hello Gurpreet,
To create separate Glue Catalog tables for different types of files within a single S3 bucket, you can follow these steps:
- Create Separate Folders for Each Data Type:
Organize your files within the S3 bucket into separate folders based on their data types. In your case, you already have files named with prefixes that indicate their type ("PowerBiAgentReport," "PowerBiQueueReport," and "Occupancy"). You can move these files into separate folders like this:
s3://gurpreet-source-bucket/connect/reports/PowerBiAgentReport/
s3://gurpreet-source-bucket/connect/reports/PowerBiQueueReport/
s3://gurpreet-source-bucket/connect/reports/Occupancy/
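If you do have write access to the bucket, the reorganization could be scripted roughly as follows. This is only a sketch: it assumes every file name starts with its report type, the function names and the sample file name in the usage note are made up for illustration, and the copy step relies on boto3's S3 client.

```python
import posixpath

# Report types and base prefix taken from the folder layout above.
REPORT_TYPES = ("PowerBiAgentReport", "PowerBiQueueReport", "Occupancy")
BASE_PREFIX = "connect/reports/"


def destination_key(key):
    """Return the per-type key for an object key, or None if no type matches."""
    filename = posixpath.basename(key)
    for report_type in REPORT_TYPES:
        if filename.startswith(report_type):
            return f"{BASE_PREFIX}{report_type}/{filename}"
    return None


def move_into_folders(bucket):
    """Copy each matching object into its type folder, then delete the original."""
    import boto3  # imported lazily so destination_key can be used without AWS access

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=BASE_PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            new_key = destination_key(key)
            if new_key is None or new_key == key:
                continue  # unknown type, or already in the right folder
            # copy_object handles objects up to 5 GB; larger ones need a multipart copy.
            s3.copy_object(
                Bucket=bucket,
                Key=new_key,
                CopySource={"Bucket": bucket, "Key": key},
            )
            s3.delete_object(Bucket=bucket, Key=key)
```

For example, `destination_key("connect/reports/Occupancy_2023-10-01.csv")` gives `connect/reports/Occupancy/Occupancy_2023-10-01.csv`, and `move_into_folders("gurpreet-source-bucket")` would apply that mapping to the whole prefix.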
- Create a Glue Crawler for Each Folder:
Create a separate Glue Crawler for each of the folders you've created. Each crawler should be configured to point to one of the folders and generate a separate table in the Glue Data Catalog.
- Configure the Crawlers:
When configuring the crawlers, you should specify the appropriate database, prefix, and options to ensure that each crawler creates a separate table with the desired schema. For example:
  - Crawler 1 (PowerBiAgentReport):
    - Data store: s3://gurpreet-source-bucket/connect/reports/PowerBiAgentReport/
    - Database: Glue database where you want to create the table (e.g., "MyDatabase")
    - Prefix: (empty or as needed)
    - Schema and table name: Choose options that match your files.
  - Crawler 2 (PowerBiQueueReport):
    - Data store: s3://gurpreet-source-bucket/connect/reports/PowerBiQueueReport/
    - Database: Same Glue database as above ("MyDatabase")
    - Prefix: (empty or as needed)
    - Schema and table name: Choose options that match your files.
  - Crawler 3 (Occupancy):
    - Data store: s3://gurpreet-source-bucket/connect/reports/Occupancy/
    - Database: Same Glue database as above ("MyDatabase")
    - Prefix: (empty or as needed)
    - Schema and table name: Choose options that match your files.
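The three crawler definitions above could also be created with boto3 instead of the console. The sketch below is one possible way to do it, not the only one: the crawler naming scheme and the IAM role ARN in the usage note are hypothetical, and the role must already exist with permission to read the S3 paths.

```python
# Values taken from the crawler settings listed above.
S3_BASE = "s3://gurpreet-source-bucket/connect/reports/"
REPORT_TYPES = ("PowerBiAgentReport", "PowerBiQueueReport", "Occupancy")


def crawler_config(report_type, role_arn, database="MyDatabase"):
    """Build the keyword arguments for glue.create_crawler for one report type."""
    return {
        "Name": f"{report_type}-crawler",  # hypothetical naming scheme
        "Role": role_arn,                  # IAM role the crawler assumes
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"{S3_BASE}{report_type}/"}]},
        # A "TablePrefix" entry can be added here if a table prefix is needed.
    }


def create_crawlers(role_arn):
    """Create one crawler per report type."""
    import boto3  # imported lazily so crawler_config can be inspected without AWS access

    glue = boto3.client("glue")
    for report_type in REPORT_TYPES:
        glue.create_crawler(**crawler_config(report_type, role_arn))
```

A call such as `create_crawlers("arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole")` (placeholder account ID and role name) would register all three crawlers against the same database.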
- Run the Crawlers:
Execute each crawler to scan the respective folders and create separate Glue Catalog tables for each data type. You can do this from the AWS Glue Console or by using the AWS CLI or SDK.
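For the SDK route, a minimal sketch with boto3 might start a crawler and poll until it returns to the `READY` state. The crawler name in the usage note is an assumed example, and the Glue client is passed in as a parameter so the function can be exercised with a stub.

```python
import time


def run_crawler(glue, name, poll_seconds=15, sleep=time.sleep):
    """Start a crawler, wait until it is READY again, and return the last crawl status."""
    glue.start_crawler(Name=name)
    while True:
        crawler = glue.get_crawler(Name=name)["Crawler"]
        # State is one of READY, RUNNING, or STOPPING.
        if crawler["State"] == "READY":
            # LastCrawl may be absent if the crawler has never completed a run.
            return crawler.get("LastCrawl", {}).get("Status")
        sleep(poll_seconds)
```

With real credentials this would be called as `run_crawler(boto3.client("glue"), "Occupancy-crawler")`, once per crawler.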
- Verify the Tables:
Once the crawlers have completed their runs, go to the Glue Data Catalog to verify that you have separate tables for each data type.
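The same check can be done programmatically. The sketch below, which assumes the database name from the example settings, lists every table in the database together with the S3 location it points to; the helper name is illustrative.

```python
def table_locations(glue, database="MyDatabase"):
    """Return a dict mapping each table name in the database to its S3 location."""
    found = {}
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            # StorageDescriptor may be missing for some table types, so use .get().
            found[table["Name"]] = table.get("StorageDescriptor", {}).get("Location")
    return found
```

Calling `table_locations(boto3.client("glue"))` after the crawlers finish should show one entry per report folder if each crawler produced its own table.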
By following these steps, you'll have separate Glue Catalog tables for the different types of files in your S3 bucket, and you can query them individually in Athena or other AWS services as needed.
Please give a thumbs up if my suggestion helps
Hi Gabriel,
Thanks for your reply to my question.
Actually, in my case, the source bucket is under another AWS account's control, and they won't be able to put the files into separate folders for each file type.
Is there any way to crawl these files (of different types) from a single folder in another account and create different Glue Catalog tables in my account based on each file type?
Thanks, Gurpreet
Did you find an answer to this?