There is no specific limit to the number of tables that can be scanned by a Glue Crawler. However, there are a few things that could be causing your Crawler to not update the data for November and December 2022:
The Crawler's schedule: The Crawler might not be running as frequently as you expect. Verify that the Crawler schedule is set correctly.
S3 bucket permissions: The Crawler needs permissions to access the S3 bucket where the data is stored. Verify that the IAM role associated with the Crawler has the necessary permissions to access the S3 bucket.
Partitioning: Verify that the partitioning scheme you have set up is correct, and that the Crawler is looking for the partitions in the correct location.
Data format: Make sure that the data in the S3 bucket is in a format that the Crawler can understand.
Data size: The Crawler has a maximum amount of data it can crawl. If the data size is too large, the Crawler might not be able to process it all.
Glue Crawler's configuration: You can check the Crawler's properties and see if there are any configurations that need to be changed.
Athena partitions: Verify that the new partitions, and the data in them, are visible in Athena.
You can check the Glue Crawler's logs in CloudWatch to get more information about the error. If the problem persists, you might want to try creating a new Crawler, or refer to the Glue Crawler documentation or AWS Support for further assistance.
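A few of the checks above (schedule, last run, log location, visible partitions) can be read from the Glue API instead of the console. The following is a minimal sketch, assuming a boto3 Glue client is passed in; the function names and the crawler, database, and table names in the usage note are hypothetical.

```python
def summarize_last_crawl(glue_client, crawler_name):
    """Collect the schedule and last-run details that Glue reports for a crawler."""
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    last = crawler.get("LastCrawl", {})
    return {
        "state": crawler.get("State"),
        "schedule": crawler.get("Schedule", {}).get("ScheduleExpression"),
        "last_status": last.get("Status"),
        "last_start": last.get("StartTime"),
        "error": last.get("ErrorMessage"),
        # CloudWatch location of the log for the last run
        "log_group": last.get("LogGroup"),
        "log_stream": last.get("LogStream"),
    }


def missing_partitions(glue_client, database, table, expected):
    """Return the expected partition value tuples that are not in the Data Catalog."""
    paginator = glue_client.get_paginator("get_partitions")
    found = set()
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        for partition in page["Partitions"]:
            found.add(tuple(partition["Values"]))
    return [tuple(e) for e in expected if tuple(e) not in found]
```

With `glue = boto3.client("glue")`, calling `summarize_last_crawl(glue, "my-crawler")` shows whether the last run was in October, and `missing_partitions(glue, "my_db", "my_table", [("2022", "11"), ("2022", "12")])` lists which of the two months never reached the catalog. The partition values are assumed to be strings in the order Year, Month.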
Can you please check when the crawler last ran? My guess is that it last ran in October.
Run the crawler once more and check. The latest months' partitions should get created if the previous ones worked fine. If you need to run it every month, schedule it accordingly.
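If it helps, the on-demand run and the monthly schedule can both be set through the API. This is only a sketch, assuming a boto3 Glue client is passed in; the helper names and the 06:00 UTC default are my own choices, not something from the thread.

```python
def monthly_cron(day=1, hour=6, minute=0):
    """Build a Glue cron expression that fires once a month (times are UTC)."""
    if not 1 <= day <= 28:
        raise ValueError("pick a day between 1 and 28 so every month has it")
    if not 0 <= hour <= 23 or not 0 <= minute <= 59:
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour} {day} * ? *)"


def schedule_monthly(glue_client, crawler_name, day=1, hour=6, minute=0):
    """Attach a monthly schedule to an existing crawler and return the expression."""
    expression = monthly_cron(day, hour, minute)
    glue_client.update_crawler(Name=crawler_name, Schedule=expression)
    return expression


def run_now(glue_client, crawler_name):
    """Start a one-off crawl outside the schedule."""
    glue_client.start_crawler(Name=crawler_name)
```

For example, `run_now(glue, "my-crawler")` picks up the months that were missed, and `schedule_monthly(glue, "my-crawler", day=2)` keeps it running on the 2nd of each month. Note that Glue rejects `update_crawler` while the crawler is running, so apply the schedule before or after the one-off run.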
I observed that the schema was different starting Nov '22, i.e. a new partition (group) was added inside month. Do you know if there is a way I can detect the tables under a new partition using the same crawler? If yes, what should the configuration be? For example, until October 2022 I had partitions for Year and Month (structure was /Year/Month/.csv), and now we have an added partition (current structure is /Year/Month/group/.csv). How can I accommodate this change?
"TableLevelConfiguration" should do the trick. Set it to 3 for the crawler and check if that works. My hunch is that the crawler will expect all the data to be at level 3. If that is the problem, move the existing data (till Oct) to a default group so that the crawler finds all the data at the same level. This can be done from the console or through a simple CLI call.
Refer: https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html
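As a rough illustration of both steps, here is a sketch that builds the crawler configuration JSON and moves the older files under a default group folder. It assumes boto3 Glue and S3 clients are passed in and that the object keys look like `Year/Month/file.csv` relative to the bucket root; the helper names and the `default` folder name are hypothetical. The table level is counted from the bucket, so the right number depends on how deep your table root sits; 3 is simply the value suggested above.

```python
import json


def table_level_configuration(level):
    """Crawler configuration JSON that pins the depth at which tables are created."""
    return json.dumps({"Version": 1.0, "Grouping": {"TableLevelConfiguration": level}})


def regrouped_key(key, group="default"):
    """Map Year/Month/file.csv to Year/Month/<group>/file.csv; leave other keys alone."""
    parts = key.split("/")
    if len(parts) != 3:
        return key  # already has a group folder, or is not in the assumed layout
    year, month, filename = parts
    return f"{year}/{month}/{group}/{filename}"


def move_to_group(s3_client, bucket, keys, group="default"):
    """Copy each old-layout object to its grouped key, then delete the original."""
    moved = []
    for key in keys:
        new_key = regrouped_key(key, group)
        if new_key == key:
            continue
        # copy_object handles objects up to 5 GB; larger files need a multipart copy
        s3_client.copy_object(
            Bucket=bucket, Key=new_key, CopySource={"Bucket": bucket, "Key": key}
        )
        s3_client.delete_object(Bucket=bucket, Key=key)
        moved.append((key, new_key))
    return moved


def apply_table_level(glue_client, crawler_name, level=3):
    """Update an existing crawler with the table-level grouping configuration."""
    glue_client.update_crawler(
        Name=crawler_name, Configuration=table_level_configuration(level)
    )
```

It is worth trying `move_to_group` on a copy of one month first, since the delete is not reversible unless the bucket has versioning enabled.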
Alternatively:
Add a second crawler with level 3 and have it catalog the newer data into a new table. You can then create a view in Athena over the 2 tables (Table 1: data till Oct, Table 2: data from Nov) and use it in QuickSight.
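The view could look roughly like the statement this sketch generates. The older table has no group column, so a constant fills it in to keep both halves of the `UNION ALL` aligned. All table, view, and column names here are hypothetical; in particular, crawlers name unlabeled folder partitions `partition_0`, `partition_1`, and so on, so check the actual column names in your catalog.

```python
def union_view_sql(view, old_table, new_table, shared_columns,
                   group_column="partition_2", default_group="default"):
    """Build an Athena CREATE VIEW statement that stacks the two tables."""
    cols = ", ".join(f'"{c}"' for c in shared_columns)
    return (
        f'CREATE OR REPLACE VIEW "{view}" AS\n'
        # Old layout: no group folder, so supply a constant value for it
        f'SELECT {cols}, \'{default_group}\' AS "{group_column}" FROM "{old_table}"\n'
        f'UNION ALL\n'
        # New layout: the group comes from the extra partition column
        f'SELECT {cols}, "{group_column}" FROM "{new_table}"'
    )


if __name__ == "__main__":
    print(union_view_sql(
        "sales_all", "sales_until_oct", "sales_from_nov",
        ["amount", "partition_0", "partition_1"],
    ))
```

The generated statement can be pasted into the Athena query editor. It assumes the shared columns have the same names and types in both tables; if they differ, the select lists need explicit casts or aliases.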
I just noticed that my team added a new partition starting in November 2022. Do you know if there is a way I can detect the tables under a new partition using the same crawler? If yes, what should the configuration be? For example, until October 2022 I had partitions for Year and Month (structure was /Year/Month/.csv), and now we have an added partition (current structure is /Year/Month/group/.csv). How can I accommodate this change?