1 Answer
Hello,
Unfortunately, as of now, the Glue crawler has no feature for crawling only the most recent partition. The closest option is an exclusion/inclusion pattern, but these are simple wildcards such as * and are not expressive enough to match something dynamic like the current date.
However, you can try the following workaround:
- Manually create a Glue table that points at a single partition path, such as /year=2022/month=06/day=01
- Create a Glue crawler that uses this table as its source
- Run the crawler
- The next day, when a new partition day=02 arrives, run a short script like the one below, which updates the table's location and then starts the crawler programmatically
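import boto3

client = boto3.client('glue')
# Point the table at the new day's partition.
# Note: update_table replaces the table definition with the TableInput given here.
response = client.update_table(DatabaseName='db', TableInput={'Name': 'tbl', 'StorageDescriptor': {'Location': '<S3_Bucket>/year=2022/month=06/day=02'}})
# Start the crawler, which now reads from the updated location.
response1 = client.start_crawler(Name='mycrawler')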
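One thing to watch for with the snippet above: `update_table` overwrites the table definition with whatever is in `TableInput`, so passing only the name and location may drop the existing columns and format settings. A more defensive variant could read the current definition first and change only the location. The following is a minimal sketch, not a tested production script. The helper names (`partition_location`, `table_input_with_location`, `repoint_and_crawl`) are hypothetical, and the `TABLE_INPUT_KEYS` allow-list is my assumption about which fields `TableInput` accepts, so check it against the current Glue API reference.

```python
from datetime import date

# Assumed subset of keys accepted by TableInput. get_table also returns
# read-only fields (CreateTime, DatabaseName, ...) that update_table rejects.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}


def partition_location(base_path: str, day: date) -> str:
    """Build the year=/month=/day= path layout from the answer for a given date."""
    return f"{base_path.rstrip('/')}/year={day:%Y}/month={day:%m}/day={day:%d}"


def table_input_with_location(table: dict, location: str) -> dict:
    """Copy an existing table definition, changing only its storage location."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    storage = dict(table_input.get("StorageDescriptor", {}))
    storage["Location"] = location
    table_input["StorageDescriptor"] = storage
    return table_input


def repoint_and_crawl(database: str, table_name: str, crawler: str,
                      base_path: str, day: date) -> None:
    """Point the table at the given day's partition, then start the crawler."""
    import boto3  # imported here so the helpers above work without AWS access

    glue = boto3.client("glue")
    current = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    new_input = table_input_with_location(
        current, partition_location(base_path, day)
    )
    glue.update_table(DatabaseName=database, TableInput=new_input)
    glue.start_crawler(Name=crawler)
```

With the names from the answer, a daily job (for example a scheduled Lambda) could call `repoint_and_crawl('db', 'tbl', 'mycrawler', '<S3_Bucket>', date.today())`.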