Crawler never loads data

0

I used a Glue crawler to load S3 data into a Glue Data Catalog database, but only the tables were created; the tables themselves were empty.

javen
asked 3 months ago · 238 views
3 Answers
1

Hi,

A Glue crawler extracts metadata about the data stored in Amazon S3 and writes that metadata to the Data Catalog. The crawler itself does not move or copy the data from S3 into the Glue Data Catalog databases. Instead, it scans the data in S3, infers schema information, and creates metadata entries (table definitions) in the Glue Data Catalog. Once the metadata exists, you can use other AWS services such as Amazon Athena to query the data stored in S3 through the table definitions in the Glue Data Catalog.
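To illustrate querying the S3 data through the catalog, here is a minimal sketch using the Athena API via boto3. The function name `run_athena_query` and its parameters are illustrative, not part of any AWS library; the `athena` argument is assumed to be a client created with `boto3.client("athena")`, and the output location is assumed to be an S3 path you can write to.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a query against a Data Catalog database and return rows as lists of strings.

    `athena` is assumed to be a boto3 Athena client.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {status['State']}: {reason}")

    # Only the first page of results is read here.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

With a real client this might be called as `run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-bucket/athena-results/")`, where the table, database, and bucket names are placeholders. For a `SELECT`, the first returned row normally holds the column headers.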

If your question is instead that you are not seeing the inferred schema, there are several possible reasons. Double-check the configuration of your Glue crawler and make sure it points to the correct S3 path. Check that data is actually present in the S3 bucket you attached to the crawler. Consider the complexity of the file formats as well, especially with highly nested structures. Finally, make sure that the IAM role attached to the crawler has permission to access the S3 bucket.
I hope this helps. You can refer to "Data Catalog and crawlers in AWS Glue" in the AWS documentation for additional information.
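The path and data-presence checks above could be scripted roughly as follows. This is a sketch under assumptions: `check_crawler_targets` and `split_s3_uri` are illustrative helper names, and `glue` and `s3` are assumed to be clients from `boto3.client("glue")` and `boto3.client("s3")`. The listing runs with your own credentials, so it does not prove that the crawler's IAM role can read the same path.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_crawler_targets(glue, s3, crawler_name):
    """Return {s3_path: has_objects} for each S3 target configured on the crawler."""
    crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
    report = {}
    for target in crawler.get("Targets", {}).get("S3Targets", []):
        bucket, prefix = split_s3_uri(target["Path"])
        # One key is enough to tell whether anything exists under the prefix.
        listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        report[target["Path"]] = listing.get("KeyCount", 0) > 0
    return report
```

A path that maps to `False` has no objects under it, which would explain a crawler that creates nothing. A `True` result can still be misleading if the only object is an empty "folder" placeholder key.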

AWS
BezuW
answered 3 months ago
0

If you are using an AWS Glue crawler to populate your Glue Data Catalog with metadata about the data stored in Amazon S3 and the tables appear empty, there could be several reasons. See the AWS documentation: https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html

Make sure the data location specified in the crawler's S3 configuration matches the actual location of your data in Amazon S3. Ensure that the path provided in the crawler settings is correct and accessible by the AWS Glue service.

Ensure that the data stored in Amazon S3 is in a format supported by AWS Glue. Glue supports various formats such as Parquet, ORC, JSON, CSV, Avro, and more. If your data is in a different format, you may need to convert it to a supported format before crawling.
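One way to see what format the crawler detected is to read back the table definitions it wrote. The sketch below assumes `glue` is a client from `boto3.client("glue")` and that the crawler recorded its detected format in the table's `classification` parameter; `summarize_tables` is an illustrative name.

```python
def summarize_tables(glue, database):
    """Return (name, classification, column_count, location) for each table in a database."""
    summary = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            storage = table.get("StorageDescriptor", {})
            summary.append((
                table["Name"],
                # None means the parameter is absent from the table definition.
                table.get("Parameters", {}).get("classification"),
                len(storage.get("Columns", [])),
                storage.get("Location"),
            ))
    return summary
```

A table with zero columns or an unexpected classification suggests the crawler could not recognize the file format at that location.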

Permissions: Check the permissions of the IAM role assigned to the AWS Glue crawler.

Sometimes it may be necessary to run the Glue crawler more than once to fully populate the Glue Data Catalog. After making any necessary adjustments to the crawler configuration, run the crawler again to see whether it creates or updates the table definitions as expected.

Review the logs generated by the Glue crawler for error messages or warnings. These logs can show what went wrong during the crawl and help diagnose the root cause.
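The log review could be scripted along these lines. This is a sketch, not a definitive procedure: `last_crawl_report` is an illustrative name, `glue` and `logs` are assumed to be clients from `boto3.client("glue")` and `boto3.client("logs")`, and the fallback log group `/aws-glue/crawlers` with a stream named after the crawler is an assumption used only when the crawler's `LastCrawl` details do not name them.

```python
def last_crawl_report(glue, logs, crawler_name, max_events=50):
    """Collect the crawler state, last crawl status, and recent log messages."""
    crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
    last = crawler.get("LastCrawl", {})

    # Prefer the log location reported by Glue; fall back to assumed defaults.
    log_group = last.get("LogGroup", "/aws-glue/crawlers")
    log_stream = last.get("LogStream", crawler_name)

    events = logs.filter_log_events(
        logGroupName=log_group,
        logStreamNames=[log_stream],
        limit=max_events,
    )["events"]

    return {
        "state": crawler.get("State"),
        "last_status": last.get("Status"),
        "error": last.get("ErrorMessage"),
        "messages": [event["message"] for event in events],
    }
```

If the crawler has never run, the log group or stream may not exist yet and the CloudWatch Logs call will raise an error rather than return an empty list.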

By following these troubleshooting steps and addressing any issues they reveal, you should be able to work out why your Glue crawler is not populating the Glue Data Catalog as expected and take corrective action.

EXPERT
answered 3 months ago
0

The crawler only reads the S3 location to capture table metadata in the Data Catalog. You then need to use a compute service such as Athena to query the table and view the data.

AWS
answered 3 months ago
