Glue crawler parsing EMR Serverless Spark stderr logs in S3

I'm trying to figure out whether an AWS Glue crawler can parse the Spark stderr logs that EMR Serverless writes to S3. The logs are space-delimited. I ran a crawler against the following and it reported that it couldn't determine any columns. Can any crawler experts weigh in on whether this can be parsed? Can I write my own crawler for this type of data?

24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount
24/01/25 17:35:27 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
24/01/25 17:35:27 DEBUG InternalLoggerFactor: Using SLF4J as the default logging framework
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8
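
One likely reason a delimiter-based classifier finds no columns is that these lines are not really fixed columns: the message field itself contains spaces, so splitting on spaces gives a different column count on every line. Each line does follow a consistent layout, though. The sketch below assumes the layout `yy/MM/dd HH:mm:ss LEVEL Logger: message`, inferred only from the sample above, and splits each line into five named fields with a regular expression. AWS Glue custom classifiers accept Grok patterns, which are built on regular expressions, so a pattern like this might be a starting point; the pattern, field names, and the `parse_line`/`parse_log` helpers are hypothetical and not taken from Spark or Glue documentation.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern inferred from the sample lines (not from Spark docs):
#   yy/MM/dd HH:mm:ss LEVEL Logger: message
# The optional ":" after the level tolerates lines such as "INFO: ResourceUtils: ...".
# The logger name stops at the first colon, so colons inside the message are kept.
SPARK_LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):? "
    r"(?P<logger>[^:\s]+):\s*"
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into named fields, or return None if it doesn't match."""
    match = SPARK_LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def parse_log(text: str) -> List[Dict[str, str]]:
    """Parse every matching line; non-matching lines (e.g. stack traces) are skipped."""
    records = []
    for line in text.splitlines():
        record = parse_line(line)
        if record is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = """\
24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8"""
    # Print level, logger and message for each parsed record.
    for rec in parse_log(sample):
        print(rec["level"], rec["logger"], "|", rec["message"])
```

Note that multi-line entries such as Java stack traces do not match and are simply dropped here; a real pipeline would probably need to attach those continuation lines to the preceding record instead.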

ebethj
asked 3 months ago, 153 views