Glue crawler parsing EMR Serverless Spark stderr logs in S3
I'm trying to figure out whether an AWS Glue crawler can parse the Spark stderr logs that EMR Serverless writes to S3.
The logs are space-delimited. I ran a crawler against the sample below, and it reported that it couldn't determine any columns. Can any crawler experts weigh in on whether this can be parsed? Can I write my own crawler for this type of data?
24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount
24/01/25 17:35:27 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
24/01/25 17:35:27 DEBUG InternalLoggerFactor: Using SLF4J as the default logging framework
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8
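
To make the columns I'm hoping for concrete, here is a minimal sketch in plain Python (not Glue itself) of how I read each line: a date, a time, a log level, a logger name ending in a colon, and then a free-text message. It assumes every line follows the layout seen in the sample above. The names `LOG_LINE` and `parse_line` are hypothetical, made up for this illustration. Because the message itself contains spaces, splitting on spaces alone does not give a fixed number of columns, which may be why the crawler gives up.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample lines only:
#   yy/MM/dd HH:mm:ss LEVEL Logger: free-text message
# The level is allowed an optional trailing colon because one sample line has "INFO:".
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):? "
    r"(?P<logger>[^:\s]+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        # Lines such as stack-trace continuations do not fit the layout.
        return None
    timestamp = datetime.strptime(
        f"{match['date']} {match['time']}", "%y/%m/%d %H:%M:%S"
    )
    return {
        "timestamp": timestamp,
        "level": match["level"],
        "logger": match["logger"],
        "message": match["message"],
    }


if __name__ == "__main__":
    sample = "24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount"
    print(parse_line(sample))
```

Ideally the crawler would produce a table with these four columns (timestamp, level, logger, message).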