Glue crawler parsing emr-serverless spark stderr logs in S3


I'm trying to figure out whether an AWS Glue crawler can parse the Spark stderr logs that EMR Serverless writes to S3. The logs are space-delimited. I ran a crawler against the sample below, and it reported that it couldn't determine any columns. Can any crawler experts weigh in on whether this data can be parsed? Could I write my own crawler for this type of data?

24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount
24/01/25 17:35:27 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
24/01/25 17:35:27 DEBUG InternalLoggerFactor: Using SLF4J as the default logging framework
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8
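For reference, here is a minimal Python sketch of the columns each line appears to contain. It assumes every record follows the `yy/MM/dd HH:mm:ss LEVEL Logger: message` layout seen in the sample; the names `LOG_LINE` and `parse_line` are hypothetical. Because the message itself contains spaces, splitting on single spaces does not yield a fixed number of columns, whereas a pattern anchored on the timestamp, level, and logger name does. AWS Glue custom classifiers accept grok patterns, so an equivalent grok pattern with these five fields may be worth trying, though that mapping is untested here.

```python
import re
from typing import Optional

# Assumed log4j-style layout (inferred from the sample, not from Spark docs):
#   <yy/MM/dd> <HH:mm:ss> <LEVEL>[:] <Logger>: <message>
# The optional ":" after the level covers lines such as "INFO: ResourceUtils:".
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+):?\s+"
    r"(?P<logger>[^:\s]+):\s*"  # logger name stops at the first colon
    r"(?P<message>.*)$"         # the rest of the line, spaces included
)


def parse_line(line: str) -> Optional[dict]:
    """Split one stderr log line into named columns, or return None if it doesn't match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = [
        "24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2",
        "24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.",
        "24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8",
    ]
    for line in sample:
        print(parse_line(line))
```

Lines that don't match, such as stack-trace continuation lines, return `None`, so multi-line records would need separate handling.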

ebethj
Asked 3 months ago, viewed 160 times
No answers
