Glue crawler parsing emr-serverless spark stderr logs in S3

I'm trying to figure out whether an AWS Glue crawler can parse the Spark stderr logs that emr-serverless dumps to S3. The logs are space-delimited. When I ran a crawler against the sample below, it reported that it couldn't determine any columns. Can any crawler experts weigh in on whether this format can be parsed? Can I write my own crawler for this type of data?

24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount
24/01/25 17:35:27 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
24/01/25 17:35:27 DEBUG InternalLoggerFactor: Using SLF4J as the default logging framework
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8
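Rather than writing a crawler from scratch, AWS Glue lets you attach custom classifiers (Grok, XML, JSON, or CSV) to a crawler, so one option is a Grok classifier. The sketch below is a minimal, untested illustration under these assumptions: every line follows Spark's default log4j layout seen in the sample ("yy/MM/dd HH:mm:ss LEVEL Logger: message"); the classifier name, classification string, and the SPARKDATE custom pattern are hypothetical; and the Grok string has not been verified against Glue itself. It also includes a plain Python regex that mirrors the same columns, so the layout can be checked locally against sample lines before creating the classifier.

```python
import re
from typing import Dict, Optional

# ASSUMPTION: lines use Spark's default log4j layout,
# "yy/MM/dd HH:mm:ss LEVEL Logger: message", as in the sample above.
# The optional ":" after the level tolerates the "INFO: ResourceUtils:" line.
SPARK_LOG_LINE = re.compile(
    r"^(?P<log_date>\d{2}/\d{2}/\d{2}) "
    r"(?P<log_time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):? "
    r"(?P<logger>[\w$.]+): "
    r"(?P<message>.*)$"
)

# Grok equivalent intended for a Glue custom classifier (untested sketch).
# SPARKDATE is a hypothetical custom pattern; the others are standard Grok names.
GROK_PATTERN = (
    "%{SPARKDATE:log_date} %{TIME:log_time} %{LOGLEVEL:level}:? "
    "%{JAVACLASS:logger}: %{GREEDYDATA:message}"
)
CUSTOM_PATTERNS = "SPARKDATE %{YEAR}/%{MONTHNUM}/%{MONTHDAY}"


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one stderr line into columns, or return None if it does not
    match (for example, a stack-trace continuation line)."""
    m = SPARK_LOG_LINE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def grok_classifier_request(name: str, classification: str) -> Dict[str, dict]:
    """Build keyword arguments for boto3's glue.create_classifier(**request).
    `name` and `classification` are placeholders you choose."""
    return {
        "GrokClassifier": {
            "Name": name,
            "Classification": classification,
            "GrokPattern": GROK_PATTERN,
            "CustomPatterns": CUSTOM_PATTERNS,
        }
    }


if __name__ == "__main__":
    sample = [
        "24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2",
        "24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.",
        "24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8",
    ]
    for line in sample:
        print(parse_line(line))
```

If the local regex splits the sample cleanly, the request dict could be passed to `boto3.client("glue").create_classifier(...)` and the classifier name added to the crawler's classifier list. Lines that match neither pattern (such as multi-line stack traces) would not be classified by this pattern.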

ebethj
Asked 3 months ago, 160 views
No answers
