Glue crawler parsing EMR Serverless Spark stderr logs in S3


I'm trying to figure out whether an AWS Glue crawler can parse the Spark stderr logs that EMR Serverless writes to S3. The logs are space-delimited. I ran a crawler against the sample below and it reported that it couldn't determine any columns. Can any crawler experts weigh in on whether this can be parsed? Can I write my own crawler for this type of data?

24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount
24/01/25 17:35:27 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
24/01/25 17:35:27 DEBUG InternalLoggerFactor: Using SLF4J as the default logging framework
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8
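
For reference, here is a rough sketch of how the sample above breaks into columns, written as a plain Python regex. The field names (date, time, level, logger, message) and the helper `parse_line` are my own assumptions, not anything Glue or EMR defines. The message part itself contains spaces, which may be why treating the file as simply space-delimited yields no stable columns. Glue does support custom classifiers based on Grok patterns, so a pattern with the same shape might be a starting point there, but I have not verified that against a crawler.

```python
import re

# Assumed layout, inferred only from the sample lines in the question:
#   <yy/MM/dd> <HH:mm:ss> <LEVEL> <Logger>: <free-text message>
# One sample line has a colon after the level ("INFO:"), so it is optional here.
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):? "
    r"(?P<logger>[^:\s]+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of the assumed fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2"
    print(parse_line(sample))
```

Lines that do not follow this layout (for example stack-trace continuation lines) return None in this sketch and would need separate handling.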

ebethj
asked 3 months ago · 160 views
No answers
