Glue crawler parsing EMR Serverless Spark stderr logs in S3


I'm trying to figure out whether an AWS Glue crawler can parse the Spark stderr logs that EMR Serverless writes to S3. The logs are space-delimited. I ran a crawler against the sample below and it reported that it couldn't determine any columns. Can any crawler experts weigh in on whether this can be parsed? Can I write my own crawler for this type of data?

24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.
24/01/25 17:35:27 INFO ResourceUtils: =============================================
24/01/25 17:35:27 INFO SparkContext: Submitted application: WordCount
24/01/25 17:35:27 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
24/01/25 17:35:27 DEBUG InternalLoggerFactor: Using SLF4J as the default logging framework
24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8
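
For anyone looking at the same problem, here is a rough sketch of how these lines break into columns. Only the first few tokens (date, time, level, logger) are really space-delimited, and the message itself contains spaces, which is probably why a generic delimiter-based classifier can't find columns. The regex and the names `LOG_LINE` and `parse_line` are my own assumptions based on the sample above, not anything Glue provides. A custom grok classifier in Glue would need to capture roughly the same fields.

```python
import re
from typing import Optional

# Assumed layout, inferred from the sample lines only:
#   yy/MM/dd HH:mm:ss LEVEL Logger: message
# The level is sometimes followed by a colon ("INFO:"), so that colon is optional.
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+):? "
    r"(?P<logger>[^\s:]+): "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into named fields, or return None if it does not
    fit the assumed layout (for example a stack-trace continuation line)."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = [
        "24/01/25 17:35:27 INFO SparkContext: Running Spark version 3.4.1-amzn-2",
        "24/01/25 17:35:27 INFO: ResourceUtils: No custom resources configured for spark driver.",
        "24/01/25 17:35:27 DEBUG PlatformDependent0: Java version: 8",
    ]
    for raw in sample:
        print(parse_line(raw))
```

Lines that don't start with a timestamp return None here, so multi-line entries such as exception traces would need extra handling before the output could be loaded as a table.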

ebethj
asked 3 months ago · 160 views
No answers
