Spark application log format


I would like to know the log4j configuration needed to get container logs in a more structured format such as JSON, so that I can use other automation to parse the files and build custom filters for specific content.

Scott M
asked 5 months ago, 349 views
2 answers
Accepted answer

Hello,

This should apply to any platform on which you run Spark applications:

  1. Set the log4j properties as shown below. This is an example configuration; adjust it for your specific use case.
# Set everything to be logged to the console
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = stderr

# Console appender named "stderr", writing to the standard error stream
appenders = console
appender.console.type = Console
appender.console.name = stderr
appender.console.target = SYSTEM_ERR
# Format each event as JSON using the Logstash V1 event template
appender.console.json.type = JsonTemplateLayout
appender.console.json.eventTemplateUri = classpath:LogstashJsonEventLayoutV1.json

# Route Spark's own loggers to the same appender, without passing events on to the root logger
logger.spark.name = org.apache.spark
logger.spark.level = INFO
logger.spark.additivity = false
logger.spark.appenderRef.stdout.ref = stderr
  2. Download the log4j JSON template layout JAR from the Maven repository at the link below. Note that Spark 3.3.1 or above supports log4j with JSON output.

https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-layout-template-json

  3. Place the JAR file in the Spark jars directory, add it to the classpath, or pass it to the spark-submit command. Restart the Spark service if required, then submit the job. For example:
spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 1000

Sample log file:

18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing modify acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.807+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
...
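
Since the goal in the question is to hand these logs to other automation, here is a minimal sketch of how such output could be read in Python. It assumes one JSON object per line with the field names shown in the sample above; the function names and the logger filter are hypothetical and only illustrate the idea. Lines that are not complete JSON objects (such as a cut-off first line) are skipped rather than treated as errors.

```python
import json
from typing import Iterable, Iterator

# A few lines taken from the sample output above, including a cut-off one.
SAMPLE_LINES = [
    '18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}',
    '...',
]


def parse_json_log_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line that contains a complete JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip truncated or non-JSON lines instead of failing the whole run.
            continue
        if isinstance(event, dict):
            yield event


def filter_by_logger(events: Iterable[dict], logger_name: str) -> Iterator[dict]:
    """Keep only events whose logger_name field matches exactly."""
    return (e for e in events if e.get("logger_name") == logger_name)


events = list(parse_json_log_lines(SAMPLE_LINES))
for event in filter_by_logger(events, "org.apache.spark.SecurityManager"):
    print(event.get("@timestamp"), event.get("level"), event.get("message"))
```

For a real container log, the same functions can be fed an open file object in place of SAMPLE_LINES, since a file iterates line by line.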
AWS
SUPPORT ENGINEER
answered 5 months ago
EXPERT
reviewed 5 months ago

Hi,

If you are referring to AWS Glue, you can use this guide to provide a custom log4j.properties file in which you can customize the output format.

https://repost.aws/knowledge-center/glue-reduce-cloudwatch-logs

Best regards.

AWS
answered 5 months ago
