Spark application log format


I would like to know the log4j configuration needed to write container logs in a more structured format, such as JSON, so that I can use other automation to parse the files and train a customization that filters for specific content.

Scott M
asked 5 months ago, 349 views
2 answers
Accepted answer

Hello,

The following should apply to any platform that runs Spark applications:

1. Set the Log4j 2 properties as shown below (for Spark, in `log4j2.properties` in the Spark `conf` directory). This is an example configuration; adjust it for your specific use case.

```properties
# Send everything to the console (stderr)
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = stderr

appenders = console
appender.console.type = Console
appender.console.name = stderr
appender.console.target = SYSTEM_ERR
# Emit each event as one JSON object using the Logstash V1 template
appender.console.layout.type = JsonTemplateLayout
appender.console.layout.eventTemplateUri = classpath:LogstashJsonEventLayoutV1.json

logger.spark.name = org.apache.spark
logger.spark.level = INFO
logger.spark.additivity = false
logger.spark.appenderRef.stdout.ref = stderr
```
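2. Download the Log4j JSON template layout JAR (`log4j-layout-template-json`) from the Maven repository linked below, preferably in the version that matches the Log4j 2 JARs bundled with your Spark installation. Note that Spark 3.3.1 or later supports Log4j with JSON format.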

https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-layout-template-json

3. Place the JAR file in Spark's JAR directory, add it to the classpath, or include it in the `spark-submit` command. Restart the Spark service if required, then submit the job. For example:

```shell
spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 1000
```
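To pick a matching version for step 2, the following Python sketch looks for the `log4j-core` JAR in a Spark JAR directory and builds the Maven Central download URL for the matching `log4j-layout-template-json` JAR. It assumes the standard Maven Central repository layout and the file name pattern `log4j-core-<version>.jar`; the helper names `find_log4j_version` and `layout_jar_url` are hypothetical, and the directory to scan depends on your installation.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: standard Maven Central layout for org.apache.logging.log4j:log4j-layout-template-json
BASE_URL = (
    "https://repo1.maven.org/maven2/org/apache/logging/log4j/"
    "log4j-layout-template-json"
)

# Matches e.g. "log4j-core-2.17.2.jar" but not "log4j-core-2.17.2-tests.jar"
CORE_JAR = re.compile(r"^log4j-core-(\d+(?:\.\d+)*)\.jar$")


def find_log4j_version(jars_dir: str) -> Optional[str]:
    """Return the version of the first log4j-core JAR in jars_dir, or None if there is none."""
    for jar in sorted(Path(jars_dir).glob("log4j-core-*.jar")):
        match = CORE_JAR.match(jar.name)
        if match:
            return match.group(1)
    return None


def layout_jar_url(version: str) -> str:
    """Build the download URL of the JSON template layout JAR for a given Log4j version."""
    return f"{BASE_URL}/{version}/log4j-layout-template-json-{version}.jar"
```

For example, on a host where Spark lives under `/usr/lib/spark` (an assumption based on the example path above), call `find_log4j_version("/usr/lib/spark/jars")` and pass the result to `layout_jar_url` only if it is not `None`. Matching the versions avoids mixing Log4j modules from different releases on the same classpath.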

Sample log output:

```
{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing modify acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.807+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
...
```
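Because each line of this output is a single JSON object, downstream automation can read it with a standard JSON parser. The Python sketch below is one way to do that, assuming the field names seen in the sample (`level`, `logger_name`, `message`) from the Logstash V1 template; the function names are hypothetical. Lines that are not JSON objects, such as the container headers printed by `yarn logs -applicationId <appId>` or truncated records, are skipped rather than treated as errors.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Optional


def parse_log_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line that holds a JSON object; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Not JSON: e.g. a yarn logs header or a truncated record
            continue
        if isinstance(record, dict):
            yield record


def filter_records(
    records: Iterable[dict],
    level: Optional[str] = None,
    logger_prefix: Optional[str] = None,
) -> Iterator[dict]:
    """Keep records matching `level` exactly and whose logger_name starts with `logger_prefix`.

    Either filter is skipped when its argument is None.
    """
    for record in records:
        if level is not None and record.get("level") != level:
            continue
        logger = str(record.get("logger_name", ""))
        if logger_prefix is not None and not logger.startswith(logger_prefix):
            continue
        yield record


def count_by_logger(records: Iterable[dict]) -> Counter:
    """Count records per logger_name (records without one are counted under "<none>")."""
    return Counter(record.get("logger_name", "<none>") for record in records)
```

For example, pass an open file object for a container's `stderr` log to `parse_log_lines` and chain `filter_records(..., level="WARN")` to isolate warnings. Using generators keeps memory use flat even for large container logs.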
AWS
SUPPORT ENGINEER
answered 5 months ago
EXPERT
reviewed 5 months ago

Hi,

If you are referring to AWS Glue, you can use this guide to provide a custom log4j.properties file in which you can customize the output format.

https://repost.aws/knowledge-center/glue-reduce-cloudwatch-logs

Best regards.

AWS
answered 5 months ago
