Spark application log format


I would like to know the log4j configuration needed to write container logs in a more structured format such as JSON, so that I can use other automation to parse the files and build custom filters that extract precise content.

Scott M
asked 4 months ago, 316 views
2 Answers
Accepted Answer

Hello,

This may apply to any platform on which you run Spark applications.

  1. Set the log4j properties as shown below. This is an example configuration; adjust it for your specific use case.
# Set everything to be logged to the console
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = stderr

appenders = console
appender.console.type = Console
appender.console.name = stderr
appender.console.target = System.err
appender.console.json.type = JsonTemplateLayout
appender.console.json.eventTemplateUri = classpath:LogstashJsonEventLayoutV1.json

logger.spark.name = org.apache.spark
logger.spark.level = INFO
logger.spark.additivity = false
logger.spark.appenderRef.stdout.ref = stderr
  2. Download the Log4j JSON template layout jar from the Maven repository using the link below. Note that Spark 3.3.1 or above supports Log4j with JSON format.

https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-layout-template-json

  3. Place the jar file in the Spark jars directory, add it to the classpath, or pass it to the spark-submit command. Restart the Spark service if required, then submit the job. For example:
spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 1000
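
The third option in step 3 (passing the jar on the spark-submit command) is not shown in the command above. A minimal sketch of how such a command could be assembled is below. It assumes the layout jar and the properties file from step 1 were saved under the hypothetical paths shown, and that the custom configuration is picked up through `-Dlog4j.configurationFile`. The helper name `build_spark_submit` is made up for illustration. Whether the layout class is visible early enough can depend on the platform, in which case placing the jar in the Spark jars directory is the alternative.

```python
import shlex


def build_spark_submit(jar_path, log4j_file, app_jar, main_class, app_args):
    """Assemble a spark-submit argument list that ships the layout jar and config."""
    # In YARN cluster mode, files passed with --files are copied into each
    # container's working directory, so the bare file name is used here.
    log4j_name = log4j_file.rsplit("/", 1)[-1]
    jvm_opt = f"-Dlog4j.configurationFile={log4j_name}"
    return [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--jars", jar_path,        # JSON template layout jar from step 2
        "--files", log4j_file,     # properties file from step 1
        "--conf", f"spark.driver.extraJavaOptions={jvm_opt}",
        "--conf", f"spark.executor.extraJavaOptions={jvm_opt}",
        "--class", main_class,
        app_jar,
        *app_args,
    ]


cmd = build_spark_submit(
    jar_path="/tmp/log4j-layout-template-json.jar",  # hypothetical location
    log4j_file="/tmp/log4j2.properties",             # hypothetical location
    app_jar="/usr/lib/spark/examples/jars/spark-examples.jar",
    main_class="org.apache.spark.examples.SparkPi",
    app_args=["1000"],
)

# Print a shell-quoted command line that can be reviewed before running it.
print(shlex.join(cmd))
```

The version of the layout jar should match the Log4j version bundled with your Spark distribution.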

Sample log file:

{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing modify acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.807+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
...
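
Since the goal in the question is to parse these files with other automation, here is a minimal sketch of how the JSON lines could be read and filtered. It assumes one JSON object per line with the field names seen in the sample above (`level`, `logger_name`, `message`, `@timestamp`). The function names are hypothetical, and the sample list stands in for lines read from a real container log file.

```python
import json


def parse_log_lines(lines):
    """Yield one dict per well-formed JSON log line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text or truncated lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def filter_records(records, level=None, logger_prefix=None):
    """Keep records matching the given level and logger name prefix."""
    for record in records:
        if level and record.get("level") != level:
            continue
        if logger_prefix and not record.get("logger_name", "").startswith(logger_prefix):
            continue
        yield record


# Two lines copied from the sample log above, plus a non-JSON line.
sample = [
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}',
    "...",
]

hits = list(
    filter_records(
        parse_log_lines(sample),
        level="INFO",
        logger_prefix="org.apache.spark.SecurityManager",
    )
)
for record in hits:
    print(record["@timestamp"], record["message"])
```

Skipping lines that fail to parse keeps the script usable on logs that mix JSON with plain-text output from other components.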
AWS
SUPPORT ENGINEER
answered 4 months ago
EXPERT
reviewed 4 months ago

Hi,

If you are referring to AWS Glue, you can use this guide to provide a custom log4j.properties file in which you customize the output format.

https://repost.aws/knowledge-center/glue-reduce-cloudwatch-logs

Best.

AWS
answered 4 months ago
