Spark application log format


I would like to know the log4j configuration needed to get container logs into a more structured format such as JSON, so that I can use other automation to parse the files and customize it to filter for specific content.

Scott M
Asked 5 months ago · 349 views
2 Answers
Accepted Answer

Hello,

The following should apply on any platform where you run Spark applications:

  1. Set the log4j properties as shown below. This is an example configuration; adjust it for your specific use case.
# Log everything at INFO and above to the console (stderr)
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = stderr

# Console appender named "stderr" that emits JSON through JsonTemplateLayout
appenders = console
appender.console.type = Console
appender.console.name = stderr
appender.console.target = System.err
appender.console.json.type = JsonTemplateLayout
appender.console.json.eventTemplateUri = classpath:LogstashJsonEventLayoutV1.json

# Spark's own loggers; additivity = false keeps events from also reaching the root logger
logger.spark.name = org.apache.spark
logger.spark.level = INFO
logger.spark.additivity = false
logger.spark.appenderRef.stdout.ref = stderr
  2. Download the Log4j JSON template layout JAR from the Maven repository at the link below. Note that Spark 3.3.1 or later supports Log4j with the JSON format.

https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-layout-template-json

  3. Place the JAR file in the Spark jars directory, add it to the classpath, or pass it in the spark-submit command. Restart the Spark service if required, then submit the job. For example:
spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 1000
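
The command above does not itself pass the layout JAR or the properties file. As a rough sketch only, one way to do that is with spark-submit's `--jars` and `--files` options plus Log4j 2's `log4j.configurationFile` system property. The JAR path, the properties file name, and the helper `build_spark_submit` are hypothetical placeholders, and on some platforms the JAR may still need to sit in the Spark jars directory as described in the step above.

```python
import shlex

# Hypothetical paths: substitute the JAR you downloaded and your own properties file.
LAYOUT_JAR = "/path/to/log4j-layout-template-json.jar"
LOG4J_PROPS = "/path/to/log4j2.properties"


def build_spark_submit(layout_jar: str, log4j_props: str) -> list[str]:
    """Return a spark-submit argument list that ships the JSON layout JAR and config."""
    props_name = log4j_props.rsplit("/", 1)[-1]
    # --files copies the properties file into each container's working directory,
    # so the JVM option can refer to it by its bare file name.
    jvm_opt = f"-Dlog4j.configurationFile={props_name}"
    return [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--jars", layout_jar,
        "--files", log4j_props,
        "--conf", f"spark.driver.extraJavaOptions={jvm_opt}",
        "--conf", f"spark.executor.extraJavaOptions={jvm_opt}",
        "--class", "org.apache.spark.examples.SparkPi",
        "/usr/lib/spark/examples/jars/spark-examples.jar",
        "1000",
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(build_spark_submit(LAYOUT_JAR, LOG4J_PROPS)))
```

Printing the command rather than executing it makes it easy to check the arguments before submitting a real job.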

Sample log file:

18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing modify acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.807+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
...
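
Since the goal in the question is to parse these files with other automation, here is a minimal sketch of reading such output in Python. It assumes one JSON object per line and the field names visible in the sample (`logger_name`, `level`, `message`); the helpers `parse_json_log` and `filter_records` are hypothetical names, not part of Spark or Log4j. Lines that are not valid JSON, such as truncated ones, are skipped.

```python
import json
from typing import Iterable, Iterator, Optional


def parse_json_log(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed JSON log line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if isinstance(record, dict):
            yield record


def filter_records(records: Iterable[dict], logger_prefix: str = "",
                   level: Optional[str] = None) -> Iterator[dict]:
    """Keep records whose logger name starts with a prefix and, optionally, match a level."""
    for record in records:
        if not str(record.get("logger_name", "")).startswith(logger_prefix):
            continue
        if level is not None and record.get("level") != level:
            continue
        yield record


# Lines copied from the sample log above, including the truncated first line.
SAMPLE = [
    '18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}',
]

if __name__ == "__main__":
    records = parse_json_log(SAMPLE)
    for rec in filter_records(records, logger_prefix="org.apache.spark.SecurityManager"):
        print(rec["@timestamp"], rec["level"], rec["message"])
```

For a real container log, pass an open file object to `parse_json_log` in place of `SAMPLE`.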
AWS
Support Engineer
Answered 5 months ago
Expert
Reviewed 5 months ago

Hi,

If you are referring to AWS Glue, you can use this guide to provide a custom log4j.properties file in which you can customize the output format.

https://repost.aws/knowledge-center/glue-reduce-cloudwatch-logs

Best regards.

AWS
Answered 5 months ago
