Spark application log format


I would like to know the log4j configuration needed to write container logs in a more structured format, such as JSON, so that I can use other automation to parse the files and build customizations that filter exactly the content I need.

Asked by Scott M 5 months ago · 349 views
2 answers

Accepted answer

Hello,

The following steps should apply to any platform that runs Spark applications:

1. Set the Log4j 2 properties as shown below (for example, in Spark's `conf/log4j2.properties`). This is an example configuration; adjust it for your specific use case.
```properties
# Set everything to be logged to the console
rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = stderr

appenders = console
appender.console.type = Console
appender.console.name = stderr
appender.console.target = System.err
# Emit each log event as one JSON object per line, using the Logstash V1
# event template that ships inside the log4j-layout-template-json jar
appender.console.json.type = JsonTemplateLayout
appender.console.json.eventTemplateUri = classpath:LogstashJsonEventLayoutV1.json

logger.spark.name = org.apache.spark
logger.spark.level = INFO
logger.spark.additivity = false
logger.spark.appenderRef.stdout.ref = stderr
```
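2. Download the Log4j JSON Template Layout jar (`log4j-layout-template-json`) from the Maven repository at the link below. This layout requires Log4j 2, which Spark uses from version 3.3 onward; choose the jar version that matches the `log4j-core` version bundled with your Spark distribution.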

https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-layout-template-json

3. Place the jar file in Spark's jars directory, add it to the classpath, or pass it in the spark-submit command. Restart the Spark service if required, then submit the job. For example:
```shell
spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.SparkPi /usr/lib/spark/examples/jars/spark-examples.jar 1000
```

Sample log file:

```
...
{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT","thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO","logger_name":"org.apache.spark.util.SignalUtils"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing modify acls to: yarn,hadoop","thread_name":"main","@timestamp":"2023-12-18T13:34:23.807+0000","level":"INFO","logger_name":"org.apache.spark.SecurityManager"}
...
```
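Because each line of the output is a self-contained JSON object, the downstream automation mentioned in the question can read the logs line by line. The Python sketch below is only an illustration, not part of Spark or Log4j. It assumes the field names produced by the Logstash V1 template in the sample (`@timestamp`, `level`, `logger_name`, `message`), and the helper names `parse_events` and `filter_events` are hypothetical. It skips lines that are not valid JSON, because collected container logs (for example, the output of `yarn logs -applicationId <application id>`) may also contain plain-text headers or truncated lines.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Timestamp format seen in the sample output, e.g. "2023-12-18T13:34:23.362+0000".
# Assumption: other event templates may emit timestamps in a different format.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Lines copied from the sample log above, plus a "..." placeholder line.
SAMPLE_LINES = [
    "...",
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Registering signal handler for INT",'
    '"thread_name":"main","@timestamp":"2023-12-18T13:34:23.362+0000","level":"INFO",'
    '"logger_name":"org.apache.spark.util.SignalUtils"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing view acls to: yarn,hadoop",'
    '"thread_name":"main","@timestamp":"2023-12-18T13:34:23.806+0000","level":"INFO",'
    '"logger_name":"org.apache.spark.SecurityManager"}',
    '{"@version":1,"source_host":"ip-172-31-13-134","message":"Changing modify acls to: yarn,hadoop",'
    '"thread_name":"main","@timestamp":"2023-12-18T13:34:23.807+0000","level":"INFO",'
    '"logger_name":"org.apache.spark.SecurityManager"}',
]


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON log line; skip blank, plain-text or truncated lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # e.g. "...", YARN log headers, or plain stdout output
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial line that is not valid JSON
        ts = event.get("@timestamp")
        if isinstance(ts, str):
            try:
                # Convert the timestamp string into a timezone-aware datetime.
                event["@timestamp"] = datetime.strptime(ts, TS_FORMAT)
            except ValueError:
                pass  # keep timestamps in unexpected formats as plain strings
        yield event


def filter_events(
    events: Iterable[dict],
    level: Optional[str] = None,
    logger_prefix: Optional[str] = None,
) -> Iterator[dict]:
    """Keep events that match an exact level and/or a logger-name prefix."""
    for event in events:
        if level is not None and event.get("level") != level:
            continue
        if logger_prefix is not None and not event.get("logger_name", "").startswith(logger_prefix):
            continue
        yield event


if __name__ == "__main__":
    # Print only the SecurityManager events from the sample lines.
    for event in filter_events(parse_events(SAMPLE_LINES), logger_prefix="org.apache.spark.SecurityManager"):
        print(event.get("@timestamp"), event.get("level"), event.get("message"))
```

To process a real log file instead of `SAMPLE_LINES`, pass an open file object, for example `parse_events(open(path, encoding="utf-8"))`; file objects iterate line by line, so large logs are not loaded into memory at once.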
AWS Support Engineer · answered 5 months ago · reviewed by an expert 5 months ago

Hi,

If you are referring to AWS Glue, you can follow this guide to provide a custom log4j.properties file in which you can customize the output format.

https://repost.aws/knowledge-center/glue-reduce-cloudwatch-logs

Best regards.

AWS · answered 5 months ago
