My AWS Glue job generates too many logs in Amazon CloudWatch. I want to reduce the number of logs generated.
Short description
AWS Glue Spark Extract/Transform/Load (ETL) jobs generate a large volume of logs that you can use to monitor internal failures and diagnose failed jobs. You can't control how many logs AWS Glue jobs generate on their instances, but you can reduce the verbosity of those logs.
To adjust log verbosity, use the following methods:
- Turn on the standard filter setting for continuous logging.
- Use the Spark context method setLogLevel.
- Use a custom log4j.properties file.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Turn on the standard filter setting for continuous logging
If you activated continuous logging for your job, then turn on the Standard filter for the Log filtering option.
To turn on this filter, use the AWS CLI to add the following job parameters:
'--enable-continuous-cloudwatch-log': 'true'
'--enable-continuous-log-filter': 'true'
Note: AWS Glue continuous logging is available only in AWS Glue 4.0 and earlier.
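If you script your runs, you can also pass the two parameters above when you start a run. The following is a minimal sketch that assumes boto3 is installed and AWS credentials are configured. It uses the boto3 Glue client's start_job_run call, and the helper names (with_log_filter, start_filtered_run) are hypothetical. As noted above, this applies only to AWS Glue 4.0 and earlier.

```python
"""Sketch: start an AWS Glue job run with continuous logging and the
standard log filter turned on (assumes boto3 and AWS credentials)."""

# The two job parameters from this article. Glue job parameter keys
# include the leading "--", and their values are strings.
CONTINUOUS_LOG_FILTER_ARGS = {
    "--enable-continuous-cloudwatch-log": "true",
    "--enable-continuous-log-filter": "true",
}


def with_log_filter(arguments=None):
    """Return a copy of `arguments` with the logging parameters added."""
    merged = dict(arguments or {})
    merged.update(CONTINUOUS_LOG_FILTER_ARGS)
    return merged


def start_filtered_run(job_name, arguments=None):
    """Start one run of `job_name` with the filter parameters applied.

    Arguments passed to start_job_run apply to this run only; they
    don't change the default arguments saved on the job.
    """
    import boto3  # imported here so with_log_filter works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=with_log_filter(arguments),
    )
    return response["JobRunId"]
```

For example, for a hypothetical job named my-etl-job, run_id = start_filtered_run("my-etl-job") starts a single filtered run. To apply the filter to every run, add the parameters to the job's saved job parameters instead.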
Use the Spark context method setLogLevel
You can use the setLogLevel method from pyspark.context.SparkContext to set the logging level for your job. Valid logging levels include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN. For more information, see setLogLevel on the Apache Spark website.
To import the Spark context method and set the logging level, add the following code to your AWS Glue job:
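from pyspark.context import SparkContext
# Reuse the job's existing SparkContext (or create one if none exists yet)
sc = SparkContext.getOrCreate()
# Set the log level for the driver only
sc.setLogLevel("new-log-level")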
Note: Replace new-log-level with your new logging level. This code affects the driver log behavior but doesn't change the executor logs.
For more information, see Configuring logging on the Apache Spark website.
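Because an invalid level causes setLogLevel to fail at runtime, you might check the level before you apply it. The following is a minimal sketch that uses the valid levels listed above. The helper names are hypothetical, and PySpark is imported only when no context is passed in, which is the case inside an AWS Glue job.

```python
"""Sketch: validate a log level before passing it to SparkContext.setLogLevel."""

# Levels that setLogLevel accepts, as listed in this article.
VALID_LOG_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}


def normalize_log_level(level):
    """Upper-case `level` and reject values that setLogLevel won't accept."""
    normalized = level.strip().upper()
    if normalized not in VALID_LOG_LEVELS:
        raise ValueError(
            f"Invalid log level {level!r}; expected one of {sorted(VALID_LOG_LEVELS)}"
        )
    return normalized


def set_driver_log_level(level, sc=None):
    """Set the driver log level. Executor logs are not affected."""
    if sc is None:
        from pyspark.context import SparkContext  # PySpark is available in Glue jobs

        # Reuse the job's SparkContext instead of creating a second one.
        sc = SparkContext.getOrCreate()
    normalized = normalize_log_level(level)
    sc.setLogLevel(normalized)
    return normalized
```

In a job script, set_driver_log_level("warn") sets the driver level to WARN, and a typo such as "warning" raises a ValueError before Spark sees it.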
Use a custom log4j.properties file
AWS Glue 3.0 uses Log4j 1 for logging, and you can customize its behavior with a log4j.properties file. AWS Glue 4.0 uses Log4j 2 for logging, and you can customize its behavior with a log4j2.properties file. For more information about Log4j 2, see Configuration properties on the Apache Logging Services website.
Note: If you apply a custom log4j.properties or log4j2.properties config file, then AWS Glue turns off continuous logging. Also, custom Log4j properties are available only in AWS Glue 4.0 and earlier.
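If you manage jobs across several AWS Glue versions, a small lookup can keep the file name consistent with the Log4j version. The following is a minimal sketch that covers only the two versions named above; the function name is hypothetical, and other versions deliberately raise an error.

```python
"""Sketch: pick the custom Log4j config file name for an AWS Glue version."""

# Mapping taken from this article: Glue 3.0 uses Log4j 1 (log4j.properties),
# and Glue 4.0 uses Log4j 2 (log4j2.properties).
LOG4J_CONFIG_BY_GLUE_VERSION = {
    "3.0": "log4j.properties",
    "4.0": "log4j2.properties",
}


def log4j_config_name(glue_version):
    """Return the config file name, or raise for versions this sketch doesn't cover."""
    try:
        return LOG4J_CONFIG_BY_GLUE_VERSION[glue_version]
    except KeyError:
        raise ValueError(
            f"No Log4j config mapping for AWS Glue {glue_version}; "
            "this sketch covers only AWS Glue 3.0 and 4.0"
        ) from None
```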
You can include your logging preferences in the log4j2.properties file. Then, you can upload the file to Amazon Simple Storage Service (Amazon S3) and use the file in the AWS Glue job.
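If you want to generate the file instead of writing it by hand, the following minimal sketch writes the same settings as the example log4j2.properties file in this article, with the root logger level as a parameter. The function names and the default output path are hypothetical.

```python
"""Sketch: write a log4j2.properties file with a configurable root level."""

from pathlib import Path

# Log4j 2 level names, written in lower case as in this article's example.
VALID_ROOT_LEVELS = {"all", "trace", "debug", "info", "warn", "error", "fatal", "off"}


def render_log4j2_properties(level="error"):
    """Return the properties text with `level` as the root logger level."""
    normalized = level.strip().lower()
    if normalized not in VALID_ROOT_LEVELS:
        raise ValueError(f"Invalid Log4j 2 level: {level!r}")
    lines = [
        f"rootLogger.level = {normalized}",
        "rootLogger.appenderRef.stdout.ref = STDOUT",
        "appender.console.type = Console",
        "appender.console.name = STDOUT",
        # Send console output to stderr, as in the article's example
        "appender.console.target = SYSTEM_ERR",
        "appender.console.layout.type = PatternLayout",
        "appender.console.layout.pattern = "
        "%d{yyyy-MM-dd HH:mm:ss,SSS} %p [%t] %c{2} (%F:%M(%L)): %m%n",
    ]
    return "\n".join(lines) + "\n"


def write_log4j2_properties(path="log4j2.properties", level="error"):
    """Write the rendered file to `path` and return the path."""
    target = Path(path)
    target.write_text(render_log4j2_properties(level), encoding="utf-8")
    return target
```

Raising the level (for example, from info to error) is what reduces the number of log lines; the appender settings stay the same.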
To use a custom config file in AWS Glue 4.0, complete the following steps:
- Create a file named log4j2.properties that sets the root logger level to error.
Example log4j2.properties file:
rootLogger.level = error
rootLogger.appenderRef.stdout.ref = STDOUT
appender.console.type = Console
appender.console.name = STDOUT
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %p [%t] %c{2} (%F:%M(%L)): %m%n
- Upload the log4j2.properties file to Amazon S3, and then copy the file's S3 URI.
- In the AWS Glue job, add the following job parameter:
--extra-files, s3://[objectpath]/log4j2.properties
Note: Replace s3://[objectpath]/log4j2.properties with the S3 URI that you copied in the preceding step.
- Save the AWS Glue job, and then run it.
- Check the related log stream in the /aws-glue/jobs/error log group.