How do I reduce the number of logs that my AWS Glue job generates?

My AWS Glue job generates too many logs in Amazon CloudWatch. I want to reduce the number of logs generated.

Short description

AWS Glue Spark extract, transform, and load (ETL) jobs generate many logs that you can use to monitor internal failures and diagnose failed jobs. You can't control the number of logs that AWS Glue jobs generate on their instances, but you can adjust the verbosity of the logs.

To adjust log verbosity, use the following methods:

  • Turn on the standard filter setting for continuous logging.
  • Use the Spark context method setLogLevel.
  • Use a custom log4j.properties file.

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

Turn on the standard filter setting for continuous logging

If you activated continuous logging for your job, then turn on the Standard filter for the Log filtering option.

To turn on this filter, use the AWS CLI to add the following job parameters:

'--enable-continuous-cloudwatch-log': 'true',
'--enable-continuous-log-filter': 'true'

Note: AWS Glue continuous logging is available only in AWS Glue 4.0 and earlier.
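
As a minimal sketch, the same two parameters can also be passed from Python with boto3 for a single job run. The helper names and the job name my-glue-job are hypothetical. start_job_run and its Arguments parameter belong to the boto3 AWS Glue client, and arguments passed this way apply only to that run.

```python
def continuous_logging_args(enable_filter=True):
    """Build the job arguments for continuous logging with the standard filter."""
    return {
        "--enable-continuous-cloudwatch-log": "true",
        "--enable-continuous-log-filter": "true" if enable_filter else "false",
    }


def start_run_with_filtered_logs(job_name):
    """Start one run of job_name with the arguments above.

    Needs boto3 and valid AWS credentials, so it is not called here.
    """
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=continuous_logging_args(),
    )
    return response["JobRunId"]


# Example call (hypothetical job name):
# start_run_with_filtered_logs("my-glue-job")
```

To make the setting permanent, add the same key-value pairs to the job's default arguments instead of passing them for each run.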

Use the Spark context method setLogLevel

You can use the setLogLevel method from pyspark.context.SparkContext to set the logging level for your job. Valid logging levels include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN. For more information, see setLogLevel on the Apache Spark website.

To import SparkContext and set the logging level, add the following code to your AWS Glue job:

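from pyspark.context import SparkContext
# Reuse the active SparkContext if the script already created one.
sc = SparkContext.getOrCreate()
# Set the driver's logging level, for example "WARN" or "ERROR".
sc.setLogLevel("new-log-level")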

Note: Replace new-log-level with your new logging level. This code affects the driver log behavior but doesn't change the executor logs.

For more information, see Configuring logging on the Apache Spark website.
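
If the level comes from a job parameter instead of a hard-coded string, it can help to check it against the list of valid levels before calling setLogLevel. The following is a minimal sketch: normalize_log_level and the LOG_LEVEL job parameter are hypothetical names, and getResolvedOptions comes from the awsglue.utils module that's available inside AWS Glue jobs.

```python
VALID_LOG_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}


def normalize_log_level(level):
    """Return the level in upper case, or raise ValueError if it isn't valid."""
    normalized = level.strip().upper()
    if normalized not in VALID_LOG_LEVELS:
        raise ValueError(f"Unsupported log level: {level!r}")
    return normalized


# Usage inside an AWS Glue job script (LOG_LEVEL is a hypothetical parameter):
# import sys
# from awsglue.utils import getResolvedOptions
# from pyspark.context import SparkContext
#
# args = getResolvedOptions(sys.argv, ["LOG_LEVEL"])
# sc = SparkContext.getOrCreate()
# sc.setLogLevel(normalize_log_level(args["LOG_LEVEL"]))
```

With this approach, you can change the driver's verbosity for each run without editing the script.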

Use a custom log4j.properties file

AWS Glue 3.0 uses Log4j 1 for logging behavior, and you can customize these behaviors with the log4j.properties file. AWS Glue 4.0 uses Log4j 2 for the logging behavior, and you can customize these behaviors with the log4j2.properties file. For more information about Log4j 2, see Configuration properties on the Apache Logging Services website.

Note: If you apply a custom log4j.properties or log4j2.properties config file, then AWS Glue turns off continuous logging. Also, custom Log4j properties are available only in AWS Glue 4.0 and earlier.

You can include your logging preferences in the log4j2.properties file. Then, you can upload the file to Amazon Simple Storage Service (Amazon S3) and use the file in the AWS Glue job.

To use a custom config file in AWS Glue 4.0, complete the following steps:

  1. Create a file named log4j2.properties that sets the root logger level to error.
    Example log4j2.properties file:

    rootLogger.level = error
    rootLogger.appenderRef.stdout.ref = STDOUT
    
    appender.console.type = Console
    appender.console.name = STDOUT
    appender.console.target = SYSTEM_ERR
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %p [%t] %c{2} (%F:%M(%L)): %m%n
  2. Upload the log4j2.properties file to Amazon S3 and copy the file's S3 URI.

  3. In the AWS Glue job, add the following parameters:

    --extra-files, s3://[objectpath]/log4j2.properties

    Note: Replace s3://[objectpath]/log4j2.properties with the S3 URI that you copied in the preceding step.

  4. Save the AWS Glue job and run it.

  5. Check the related log stream in the /aws-glue/jobs/error log group.
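
Steps 1 through 3 can also be scripted. The following is a minimal sketch that renders the example file from step 1, uploads it with the boto3 Amazon S3 client's put_object method, and builds the --extra-files parameter. The helper names, the bucket my-bucket, and the key config/log4j2.properties are hypothetical.

```python
APPENDER_LINES = [
    "rootLogger.appenderRef.stdout.ref = STDOUT",
    "",
    "appender.console.type = Console",
    "appender.console.name = STDOUT",
    "appender.console.target = SYSTEM_ERR",
    "appender.console.layout.type = PatternLayout",
    "appender.console.layout.pattern = "
    "%d{yyyy-MM-dd HH:mm:ss,SSS} %p [%t] %c{2} (%F:%M(%L)): %m%n",
]


def render_log4j2_properties(level="error"):
    """Return the example log4j2.properties text with the given root logger level."""
    return "\n".join([f"rootLogger.level = {level}", *APPENDER_LINES]) + "\n"


def extra_files_args(bucket, key):
    """Build the --extra-files job parameter that points at the uploaded file."""
    return {"--extra-files": f"s3://{bucket}/{key}"}


def upload_config(bucket, key, level="error"):
    """Upload the rendered file to Amazon S3 and return the job parameter.

    Needs boto3 and valid AWS credentials, so it is not called here.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")
    body = render_log4j2_properties(level).encode("utf-8")
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return extra_files_args(bucket, key)


# Example call (hypothetical bucket and key):
# upload_config("my-bucket", "config/log4j2.properties")
```

The key in this sketch ends in log4j2.properties so that the uploaded object keeps the file name from step 1.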

Related information

Monitoring with Amazon CloudWatch

AWS OFFICIAL Updated 6 months ago
3 Comments

Hi everyone, while migrating from AWS Glue 4.0 to Glue 5.0, we noticed the documentation states that "custom log4j properties are not supported in Glue 5.0." https://docs.aws.amazon.com/glue/latest/dg/migrating-version-50.html#migrating-version-50-from-40 Could you please confirm whether custom Log4j2 configurations are supported in Glue 5.0? If this is not supported, is there any recommended workaround or alternative to control or reduce log verbosity in Glue 5.0 jobs?

replied 8 months ago

I have the same problem as @Tom mentioned. Currently we've ceased the migration from Glue 4.0 to Glue 5.0 due to the lack of logging customization capabilities. In Glue 4.0, the log verbosity can be reduced with proper log4j2.properties. How to achieve this in Glue 5.0?

replied 7 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
replied 7 months ago