Mandatory Glue job logs that get reported to CloudWatch


Issue:

Our aim is to reduce logging in order to control the data ingestion charged under the CloudWatch (CW) 'PutLogEvents' metric. When we previously ran our Glue job against 35 GB of data, we were billed ~2K for CloudWatch, most of which came from 'PutLogEvents'.
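To put the bill in perspective, here is a rough back-of-the-envelope sketch of how much log volume it could correspond to. The per-GB prices are assumptions (commonly quoted list prices for one region, not taken from this thread), the "~2K" is assumed to be USD, and the whole bill is treated as ingestion, so the result is an upper bound rather than an exact figure.

```python
# Rough estimate of log volume behind a CloudWatch ingestion bill.
# ASSUMPTIONS: prices are illustrative USD-per-GB values; check the
# CloudWatch pricing page for the actual rates in your region.
STANDARD_PRICE_PER_GB = 0.50           # assumed Standard log class ingestion price
INFREQUENT_ACCESS_PRICE_PER_GB = 0.25  # assumed Infrequent Access log class price


def ingested_gb(bill_usd: float, price_per_gb: float) -> float:
    """Log volume (GB) that would produce the given ingestion bill."""
    return bill_usd / price_per_gb


def ingestion_cost(gb: float, price_per_gb: float) -> float:
    """Ingestion cost (USD) for the given log volume."""
    return gb * price_per_gb


bill = 2000.0  # the "~2K" from the question, assumed to be USD
volume = ingested_gb(bill, STANDARD_PRICE_PER_GB)
print(f"Approximate log volume: {volume:.0f} GB")
print(
    "Same volume at the assumed Infrequent Access price: "
    f"${ingestion_cost(volume, INFREQUENT_ACCESS_PRICE_PER_GB):.0f}"
)
```

Under these assumptions the job would have written far more log data than the 35 GB it processed, which suggests that cutting log verbosity matters more than switching log class.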

Action Taken:

We have a Glue PySpark job, written in the Glue script editor, for which we have disabled the 'Job metrics', 'Job observability metrics' and 'Continuous logging' options. Our understanding was that this would prevent any log groups from being created in CW. However, it still creates the '/aws-glue/jobs/error' and '/aws-glue/jobs/output' log groups in CW.

We have also defined a custom log4j2 file, based on a recommendation from AWS, to report logs at ERROR level only, but we still see INFO-level logs for the driver and executors.

Help/pointers needed:

Is the creation of the '/aws-glue/jobs/error' and '/aws-glue/jobs/output' log groups, which report system/infra logs, unavoidable for a Glue job?

Is there any way to change the log verbosity so that only ERROR is logged, or any other option to reduce log size? We are unsure about re-running the job against 35 GB of data given the CW bill.
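One option sometimes tried alongside a log4j2 file is to set the level from inside the script with Spark's `SparkContext.setLogLevel`. The sketch below is a minimal illustration, not a confirmed fix for this job: the helper name `quiet_driver_logs` is hypothetical, and the call changes the logger level in the driver's JVM, so executor logs may still follow whatever log4j configuration the executors load.

```python
# Log levels accepted by SparkContext.setLogLevel.
VALID_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}


def quiet_driver_logs(spark_context, level: str = "ERROR") -> str:
    """Hypothetical helper: restrict Spark logging to `level` and above.

    `spark_context` is expected to be a pyspark SparkContext (or any object
    exposing setLogLevel). Returns the level that was applied.
    """
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"Unsupported log level: {level}")
    spark_context.setLogLevel(level)
    return level


# In a Glue PySpark script this would typically be called right after the
# context is created, for example:
#   from pyspark.context import SparkContext
#   sc = SparkContext.getOrCreate()
#   quiet_driver_logs(sc)
```

Calling it as early as possible in the script limits how many INFO lines are emitted before the level takes effect; anything logged during JVM and Glue start-up happens before the script runs and is not affected.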

Any help is appreciated, TIA!

1 Answer

Hello,

I located this blog post that may help address your concerns: New Amazon CloudWatch log class to cost-effectively scale your AWS Glue workloads.

- Brian D.

AWS, EXPERT, answered 8 months ago
  • Hi Brian, I followed the document you shared and changed the log group of my existing Glue job by passing the job parameter it mentions. But I now see logs being generated in both the /aws-glue/jobs/error and /aws-glue/jobs/logs-v2-infrequent-access log groups. Is this expected? Also, for 'logs-v2-infrequent-access' to work, we need to enable the continuous logging option on the Glue job, which we are hesitant to do given the cost and the fact that we cannot change the verbosity of system logs. Is there any other way to log only at ERROR level? Is a CloudWatch custom metric an option here? Does it ingest all logs first and then apply the filter?
