Issue:
Our aim is to reduce logging in order to control the data ingested via CloudWatch 'PutLogEvents' calls. In the past, when we ran our Glue job against ~35 GB of data, we were billed ~$2K for CloudWatch, most of which came from 'PutLogEvents'.
Action Taken:
We have a Glue PySpark job (created with the Glue script editor) for which we have disabled the 'Job metrics', 'Job observability metrics', and 'Continuous logging' options. Our understanding was that this would prevent any log groups from being created in CloudWatch, but it turns out the job still writes to the '/aws-glue/jobs/error' and '/aws-glue/jobs/output' log groups.
We have also defined a custom log4j2 file, based on a recommendation from AWS, to report logs at ERROR level, but we still see INFO-level logs for the driver and executors.
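For reference, the relevant part of our custom log4j2 file is along these lines (simplified; the appender name and pattern are illustrative, the key part is the rootLogger level):

```properties
# Simplified illustration of the custom log4j2.properties.
# Appender details are placeholders; the intent is to raise the
# root logger threshold so only ERROR-level events are reported.
rootLogger.level = ERROR
rootLogger.appenderRef.console.ref = Console

appender.console.type = Console
appender.console.name = Console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```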
Help/pointers needed:
Is the creation of the '/aws-glue/jobs/error' and '/aws-glue/jobs/output' log groups for system/infra logs unavoidable for a Glue job?
Is there any way to change the verbosity so that only ERROR-level logs are emitted, or any other option to reduce log volume? We are unsure about re-running the job against 35 GB of data given the CloudWatch bill.
Any help is appreciated, TIA!
Hi Brian, I've followed the document you shared and changed the log group of my existing Glue job by passing the mentioned job parameter. However, I now see that logs are generated in both the /aws-glue/jobs/error and /aws-glue/jobs/logs-v2-infrequent-access log groups. Is this expected? Also, for 'logs-v2-infrequent-access' to work we need to enable the continuous logging option on the Glue job, which we are hesitant to do given the cost and the fact that we cannot change the verbosity of the system logs. Is there any other way to log only at ERROR level? Is a CloudWatch custom metric an option here? Does it ingest all the logs first and then apply the filter?