Metric log error - ThroughputMetricsSource: Metric: is already registered by a different accumulator


Hi, when running the same job concurrently, I see the error below in the logs. Is there a way to resolve this error?

ThroughputMetricsSource: Metric: s3://<my-bucket>/orchestration/logs.recordsWritten is already registered by a different accumulator. Retrying with suffix #1 java.lang.IllegalArgumentException: A metric named s3://<my-bucket>/orchestration/logs.recordsWritten already exists.

AWS
Asked 2 months ago · Viewed 249 times
1 Answer

In AWS Glue, this error can occur when multiple Glue jobs, or concurrent runs of the same job, attempt to register the same metric name for their Spark accumulators. Accumulators in Spark are variables used to aggregate information across tasks. In the context of AWS Glue, which is built on top of Apache Spark, these accumulators may be used for metrics such as tracking the number of records written to an S3 bucket.
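To make the concept concrete, here is a minimal sketch of a plain Spark accumulator in PySpark. This is only an illustration of the mechanism; it is separate from the Glue-managed metric accumulators referenced in the error.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# An accumulator collects values contributed by tasks back to the driver.
records_written = spark.sparkContext.accumulator(0)

# Each task increments the accumulator once per record it "writes".
spark.sparkContext.parallelize(range(100)).foreach(lambda _: records_written.add(1))

print(records_written.value)  # 100
```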

The error message you're seeing:

ThroughputMetricsSource: Metric: s3://<my-bucket>/orchestration/logs.recordsWritten is already registered by a different accumulator. Retrying with suffix #1 java.lang.IllegalArgumentException: A metric named s3://<my-bucket>/orchestration/logs.recordsWritten already exists.

indicates that the metric named s3://<my-bucket>/orchestration/logs.recordsWritten is being registered more than once, which is not allowed. The metric name is derived from the S3 output path, so this happens when multiple jobs, or concurrent runs of the same job, write to the same path and therefore try to register the same metric name simultaneously.

To resolve this issue, ensure that each concurrent job run registers a unique metric name, for example by writing to a run-specific S3 prefix.
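A minimal PySpark sketch of that idea follows, assuming a standard Glue Spark ETL script. The database, table, and bucket names are placeholders, and the availability of JOB_RUN_ID as a job argument is an assumption (Glue normally passes it to Spark jobs). Appending the run ID to the sink path gives each run its own path, and therefore its own recordsWritten metric name:

```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# JOB_RUN_ID is normally supplied to Glue Spark jobs as a run argument
# (treated as an assumption here).
args = getResolvedOptions(sys.argv, ["JOB_NAME", "JOB_RUN_ID"])

glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical source table; replace with your own catalog database/table.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_table"
)

# Run-specific prefix -> run-specific metric name, so concurrent runs no longer
# try to register the same "<path>.recordsWritten" metric.
output_path = f"s3://<my-bucket>/orchestration/logs/{args['JOB_RUN_ID']}/"

glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": output_path},
    format="parquet",
)
```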

Expert
Answered 2 months ago
  • Thanks for your response, is there a way to resolve this using Glue Studio's visual editor instead of scripting? All my job properties, including concurrency, are set in the Job details tab, and the job itself is called from Step Functions.

  • I think you can resolve the issue by having each concurrent run write its output to a unique S3 path, i.e. by incorporating a dynamic element such as the job run ID or a timestamp into the output path. However, Glue Studio's visual interface alone does not let you configure such dynamic elements directly; this typically requires scripting or passing dynamic parameters to your job.

    In Glue Studio you can define job parameters and use them in the job script, but generating dynamic values such as timestamps has to be handled within the script itself rather than through the visual interface (see the sketch after these comments).
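As a hedged illustration of "passing dynamic parameters to your job": the caller (a Step Functions glue:startJobRun task, or boto3 as sketched here) can supply a run-specific value, which the job script then folds into its S3 output path. The job name and the parameter name --output_suffix are hypothetical:

```python
import time

import boto3

# Caller side: start the job with a run-specific argument. A Step Functions
# glue:startJobRun task can pass the same "Arguments" map.
glue = boto3.client("glue")
glue.start_job_run(
    JobName="my-visual-job",  # hypothetical job name
    Arguments={"--output_suffix": time.strftime("%Y%m%dT%H%M%S")},
)
```

Inside the job script, getResolvedOptions(sys.argv, ["JOB_NAME", "output_suffix"]) would then return the suffix, which can be appended to the S3 output path in the same way as the JOB_RUN_ID example above.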
