Metric log error - ThroughputMetricsSource: Metric: is already registered by a different accumulator


Hi. When I run the same job concurrently, I see the error below in the logs. Is there a way to resolve it?

ThroughputMetricsSource: Metric: s3://<my-bucket>/orchestration/logs.recordsWritten is already registered by a different accumulator. Retrying with suffix #1 java.lang.IllegalArgumentException: A metric named s3://<my-bucket>/orchestration/logs.recordsWritten already exists.

AWS
Asked 2 months ago, 254 views
1 answer

In AWS Glue, this error can occur when multiple Glue jobs run concurrently and attempt to register the same metric name for their Spark accumulators. Accumulators in Spark are shared variables that aggregate information across tasks. AWS Glue is built on top of Apache Spark, and it may use these accumulators for metrics such as the number of records written to an S3 location.

The error message you're seeing:

ThroughputMetricsSource: Metric: s3://<my-bucket>/orchestration/logs.recordsWritten is already registered by a different accumulator. Retrying with suffix #1 java.lang.IllegalArgumentException: A metric named s3://<my-bucket>/orchestration/logs.recordsWritten already exists.

indicates that the metric named s3://<my-bucket>/orchestration/logs.recordsWritten is being registered more than once, which is not allowed. This can happen when multiple Glue jobs are using the same metric name simultaneously.

To resolve this issue, you need to ensure that each Glue job uses a unique name for its metrics.

Expert
Answered 2 months ago
  • Thanks for your response. Is there a way to resolve this using the Glue Studio visual editor instead of scripting? All my job properties, including concurrency, are set in the Job details tab, and the job itself is called from Step Functions.

  • I think you can resolve the issue by modifying your Glue jobs to write output to a unique S3 path for each concurrent run, incorporating dynamic elements such as job run IDs or timestamps into the output path. However, Glue Studio's visual interface alone cannot insert such dynamic elements into the S3 output path. This typically requires scripting or passing dynamic parameters to your job.

    In Glue Studio, you can set job parameters and use them in your job script, but the generation of dynamic elements like timestamps would need to be handled within the script itself, rather than through the visual interface.
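
A minimal sketch of the unique-path idea described above, in plain Python so it can be pasted into a Glue script. The helper name `unique_output_path` and the `run=` prefix layout are hypothetical choices made for illustration, and whether a per-run path actually silences the metric warning is this thread's suggestion, not something verified here. How the run ID reaches the script is also an assumption: it is commonly read from the job arguments, so check that against your Glue version.

```python
from datetime import datetime, timezone
from typing import Optional


def unique_output_path(base_path: str, job_run_id: str,
                       now: Optional[datetime] = None) -> str:
    """Append a per-run suffix so concurrent runs write to different prefixes.

    base_path  : existing output location, e.g. an s3:// prefix
    job_run_id : identifier of this run (assumed to come from the job arguments)
    now        : injectable clock for testing; defaults to the current UTC time
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    # Strip any trailing slash so the result never contains "//".
    return f"{base_path.rstrip('/')}/run={job_run_id}_{stamp}/"


# In a Glue script, the run ID would be taken from the job arguments
# (assumption: Glue supplies it as JOB_RUN_ID), for example:
#   args = getResolvedOptions(sys.argv, ["JOB_NAME"])
#   path = unique_output_path("s3://<my-bucket>/orchestration/logs",
#                             args["JOB_RUN_ID"])
```

The resulting string would then be used as the S3 target path in the script's write step. Because the job is started from Step Functions, an alternative is to pass a unique value from the state machine as a job parameter and use it in place of the run ID.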
