
How to use custom metrics inside start_evaluation_job


I don't want a human-based evaluation job but an automated one where, instead of using the built-in metrics, I can use custom metrics to test my RAG evaluations, whether on custom inference or using a knowledge base. Right now I am able to use the API, but not with my custom metrics. Is it true that the only way is to use SageMaker and somehow link the custom metrics I build there to start_evaluation_job, or is there another approach? I want to be using bedrock-runtime.start_evaluation_job.

1 Answer
Accepted Answer

Unfortunately, using custom metrics with the start_evaluation_job API in Amazon Bedrock for automated evaluation jobs is not directly supported in the way you're describing. The start_evaluation_job API in Bedrock is designed to work with a set of predefined built-in metrics for automated model evaluations.

For automated model evaluation jobs in Bedrock, you are limited to using the built-in metrics such as "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". There isn't a direct way to use custom metrics within the Bedrock start_evaluation_job API itself.
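As an illustrative sketch (not an official example), this is roughly the shape of the per-dataset metric configuration an automated evaluation job expects, based on the EvaluationDatasetMetricConfig structure; the exact field names and task types should be verified against the current boto3 API reference before use. Only "Builtin.*" metric identifiers are accepted here, which is exactly the limitation described above.

```python
# Assumed request shape for one dataset/metric config entry in an
# automated Bedrock evaluation job. Verify field names against the
# current EvaluationDatasetMetricConfig documentation.

def build_metric_config(dataset_name: str, dataset_s3_uri: str) -> dict:
    """Assemble one dataset/metric config entry for an automated job."""
    return {
        "taskType": "QuestionAndAnswer",  # task type for RAG-style Q&A
        "dataset": {
            "name": dataset_name,
            "datasetLocation": {"s3Uri": dataset_s3_uri},
        },
        # A custom name here (e.g. "MyCustomFaithfulness") would be
        # rejected; only built-in metric identifiers are valid.
        "metricNames": [
            "Builtin.Accuracy",
            "Builtin.Robustness",
            "Builtin.Toxicity",
        ],
    }

config = build_metric_config("rag-eval", "s3://my-bucket/eval.jsonl")
```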

If you need to use custom metrics for your RAG (Retrieval-Augmented Generation) evaluations, you may need to consider alternative approaches:

  1. Use SageMaker: As you mentioned, one option is to use Amazon SageMaker for your custom metric evaluations. SageMaker provides more flexibility in defining and using custom metrics. You could potentially use SageMaker to run your custom evaluations and then incorporate those results into your broader workflow that includes Bedrock.
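A hypothetical sketch of that SageMaker option: run the custom-metric computation as a SageMaker Processing job over evaluation outputs staged in S3. The role ARN, image URI, S3 paths, and script name below are placeholders, and the request shape should be checked against the current SageMaker CreateProcessingJob API reference before passing it to boto3.client("sagemaker").create_processing_job(**request).

```python
# Placeholder values throughout; check the CreateProcessingJob API
# reference before using this request for real.

def build_processing_job_request(job_name: str, role_arn: str,
                                 image_uri: str, input_s3: str,
                                 output_s3: str) -> dict:
    """Assemble a CreateProcessingJob request that runs a custom-metric
    script over evaluation results staged in S3."""
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        "AppSpecification": {
            "ImageUri": image_uri,
            # The container entrypoint runs the custom-metric script
            # (hypothetical path).
            "ContainerEntrypoint": ["python3",
                                    "/opt/ml/processing/input/code/metrics.py"],
        },
        "ProcessingInputs": [{
            "InputName": "eval-results",
            "S3Input": {
                "S3Uri": input_s3,
                "LocalPath": "/opt/ml/processing/input/data",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        "ProcessingOutputConfig": {
            "Outputs": [{
                "OutputName": "custom-metrics",
                "S3Output": {
                    "S3Uri": output_s3,
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }],
        },
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.large",
                "VolumeSizeInGB": 30,
            },
        },
    }

request = build_processing_job_request(
    "rag-custom-metrics",
    "arn:aws:iam::123456789012:role/EvalRole",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/metrics:latest",
    "s3://my-bucket/eval-output/",
    "s3://my-bucket/custom-metrics/")
```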

  2. Post-processing: Another approach could be to use the built-in metrics available in Bedrock's start_evaluation_job, and then perform additional custom metric calculations on the results after the evaluation job completes. This would involve running your evaluation job with the available built-in metrics, then applying your custom metrics to the output data.
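The post-processing idea can be sketched as follows: read the completed job's output records and compute your own metric on top of them. The record field names ("referenceResponse", "modelResponse") are assumptions about the output format, so inspect your job's actual S3 output and adapt; the token-overlap metric is a toy stand-in for whatever custom metric you need.

```python
import json

def token_overlap(reference: str, candidate: str) -> float:
    """Toy custom metric: fraction of reference tokens that appear
    in the model's answer."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 0.0

def score_records(jsonl_lines) -> float:
    """Average the custom metric over evaluation output records.
    Field names are assumed; check your job's real output format."""
    scores = []
    for line in jsonl_lines:
        rec = json.loads(line)
        scores.append(token_overlap(rec["referenceResponse"],
                                    rec["modelResponse"]))
    return sum(scores) / len(scores) if scores else 0.0

# Example over two fabricated output records:
sample = [
    '{"referenceResponse": "Paris is the capital", "modelResponse": "the capital is Paris"}',
    '{"referenceResponse": "four legs", "modelResponse": "four"}',
]
avg = score_records(sample)  # (1.0 + 0.5) / 2 = 0.75
```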

  3. Custom implementation: You could implement your own evaluation pipeline that uses the Bedrock runtime for inference, but handles the metric calculations separately. This would give you full control over the metrics used, but would require more development effort.
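A minimal sketch of such a hand-rolled pipeline, under the assumption that inference is wrapped in an injected `generate` callable (in practice a call to bedrock-runtime's invoke_model, stubbed out here so the metric logic stays independent of the AWS client):

```python
from typing import Callable, Dict, List

def evaluate(dataset: List[dict],
             generate: Callable[[str], str],
             metrics: Dict[str, Callable[[str, str], float]]) -> Dict[str, float]:
    """Run inference over the dataset and average each custom metric.
    `generate` maps a prompt to a model answer; each metric maps
    (reference, answer) to a score."""
    totals = {name: 0.0 for name in metrics}
    for example in dataset:
        answer = generate(example["prompt"])
        for name, fn in metrics.items():
            totals[name] += fn(example["reference"], answer)
    n = max(len(dataset), 1)
    return {name: total / n for name, total in totals.items()}

# Example with a stubbed generator and an exact-match metric:
data = [{"prompt": "What is 2+2?", "reference": "4"},
        {"prompt": "Capital of France?", "reference": "Paris"}]
results = evaluate(
    data,
    lambda prompt: "4",  # stub; swap in a bedrock-runtime invoke_model call
    {"exact_match": lambda ref, ans: float(ref == ans)},
)
```

Because the metric functions are plain callables, swapping in a real Bedrock or knowledge-base inference call changes only the `generate` argument, not the evaluation logic.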

It's important to note that while Bedrock provides powerful built-in evaluation capabilities, it may not cover all custom evaluation needs, especially for specialized use cases like RAG evaluations. In such cases, a hybrid approach using multiple AWS services or custom implementations might be necessary to achieve your specific evaluation requirements.

Sources
EvaluationDatasetMetricConfig - Amazon Bedrock
CustomizedMetricSpecification - Amazon SageMaker
Define Training Metrics - Amazon SageMaker AI

answered a year ago · reviewed by an EXPERT a year ago

