Unfortunately, using custom metrics with the start_evaluation_job API in Amazon Bedrock for automated evaluation jobs is not directly supported in the way you're describing. The start_evaluation_job API in Bedrock is designed to work with a set of predefined built-in metrics for automated model evaluations.
For automated model evaluation jobs in Bedrock, you are limited to the built-in metrics such as "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity"; there is no direct way to supply custom metrics through the start_evaluation_job API itself.
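To illustrate, the metric configuration for an automated job only accepts the predefined metric names. The sketch below follows the EvaluationDatasetMetricConfig shape from the Bedrock API reference, but treat it as an assumption to verify against the docs; the dataset name, task type, and S3 URI are placeholders:

```python
# Hypothetical sketch of an automated-evaluation config; field names follow
# the EvaluationDatasetMetricConfig reference, placeholders marked as such.
evaluation_config = {
    "automated": {
        "datasetMetricConfigs": [
            {
                "taskType": "QuestionAndAnswer",  # placeholder task type
                "dataset": {
                    "name": "my-eval-dataset",  # placeholder
                    "datasetLocation": {
                        "s3Uri": "s3://my-bucket/eval/dataset.jsonl"  # placeholder
                    },
                },
                # Only predefined Builtin.* names are accepted here;
                # arbitrary custom metric names are not.
                "metricNames": [
                    "Builtin.Accuracy",
                    "Builtin.Robustness",
                    "Builtin.Toxicity",
                ],
            }
        ]
    }
}

print(evaluation_config["automated"]["datasetMetricConfigs"][0]["metricNames"])
```

This config would be passed to the evaluation-job API call along with the model and output-location parameters.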
If you need to use custom metrics for your RAG (Retrieval-Augmented Generation) evaluations, you may need to consider alternative approaches:
- Use SageMaker: As you mentioned, one option is to use Amazon SageMaker for your custom metric evaluations. SageMaker provides more flexibility in defining and using custom metrics. You could potentially use SageMaker to run your custom evaluations and then incorporate those results into your broader workflow that includes Bedrock.
- Post-processing: Run your evaluation job with the built-in metrics available in Bedrock's start_evaluation_job, then apply your custom metric calculations to the output data after the job completes.
- Custom implementation: Implement your own evaluation pipeline that uses the Bedrock runtime for inference but handles the metric calculations separately. This gives you full control over the metrics used, but requires more development effort.
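The post-processing idea (option 2) can be sketched in plain Python over the job's output records. Note the record field names (`modelOutput`, `referenceResponse`) and the exact-match metric here are illustrative assumptions, not a documented Bedrock output schema:

```python
import json

def exact_match(prediction: str, reference: str) -> float:
    """Illustrative custom metric: 1.0 if the normalized strings match."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0

def score_records(jsonl_text: str) -> float:
    """Apply the custom metric to JSONL output records.
    The field names are assumptions; adapt them to the actual
    output your evaluation job writes to S3."""
    scores = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        scores.append(exact_match(record["modelOutput"], record["referenceResponse"]))
    return sum(scores) / len(scores) if scores else 0.0

# Toy sample standing in for the downloaded job output:
sample = "\n".join([
    json.dumps({"modelOutput": "Paris", "referenceResponse": "paris"}),
    json.dumps({"modelOutput": "Lyon", "referenceResponse": "Paris"}),
])
print(score_records(sample))  # 0.5
```

In practice you would download the job's output files from the configured S3 location and feed them through a function like `score_records`.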
It's important to note that while Bedrock provides powerful built-in evaluation capabilities, it may not cover all custom evaluation needs, especially for specialized use cases like RAG evaluations. In such cases, a hybrid approach using multiple AWS services or custom implementations might be necessary to achieve your specific evaluation requirements.
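A custom pipeline along the lines of option 3 could separate inference from scoring so the metric layer stays fully under your control. This sketch stubs out the model call (`invoke_fn`) rather than calling a real model; wiring in the bedrock-runtime client is left as an exercise:

```python
from typing import Callable, Dict, List

def run_evaluation(
    examples: List[Dict[str, str]],
    invoke_fn: Callable[[str], str],
    metrics: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """Run inference via invoke_fn, then score with user-supplied metrics.
    In a real pipeline invoke_fn would wrap a bedrock-runtime call;
    here it is injected so the metric logic is independent of AWS."""
    totals = {name: 0.0 for name in metrics}
    for ex in examples:
        output = invoke_fn(ex["prompt"])
        for name, fn in metrics.items():
            totals[name] += fn(output, ex["reference"])
    n = max(len(examples), 1)
    return {name: total / n for name, total in totals.items()}

# Usage with a stubbed model and a toy custom metric:
stub_model = lambda prompt: "42"
contains = lambda out, ref: 1.0 if ref in out else 0.0
results = run_evaluation(
    [{"prompt": "meaning of life?", "reference": "42"},
     {"prompt": "capital of France?", "reference": "Paris"}],
    stub_model,
    {"contains_reference": contains},
)
print(results)  # {'contains_reference': 0.5}
```

Keeping the metric functions as plain callables makes it easy to add RAG-specific measures later without touching the inference code.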
Sources
EvaluationDatasetMetricConfig - Amazon Bedrock
CustomizedMetricSpecification - Amazon SageMaker
Define Training Metrics - Amazon SageMaker AI