Do ml.inf machines support multi-model endpoints?


We have been trying to deploy our multiple models to a multi-model endpoint backed by Inferentia instances (ml.inf1.xlarge), without luck:

ClientError: An error occurred (ValidationException) when calling the CreateEndpointConfig operation: MultiModel mode is not supported for instance type ml.inf1.xlarge.

This isn't good. Is that really the case, or have we messed up somewhere during the process? A rough sketch of what we're calling is below.
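For reference, this is roughly the sequence of boto3 calls we make (illustrative only; the model name, image URI, role ARN and S3 prefix below are placeholders):

```python
import boto3

sm = boto3.client("sagemaker")

# Model packaged for multi-model serving: ModelDataUrl points at an S3 prefix
# that holds many model.tar.gz artifacts (all values are placeholders).
sm.create_model(
    ModelName="my-mme-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    PrimaryContainer={
        "Image": "<inference-container-image-uri>",
        "Mode": "MultiModel",
        "ModelDataUrl": "s3://my-bucket/models/",
    },
)

# This is the call that raises the ValidationException when the
# instance type is ml.inf1.xlarge.
sm.create_endpoint_config(
    EndpointConfigName="my-mme-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-mme-model",
        "InstanceType": "ml.inf1.xlarge",
        "InitialInstanceCount": 1,
    }],
)
```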

Thanks

1 Answer

Unfortunately no; I believe it's not currently supported, and the error message you saw is in line with that.

I'd like to see the wording on that documentation page (which says "Multi-model endpoints are not supported on GPU instance types.") expanded to make this clearer, since Inferentia accelerators aren't "GPUs" as such.

You could perhaps look at benchmarking CPU inference performance for MME serving of a large number of models, or push some of your higher-traffic models to dedicated single-model endpoints on Inferentia. Both options are sketched below.
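If it helps, here is a minimal sketch of those two options (the endpoint config and model names are hypothetical, and ml.c5.2xlarge is just an example CPU type; you'd want to benchmark instance size and count against your own traffic):

```python
import boto3

sm = boto3.client("sagemaker")

# Option A (illustrative): keep MultiModel mode, but back the endpoint with a
# CPU instance type, which multi-model endpoints do support.
sm.create_endpoint_config(
    EndpointConfigName="my-mme-cpu-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-mme-model",        # a model created with Mode="MultiModel"
        "InstanceType": "ml.c5.2xlarge",    # example CPU type; size it from load testing
        "InitialInstanceCount": 2,
    }],
)

# Option B (illustrative): give a high-traffic model its own single-model
# endpoint on Inferentia (model created with the default SingleModel mode
# and a Neuron-compiled artifact).
sm.create_endpoint_config(
    EndpointConfigName="my-hot-model-inf1-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-hot-model-inf1",
        "InstanceType": "ml.inf1.xlarge",
        "InitialInstanceCount": 1,
    }],
)
```

With Option A you would keep routing requests to individual models via the TargetModel parameter of invoke_endpoint, as with any multi-model endpoint.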

AWS
EXPERT
Alex_T
answered 2 years ago
  • What a shame; we handle many concurrent requests per second, and the Inferentia instances were the best fit we found... Is there any instance type that can handle a similar workload without costing us a fortune?
