Do ml.inf machines support multi-model endpoints?

We have been trying to deploy several models to a multi-model endpoint on Inferentia instances (ml.inf1.xlarge), without luck. The CreateEndpointConfig call fails with this error: ClientError: An error occurred (ValidationException) when calling the CreateEndpointConfig operation: MultiModel mode is not supported for instance type ml.inf1.xlarge.

Is that really the case, or did we make a mistake somewhere in the process?

Thanks

Asked 2 years ago, viewed 515 times
1 answer

Unfortunately no, I believe it's not currently supported and the error message you saw is in line with that.
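To show where the instance type enters the picture, here is a minimal sketch of the two request payloads involved in a multi-model deployment. It only builds dictionaries and sends nothing to AWS. The helper name `build_mme_requests`, the variant name, the config-name suffix, and all argument values are illustrative assumptions. The parameter keys are the ones the boto3 SageMaker client accepts for `create_model` and `create_endpoint_config`.

```python
def build_mme_requests(model_name, image_uri, s3_prefix, role_arn, instance_type):
    """Build kwargs for create_model and create_endpoint_config.

    Hypothetical helper for illustration: nothing is sent to AWS here.
    """
    create_model_kwargs = {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            # For a multi-model endpoint this is an S3 prefix holding
            # many model archives, not a single archive.
            "ModelDataUrl": s3_prefix,
            # This setting is what the ValidationException refers to.
            "Mode": "MultiModel",
        },
    }
    create_endpoint_config_kwargs = {
        "EndpointConfigName": f"{model_name}-config",  # assumed naming scheme
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                # The instance type is checked when the config is created,
                # which is why the error surfaces on CreateEndpointConfig.
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }
        ],
    }
    return create_model_kwargs, create_endpoint_config_kwargs
```

With a boto3 client these would be passed as `client.create_model(**model_kwargs)` and `client.create_endpoint_config(**config_kwargs)`. Going by the error in the question, the second call is the one rejected when `instance_type` is `ml.inf1.xlarge`.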

I'd like to see the wording on this page (which says "Multi-model endpoints are not supported on GPU instance types.") expanded to make this clearer since Inferentia accelerators aren't "GPUs" as such.

You could test CPU inference performance for serving your large number of models from a multi-model endpoint (MME), or move some of your higher-traffic models to dedicated single-model endpoints on Inferentia.
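The split suggested above could be planned roughly as follows. This is a sketch under stated assumptions, not a sizing recommendation: `ModelTraffic`, `plan_endpoints`, the requests-per-second threshold, and the default CPU instance type `ml.m5.xlarge` are all hypothetical choices for illustration, and the right threshold would have to come from your own load tests.

```python
from dataclasses import dataclass


@dataclass
class ModelTraffic:
    name: str
    requests_per_second: float


def plan_endpoints(
    models,
    hot_threshold_rps,
    cpu_instance_type="ml.m5.xlarge",          # assumed CPU type for the MME
    inferentia_instance_type="ml.inf1.xlarge",  # type from the question
):
    """Split models into dedicated Inferentia endpoints and one CPU MME.

    Models at or above the threshold get their own single-model endpoint;
    everything else shares the multi-model endpoint.
    """
    dedicated = {}
    shared = []
    for model in models:
        if model.requests_per_second >= hot_threshold_rps:
            dedicated[model.name] = inferentia_instance_type
        else:
            shared.append(model.name)
    return {
        "single_model_endpoints": dedicated,
        "multi_model_endpoint": {
            "instance_type": cpu_instance_type,
            "models": shared,
        },
    }
```

The function only produces a plan. Creating the endpoints themselves would still go through the usual SageMaker calls, with MultiModel mode set only on the CPU-backed endpoint.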

AWS
Expert
Alex_T
Answered 2 years ago
  • What a shame. We handle many concurrent requests per second, and the Inferentia instances were the best option we found. Is there any instance type that can handle a similar workload without costing us a fortune?
