What is the most cost-efficient setup for a GPU-enabled EC2 instance?

Hello,

I'd like advice on which AWS service is most cost-efficient for the need outlined below.

We're deploying a Dockerised application using EC2, ECS (Elastic Container Service), and ECR (Elastic Container Registry). We'll use a load balancer and Route 53 for domain name registration. But we expect the main cost to be the EC2, which will need to be GPU-enabled to facilitate training distilroberta-base as part of an auto-ML solution.

distilroberta-base: 6-layer, 768-hidden, 12-heads, 82M parameters. The DistilRoBERTa model is distilled from the RoBERTa model's roberta-base checkpoint.

This service will need to be available at all points, but is unlikely to be called more than a few times a week. The training datasets will not be especially large, typically on the order of 100k strings with output metrics.

Can anyone advise how this may be done in the most cost-efficient way?

Thanks!

1 Answer
This service will need to be available at all points, but is unlikely to be called more than a few times a week.

I take it this means that the application needs to be available at all times (at least one instance must always be running) and must be able to scale out on the few occasions each week when it is called.

For this, ECS capacity providers with a mix of Spot and On-Demand instances would be ideal for your infrastructure:
https://aws.amazon.com/blogs/containers/optimize-cost-for-container-workloads-with-ecs-capacity-providers-and-ec2-spot-instances/
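As a rough illustration of the Spot/On-Demand mix, the sketch below builds the `capacityProviderStrategy` that an ECS service accepts. The capacity provider names, cluster name, service name, and the weights are all hypothetical placeholders, not values from the question. The `base` of 1 on the On-Demand provider is one way to keep a single always-on task off Spot, with extra tasks weighted towards Spot.

```python
def build_capacity_provider_strategy(on_demand_provider, spot_provider,
                                     on_demand_base=1, on_demand_weight=1,
                                     spot_weight=3):
    """Return a capacityProviderStrategy list for an ECS service.

    `base` tasks are placed on the On-Demand provider first; tasks beyond
    the base are split between providers according to the weights.
    Only one provider in a strategy may define a base.
    """
    return [
        {
            "capacityProvider": on_demand_provider,
            "base": on_demand_base,
            "weight": on_demand_weight,
        },
        {
            "capacityProvider": spot_provider,
            "weight": spot_weight,
        },
    ]


# Hypothetical names, for illustration only.
strategy = build_capacity_provider_strategy("gpu-on-demand", "gpu-spot")

service_params = {
    "cluster": "automl-cluster",
    "serviceName": "automl-training",
    "taskDefinition": "automl-training",
    "desiredCount": 1,
    "capacityProviderStrategy": strategy,
}

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("ecs").create_service(**service_params)
```

Whether Spot is acceptable for the training tasks depends on how well a training run tolerates interruption, so the weights above are only a starting point.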

Each capacity provider is linked to an Auto Scaling group, so the Auto Scaling group can be configured to launch only GPU instance types.
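A minimal sketch of that restriction, assuming one Auto Scaling group per capacity provider: the group's mixed instances policy lists only GPU instance types as overrides. The instance types (`g4dn.xlarge`, `g5.xlarge`), group names, launch template name, and subnet IDs are assumptions chosen for illustration. The launch template is assumed to exist already and to use a GPU-capable ECS AMI.

```python
# Assumed GPU instance types; pick whichever fit the model and budget.
GPU_INSTANCE_TYPES = ["g4dn.xlarge", "g5.xlarge"]


def build_gpu_asg_params(name, launch_template_name, subnet_ids, spot,
                         max_size=2):
    """Build parameters for an Auto Scaling group limited to GPU instances.

    spot=True yields an all-Spot group, spot=False an all-On-Demand group.
    MinSize is 0 so that the ECS capacity provider decides how many
    instances are actually needed.
    """
    return {
        "AutoScalingGroupName": name,
        "MinSize": 0,
        "MaxSize": max_size,
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # Only these instance types may be launched by the group.
                "Overrides": [{"InstanceType": t} for t in GPU_INSTANCE_TYPES],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": 0,
                "OnDemandPercentageAboveBaseCapacity": 0 if spot else 100,
                "SpotAllocationStrategy": "price-capacity-optimized",
            },
        },
    }


# Hypothetical names and subnet IDs.
spot_asg = build_gpu_asg_params(
    "gpu-spot-asg", "ecs-gpu-template", ["subnet-aaa", "subnet-bbb"], spot=True)
on_demand_asg = build_gpu_asg_params(
    "gpu-on-demand-asg", "ecs-gpu-template", ["subnet-aaa", "subnet-bbb"], spot=False)

# With boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("autoscaling").create_auto_scaling_group(**spot_asg)
```

Each group would then be registered as an ECS capacity provider and referenced by name in the service's strategy.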

If the application needs to scale (i.e. you would need more than one ECS task when the application is called), you can look at implementing Service Auto Scaling:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html

This would allow your ECS service to scale as needed (i.e. launch and stop tasks based on demand). The capacity provider would launch new instances (if required) as the service scales up, and terminate any instances not in use as the service scales down.
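Service Auto Scaling is configured through Application Auto Scaling. The sketch below builds the two requests involved: registering the service's desired count as a scalable target, and attaching a target tracking policy. The cluster and service names, the task limits, and the 60% CPU target are hypothetical. CPU utilization is used only because it is a predefined ECS metric; a GPU-bound training workload may need a different metric.

```python
def build_service_scaling_requests(cluster, service, min_tasks=1,
                                   max_tasks=4, cpu_target=60.0):
    """Return (scalable_target, scaling_policy) request parameters.

    min_tasks=1 keeps one task running at all times; max_tasks caps cost.
    """
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    scalable_target = {
        **common,
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }
    scaling_policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cpu_target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            # Scale out quickly, scale in more cautiously (seconds).
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 300,
        },
    }
    return scalable_target, scaling_policy


# Hypothetical cluster and service names.
target, policy = build_service_scaling_requests("automl-cluster",
                                                "automl-training")

# With boto3 installed and AWS credentials configured:
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**target)
#   aas.put_scaling_policy(**policy)
```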

AWS
answered 8 months ago
