What is the most cost efficient set-up for a GPU-enabled EC2?

Hello,

I'd like advice on which AWS service is the most cost-efficient for the need outlined below.

We're deploying a Dockerised application using EC2, ECS (Elastic Container Service), and ECR (Elastic Container Registry). We'll use a load balancer and Route 53 for domain name registration. But we expect the main cost to be the EC2, which will need to be GPU-enabled to facilitate training distilroberta-base as part of an auto-ML solution.

distilroberta-base: 6-layer, 768-hidden, 12-heads, 82M parameters. The DistilRoBERTa model distilled from the RoBERTa model roberta-base checkpoint.

This service will need to be available at all points, but is unlikely to be called more than a few times a week. The training datasets will not be especially large: typically on the order of 100k strings with output metrics.

Can anyone advise how this may be done in the most cost-efficient way?

Thanks!

1 Answer
"This service will need to be available at all points, but is unlikely to be called more than a few times a week."

I take it this means that the application needs to be available at all times (so at least one instance must always be running) and must be able to scale out on the few occasions each week when it is called.

For this, Capacity Providers with a mix of Spot and on-demand instances would be ideal for your infrastructure:
https://aws.amazon.com/blogs/containers/optimize-cost-for-container-workloads-with-ecs-capacity-providers-and-ec2-spot-instances/

A capacity provider is linked to an Auto Scaling group, so you can configure that Auto Scaling group to launch only GPU instance types.
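As a rough sketch of that link, the Python below builds the request parameters for an Auto Scaling group limited to GPU instance types and for a capacity provider that points at it. The group, launch template and capacity provider names, the subnet IDs, the maximum size and the two instance types (g4dn.xlarge, g5.xlarge) are all placeholder assumptions, not values from the question. The one On-Demand base instance is one possible reading of the "always available" requirement. The boto3 calls sit in a separate function, so the parameters can be inspected without AWS credentials.

```python
# Hypothetical values: instance types, names and sizes are placeholders.
GPU_INSTANCE_TYPES = ["g4dn.xlarge", "g5.xlarge"]


def gpu_asg_request(name, launch_template, subnets, on_demand_base=1, max_size=4):
    """Build kwargs for autoscaling.create_auto_scaling_group."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": on_demand_base,
        "MaxSize": max_size,
        "VPCZoneIdentifier": ",".join(subnets),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # Only GPU instance types may be launched by this group.
                "Overrides": [{"InstanceType": t} for t in GPU_INSTANCE_TYPES],
            },
            "InstancesDistribution": {
                # Keep this many On-Demand instances; anything above is Spot.
                "OnDemandBaseCapacity": on_demand_base,
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }


def capacity_provider_request(name, asg_arn):
    """Build kwargs for ecs.create_capacity_provider."""
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            # Let ECS grow and shrink the group to fit the running tasks.
            "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
            "managedTerminationProtection": "DISABLED",
        },
    }


def apply(asg_kwargs, capacity_provider_name):
    """Create both resources. Requires boto3 and AWS credentials."""
    import boto3

    autoscaling = boto3.client("autoscaling")
    ecs = boto3.client("ecs")
    autoscaling.create_auto_scaling_group(**asg_kwargs)
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_kwargs["AutoScalingGroupName"]]
    )
    asg_arn = groups["AutoScalingGroups"][0]["AutoScalingGroupARN"]
    ecs.create_capacity_provider(
        **capacity_provider_request(capacity_provider_name, asg_arn)
    )
```

The launch template named here would need to exist already and use a GPU-enabled ECS-optimized AMI. The capacity provider must also be associated with your cluster before a service can use it.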

If the application needs to scale (i.e. you would need more than one ECS task when the application is called), you can look at implementing Service Auto Scaling:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html

This would allow your ECS service to scale as needed (i.e. launch and stop tasks based on demand). The Capacity Provider would launch new instances (if required) as the service scales up, and terminate any instances not in use as the service scales down.
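A minimal sketch of Service Auto Scaling using the Application Auto Scaling API through boto3 is shown below. The cluster and service names, the task limits, the cooldowns and the 70% CPU target are assumptions for illustration. CPU utilization in particular may be a poor signal for a GPU-bound training workload, so treat the metric choice as a placeholder.

```python
# Hypothetical cluster/service names and thresholds, for illustration only.

def scalable_target_request(cluster, service, min_tasks=1, max_tasks=4):
    """Build kwargs for application-autoscaling register_scalable_target."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,  # at least one task is always running
        "MaxCapacity": max_tasks,
    }


def target_tracking_policy_request(cluster, service, target_cpu=70.0):
    """Build kwargs for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds before another scale-out
            "ScaleInCooldown": 300,   # seconds before another scale-in
        },
    }


def apply(cluster, service):
    """Register the target and policy. Requires boto3 and AWS credentials."""
    import boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target_request(cluster, service))
    client.put_scaling_policy(**target_tracking_policy_request(cluster, service))
```

With a policy like this in place, the service's desired task count changes with load, and the capacity provider adds or removes instances to match.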

AWS
answered 8 months ago
