AWS Batch: Fargate container stuck


Hello,

I have an AWS Batch job running on Fargate, monitored with CloudWatch Container Insights, that gets stuck for more than 12 hours.

Locally, on my laptop's i9, the exact same job takes only 1 hour. The job is configured with 4 vCPUs and 30 GB of RAM.

Container Insights shows the container nowhere near its resource limits: memory peaks at around 2 GB, and then the job gets stuck.

I'm trying to determine whether this is specific to AWS Batch or a combination of my workload and Batch.

Is there anything else I could do to troubleshoot this?

Thanks.

asked 2 years ago, 328 views
1 Answer

Use CloudWatch Logs to record the progress of your process and identify the step that causes the performance degradation.
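A minimal sketch of step-level logging, assuming the job is written in Python (the question does not say) and that it runs with the default AWS Batch logging setup, where container stdout/stderr ends up in CloudWatch Logs. The helper `timed_step` and the step names are hypothetical; the idea is simply that a `START` line without a matching `END` line in the log stream tells you which step is stuck.

```python
import logging
import sys
import time
from contextlib import contextmanager

# Log to stdout so the lines land in the job's CloudWatch log stream.
# logging's StreamHandler flushes after every record, so lines appear promptly.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("job")

# Elapsed seconds per finished step, keyed by step name.
durations = {}


@contextmanager
def timed_step(name):
    """Log the start and end of a step plus its wall-clock duration."""
    log.info("START %s", name)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        durations[name] = elapsed
        log.info("END %s (%.1f s)", name, elapsed)


if __name__ == "__main__":
    # Hypothetical steps; replace the sleeps with the real stages of the job.
    with timed_step("load_input"):
        time.sleep(0.1)
    with timed_step("process"):
        time.sleep(0.1)
```

Comparing the per-step timings from a Fargate run with a local run should show whether one step is uniformly slower or whether the job stalls at one specific point.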

You can also use ECS Exec to check what is going on inside your Fargate container during those 12 hours:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
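If the job happens to be a Python process (an assumption), the standard-library `faulthandler` module can show where it is stuck. The sketch below dumps every thread's stack to stderr at a fixed interval and, on Unix, also on demand when the process receives `SIGUSR1`. For example, from an ECS Exec shell you could run `kill -USR1 <pid>` and then read the stack in the job's log stream. The helper name `enable_stack_dumps` and the 10-minute interval are hypothetical choices.

```python
import faulthandler
import signal
import sys


def enable_stack_dumps(interval_s=600.0):
    """Make a stuck Python job report where it is stuck.

    - Every `interval_s` seconds, write the traceback of every thread to stderr.
    - On platforms that support it (Unix), also dump on demand on SIGUSR1.
    Returns True if the SIGUSR1 handler was installed, False otherwise.
    """
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=sys.stderr)
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)
        return True
    return False


if __name__ == "__main__":
    enable_stack_dumps(interval_s=600.0)  # hypothetical 10-minute interval
    # ... run the real job here ...
    # Stop the periodic dumps once the job finishes normally.
    faulthandler.cancel_dump_traceback_later()
```

Periodic dumps need no interactive access, so they still help if ECS Exec is not enabled for the job. The signal handler is cheaper if you only want a snapshot at the moment you notice the stall.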

Does your process use any hardware acceleration feature, such as a GPU?

answered 2 years ago
