AWS Batch: Fargate container stuck

0

Hello,

I have an AWS batch job running on fargate monitored with cloudwatch container insights that gets stuck for more than 12 hours.

Locally on my laptop's i9 the exact same job only takes 1h. My job is configured to use 4 vcpus and 30g of RAM.

Container insights is not showing the container using anywhere near the max resource usage. It peaks at around 2g of RAM and then gets stuck.

I'm trying to troubleshoot if this is specific to AWS batch or maybe a combination of my workload plus batch.

Are there any other things I could do to troubleshoot this?

Thanks.

feita há 2 anos328 visualizações
1 Resposta
1

Use cloudwatch to log the process and identify the step that cause the performance degradation.

Also you can use ECS Exec:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html

To check what is going on during that 12 hours on your Fargate container.

Your process use any hardware acceleration feature like GPU?

respondido há 2 anos

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas