AWS Batch: Fargate container stuck

Hello,

I have an AWS Batch job running on Fargate, monitored with CloudWatch Container Insights, that gets stuck for more than 12 hours.

Locally, on my laptop's i9, the exact same job takes only 1 hour. The job is configured to use 4 vCPUs and 30 GB of RAM.

Container Insights does not show the container coming anywhere near its resource limits. It peaks at around 2 GB of RAM and then gets stuck.

I'm trying to work out whether this is specific to AWS Batch or a combination of my workload and Batch.

Are there any other things I could do to troubleshoot this?

Thanks.

asked 2 years ago · 328 views
1 Answer
Use CloudWatch to log your process and identify the step that causes the performance degradation.
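
For example, a minimal sketch of step-level timing instrumentation: anything your job writes to stdout/stderr ends up in the job's CloudWatch log stream, so timestamped step markers make it obvious where it stalls. The step names here are hypothetical placeholders for your workload's stages.

```python
import logging
import time

# Timestamped log lines go to stdout, which AWS Batch ships to CloudWatch Logs.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("batch-job")

def timed_step(name, fn, *args, **kwargs):
    """Run one step of the job and log how long it took."""
    log.info("starting step: %s", name)
    start = time.monotonic()
    result = fn(*args, **kwargs)
    log.info("finished step: %s (%.1f s)", name, time.monotonic() - start)
    return result

# Hypothetical stages -- replace with the real steps of your workload:
# data = timed_step("download input", download_input)
# result = timed_step("process", process, data)
# timed_step("upload output", upload_output, result)
```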

You can also use ECS Exec to check what is going on inside your Fargate container during those 12 hours:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
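
A rough sketch of what that might look like with boto3, assuming the underlying ECS task was started with `enableExecuteCommand` and has the required SSM permissions; the cluster, task, and container names below are placeholders. Interactive shells are usually easier via the AWS CLI (`aws ecs execute-command --interactive --command "/bin/sh" ...`).

```python
import boto3

ecs = boto3.client("ecs")

CLUSTER = "my-batch-cluster"            # placeholder
TASK_ARN = "arn:aws:ecs:...:task/..."   # placeholder task ARN

# First confirm ECS Exec is actually enabled on the stuck task.
task = ecs.describe_tasks(cluster=CLUSTER, tasks=[TASK_ARN])["tasks"][0]
print("enableExecuteCommand:", task.get("enableExecuteCommand"))

# Open a command session in the container, e.g. to list running processes.
resp = ecs.execute_command(
    cluster=CLUSTER,
    task=TASK_ARN,
    container="my-job-container",       # placeholder container name
    interactive=True,                   # ECS Exec currently requires interactive sessions
    command="ps aux",
)
print("session id:", resp["session"]["sessionId"])
```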

Does your process use any hardware acceleration feature, such as a GPU?

answered 2 years ago
