AWS Batch: Fargate container stuck

0

Hello,

I have an AWS batch job running on fargate monitored with cloudwatch container insights that gets stuck for more than 12 hours.

Locally on my laptop's i9 the exact same job only takes 1h. My job is configured to use 4 vcpus and 30g of RAM.

Container insights is not showing the container using anywhere near the max resource usage. It peaks at around 2g of RAM and then gets stuck.

I'm trying to troubleshoot if this is specific to AWS batch or maybe a combination of my workload plus batch.

Are there any other things I could do to troubleshoot this?

Thanks.

demandé il y a 2 ans319 vues
1 réponse
1

Use cloudwatch to log the process and identify the step that cause the performance degradation.

Also you can use ECS Exec:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html

To check what is going on during that 12 hours on your Fargate container.

Your process use any hardware acceleration feature like GPU?

répondu il y a 2 ans

Vous n'êtes pas connecté. Se connecter pour publier une réponse.

Une bonne réponse répond clairement à la question, contient des commentaires constructifs et encourage le développement professionnel de la personne qui pose la question.

Instructions pour répondre aux questions