2 answers
1
The problem was the underlying EC2 instance, which had stopped operating normally for some reason. After I terminated it and started a new instance for the Batch cluster, the jobs started running again, and the log streams were created and populated with events.
answered 2 months ago
0
It sounds like the issue is related to throttling errors when creating CloudWatch log streams from AWS Batch jobs. Can you check the following?
- Verify that the IAM role used to write logs for the AWS Batch job (for example, the instance role on EC2 compute environments) has permission to create log streams and write events in CloudWatch Logs (logs:CreateLogStream and logs:PutLogEvents).
- Check whether you are hitting CloudWatch Logs throttling limits. CloudWatch Logs enforces per-account, per-Region request-rate quotas on API calls such as CreateLogStream and PutLogEvents, so many Batch jobs starting concurrently could exceed them.
- Try raising the log driver's retry setting in the AWS Batch job definition, with the aim of having log stream creation retried on throttling errors before the job fails:
"containerProperties": {
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/aws/batch/job",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "job",
      "awslogs-create-group": "true",
      "awslogs-retries": "5"
    }
  }
}
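For the IAM permission check in the list above, a rough first pass can be done offline on an exported policy document. The sketch below is a simplification and not how IAM actually evaluates access: it only looks at the Action entries of Allow statements and ignores Deny statements, Resource scoping, conditions, and any other attached policies. The function name `missing_log_actions` and the example policy are hypothetical, made up for illustration.

```python
import fnmatch
import json

# Actions the awslogs driver needs in order to create a stream and write to it.
REQUIRED_ACTIONS = ("logs:CreateLogStream", "logs:PutLogEvents")


def _as_list(value):
    """IAM allows a single item or a list in most fields; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_log_actions(policy_document, required=REQUIRED_ACTIONS):
    """Return the required actions that no Allow statement's Action matches.

    Simplified: Deny statements, Resource, and Condition are not considered.
    """
    allowed_patterns = []
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") == "Allow":
            allowed_patterns.extend(_as_list(statement.get("Action")))

    missing = []
    for action in required:
        # IAM action names are case-insensitive and may contain wildcards.
        matched = any(
            fnmatch.fnmatchcase(action.lower(), pattern.lower())
            for pattern in allowed_patterns
        )
        if not matched:
            missing.append(action)
    return missing


# Hypothetical policy that allows stream creation but not writing events.
example_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogStream"],
      "Resource": "*"
    }
  ]
}
""")

print(missing_log_actions(example_policy))  # -> ['logs:PutLogEvents']
```

An empty result only means the actions appear in some Allow statement. As a comment in this thread points out, some jobs starting successfully already suggests that permissions are not the cause here.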
It is not a permission error, because a small number of jobs do manage to start successfully.
How can I see whether I'm hitting the CloudWatch Logs throttling limits? Also, as far as I know, throttling errors have a specific error message rather than a generic "operation error", something like "ThrottlingException: Rate exceeded status code: 400".
It looks like you can't set retries for the awslogs log driver:
Log driver awslogs disallows options: awslogs-retries
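One way to act on the distinction drawn above between throttling errors and other failures is to scan the failure messages you have collected and count how many look like throttling. The sketch below assumes you have already gathered the messages as plain strings (for example, from job status reasons). The helper names and the second sample message are hypothetical. The only throttling markers used are the ones quoted in this thread, so treat a zero count as a hint rather than proof.

```python
import re
from collections import Counter

# Markers taken from the error text quoted in the thread.
THROTTLE_PATTERN = re.compile(r"ThrottlingException|Rate exceeded", re.IGNORECASE)


def is_throttling_error(message):
    """Return True if the message contains a known throttling marker."""
    return bool(THROTTLE_PATTERN.search(message))


def summarize(messages):
    """Count how many messages look like throttling and how many do not."""
    counts = Counter(
        "throttling" if is_throttling_error(message) else "other"
        for message in messages
    )
    return {"throttling": counts["throttling"], "other": counts["other"]}


# Sample input: the first string is the message quoted in the thread; the
# second is a made-up placeholder for a non-throttling failure.
sample_messages = [
    "ThrottlingException: Rate exceeded status code: 400",
    "operation error: placeholder for some other failure",
]

print(summarize(sample_messages))  # -> {'throttling': 1, 'other': 1}
```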