Hello,
We have been running workloads on ECS backed by EC2 capacity providers. Every task is given the same amount of CPU and memory, and every task runs on the same EC2 instance type, which has enough capacity for one such task. Each task reserves enough CPU and memory that no other task should be placed on the same EC2 instance. A few CPUs and a few GB of memory are left over, so we aren't completely maxing out the instance's resources.
Occasionally we get an error (example below) when trying to start a task. It has happened only a handful of times across hundreds of task runs. Based on the example in this documentation that shows "RESOURCE:CPU", it seems that the EC2 instance the task is being placed on doesn't have enough CPU. But given our current setup of one task per machine, how would that be possible? Does anyone have ideas as to what might be going on, or things we could change on our end to fix or mitigate this?
Example error:
[{"Arn":"arn:aws:ecs:REGION:ACCOUNT-ID:container-instance/CONTAINER-INSTANCE-ID","Reason":"RESOURCE:CPU"}] (Service: AmazonECS; Status Code: 400; Error Code: AmazonECS.Unknown; Request ID: UUID; Proxy: null)
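In case it helps with diagnosis, here is a minimal Python sketch of how we could pull the failure reason out of that message and compare a task's CPU reservation against what a container instance reports as remaining. It is only a sketch under assumptions: the helper names are hypothetical, the sample instance and its numbers are made up for illustration, and the `remainingResources` layout is what we understand DescribeContainerInstances to return (CPU expressed in CPU units, 1024 per vCPU). It makes no AWS calls itself.

```python
import json

# The example error from above, copied as-is (placeholders included).
ERROR_MESSAGE = (
    '[{"Arn":"arn:aws:ecs:REGION:ACCOUNT-ID:container-instance/CONTAINER-INSTANCE-ID",'
    '"Reason":"RESOURCE:CPU"}] (Service: AmazonECS; Status Code: 400; '
    'Error Code: AmazonECS.Unknown; Request ID: UUID; Proxy: null)'
)

# Hypothetical container instance entry; the numbers are invented.
SAMPLE_INSTANCE = {
    "remainingResources": [
        {"name": "CPU", "type": "INTEGER", "integerValue": 2048},
        {"name": "MEMORY", "type": "INTEGER", "integerValue": 4096},
    ]
}


def failure_reasons(error_message):
    """Return (arn, reason) pairs from the JSON array that prefixes the message."""
    # raw_decode parses the leading JSON value and ignores the trailing SDK metadata.
    failures, _ = json.JSONDecoder().raw_decode(error_message.lstrip())
    return [(f.get("Arn"), f.get("Reason")) for f in failures]


def remaining_cpu_units(container_instance):
    """Return the CPU units the instance reports as still unreserved."""
    for resource in container_instance.get("remainingResources", []):
        if resource.get("name") == "CPU":
            return resource.get("integerValue", 0)
    return 0


def cpu_shortfall(container_instance, task_cpu_units):
    """Return how many CPU units the task needs beyond what remains (0 if it fits)."""
    return max(0, task_cpu_units - remaining_cpu_units(container_instance))


if __name__ == "__main__":
    print(failure_reasons(ERROR_MESSAGE))
    # A hypothetical task reserving 4 vCPU (4096 CPU units).
    print(cpu_shortfall(SAMPLE_INSTANCE, 4096))
```

Running a check like this against the instance named in the failure, at the moment the error occurs, might show whether the instance really had less unreserved CPU than the task requested.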
Thank you for the response. We'll open a support ticket with AWS Premium Support outlining this issue.