Queue priorities with Spot fleets
According to the documentation: https://docs.aws.amazon.com/gamelift/latest/developerguide/queues-best-practices.html#queues-design-spot
- When configuring priorities for a queue with Spot fleets, place cost near the top of the list. This will ensure that locations on Spot fleets will always take precedence over locations on On-Demand fleets, when available.
Prioritize the fleets in your queue. Fleet prioritization determines where the queue looks first when searching for available resources to host a new game session. You might choose to prioritize by Region, instance types, fleet type, and so on. When working with Spot fleets, we recommend either of the following approaches:
- If your infrastructure uses a primary Region with fleets in a second Region for back-up only, you want to prioritize fleets first by region, and then by fleet type. With this approach, all fleets in the primary Region are placed at the top of the list, with Spot fleets followed by On-Demand fleets.
- If your infrastructure uses multiple Regions equally, you want to prioritize fleets by fleet type, placing Spot fleets at the top of the list.
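The two orderings recommended above can be sketched as a sort over the queue's destinations. This is only an illustration under assumptions: the `Fleet` record, the fleet names, and the Region values are hypothetical, and in a real queue the priority is simply the order in which the destinations are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    name: str        # hypothetical fleet name
    region: str      # e.g. "us-east-1"
    fleet_type: str  # "SPOT" or "ON_DEMAND"


def order_primary_region(fleets, primary_region):
    """Primary Region first; within each Region group, Spot before On-Demand."""
    return sorted(
        fleets,
        key=lambda f: (f.region != primary_region, f.fleet_type != "SPOT"),
    )


def order_equal_regions(fleets):
    """All Regions treated equally: Spot fleets first, On-Demand fleets last."""
    return sorted(fleets, key=lambda f: f.fleet_type != "SPOT")


fleets = [
    Fleet("od-east", "us-east-1", "ON_DEMAND"),
    Fleet("spot-west", "us-west-2", "SPOT"),
    Fleet("spot-east", "us-east-1", "SPOT"),
    Fleet("od-west", "us-west-2", "ON_DEMAND"),
]

print([f.name for f in order_primary_region(fleets, "us-east-1")])
print([f.name for f in order_equal_regions(fleets)])
```

Python's `sorted` is stable, so fleets that compare equal keep their original relative order.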
I have a setup with 10 fleets (5 per region):
- 1 x On-Demand: c5.xlarge
- 4 x Spot: c5.xlarge, c5.2xlarge, r5.xlarge, m5.xlarge
I've placed the On-Demand fleets last in the queue, yet players are still routed to the two On-Demand fleets instead of the Spot ones. We make placements based on reported player latency.
What am I missing? What is the explanation for this behavior? Thank you.
A GameLift queue prefers a SPOT fleet as long as it is "viable". Viability is determined by the SPOT fleet's EC2 instance type, OS, and Region. If that combination of attributes is at risk of SPOT interruption, we deem the fleet unviable.
SPOT interruption, in layman's terms, is EC2 reclaiming the SPOT instances and reallocating the capacity to ON_DEMAND. This typically happens when the EC2 instance type and OS have high usage in the Region. 5th-generation hosts (c5, m5, r5) larger than *.large typically have relatively low capacity but high usage, so it is likely that all of your SPOT fleets were unviable, which caused the GameLift queue to place sessions on ON_DEMAND instead.
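As a rough mental model of the behavior described above (a toy sketch, not GameLift's actual placement algorithm), think of the queue as walking its destinations in priority order and skipping any SPOT fleet that is currently judged unviable. The `Destination` type, the `viable` flag, and the fleet names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Destination:
    name: str          # hypothetical fleet name
    fleet_type: str    # "SPOT" or "ON_DEMAND"
    viable: bool = True  # only meaningful for SPOT fleets in this toy model


def pick_destination(ordered: Sequence[Destination]) -> Optional[Destination]:
    """Return the first destination that is not an unviable SPOT fleet."""
    for dest in ordered:
        if dest.fleet_type == "SPOT" and not dest.viable:
            continue  # skipped: at risk of interruption
        return dest
    return None


# Four SPOT fleets listed first, ON_DEMAND last, as in the question.
queue = [
    Destination("spot-c5.xlarge", "SPOT", viable=False),
    Destination("spot-c5.2xlarge", "SPOT", viable=False),
    Destination("spot-r5.xlarge", "SPOT", viable=False),
    Destination("spot-m5.xlarge", "SPOT", viable=False),
    Destination("od-c5.xlarge", "ON_DEMAND"),
]

chosen = pick_destination(queue)
print(chosen.name if chosen else "no destination available")
```

With every SPOT fleet marked unviable, the ON_DEMAND fleet is chosen even though it is last in the list, which matches the behavior reported in the question.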
Here are some graphs illustrating the viability in the last 30 days in IAD for Linux:
As you can see, *.large and 4th-generation instance types are typically much more stable in viability. I'd therefore recommend replacing one of your 5th-generation xlarge instance types with c5.large or c4.xlarge.
To find out why your cheapest SPOT fleet wasn't chosen for placement, go to CloudWatch and search for the "FirstChoiceNotViable" metric for your queue: https://docs.aws.amazon.com/gamelift/latest/developerguide/monitoring-cloudwatch.html
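If you'd rather pull that metric programmatically than through the console, here is a minimal sketch using boto3's CloudWatch `get_metric_statistics` call. The `AWS/GameLift` namespace and the `QueueName` dimension reflect my reading of the monitoring page linked above, so verify them there; the queue name and Region in the usage line are placeholders.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(queue_name, hours=24, now=None):
    """Build get_metric_statistics arguments for hourly FirstChoiceNotViable sums."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/GameLift",  # assumed namespace, check the docs
        "MetricName": "FirstChoiceNotViable",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],  # assumed dimension
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Sum"],
    }


def fetch_not_viable_counts(queue_name, region, hours=24):
    """Return (timestamp, count) pairs, oldest first. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(**build_metric_query(queue_name, hours))
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Sum"]) for p in points]
```

For example, `fetch_not_viable_counts("my-queue", "us-east-1")` would list, hour by hour, how many placements skipped the first-choice fleet because it was not viable.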