Handling EC2 Capacity Constraints with Automated Instance Type Flexibility
Automated solution for EC2 workloads on periodic start/stop schedules that encounter InsufficientInstanceCapacity errors. Intelligently switches to alternative instance types and reverts on stop.
When running EC2 workloads, you might encounter InsufficientInstanceCapacity errors that prevent your instances from starting. This is particularly challenging for workloads that start and stop on a regular basis, such as development environments, batch processing jobs, or scheduled compute tasks.
While On-Demand Capacity Reservations (ODCRs) are ideal for guaranteeing capacity, they require advance planning and commitment. This article describes an automated solution that handles capacity constraints in real time by switching to alternative instance types.
The Strategy
This solution automatically responds to InsufficientInstanceCapacity by:
- Detecting StartInstances failures via CloudWatch Events monitoring CloudTrail
- Attempting individual starts for each failed instance
- Finding compatible alternatives using EC2's instance requirements API
- Modifying instance types to available alternatives sorted by price
- Reverting on stop to restore original instance types
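The detection step can be illustrated with a minimal sketch. The event pattern below is an assumption about how such a rule might be written, not the pattern shipped with the solution; the helper name `failed_instance_ids` is hypothetical. It relies on the CloudTrail record layout for `StartInstances`, where the requested instances appear under `requestParameters.instancesSet.items`.

```python
# Assumed event pattern for failed StartInstances calls recorded by CloudTrail.
START_FAILURE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["StartInstances"],
        "errorCode": ["Server.InsufficientInstanceCapacity"],
    },
}


def failed_instance_ids(event: dict) -> list:
    """Return the instance IDs from a failed StartInstances event (hypothetical helper)."""
    detail = event.get("detail") or {}
    if detail.get("eventName") != "StartInstances":
        return []
    if detail.get("errorCode") != "Server.InsufficientInstanceCapacity":
        return []
    # CloudTrail lists the requested instances under requestParameters.instancesSet.items.
    params = detail.get("requestParameters") or {}
    items = (params.get("instancesSet") or {}).get("items") or []
    return [item["instanceId"] for item in items if "instanceId" in item]
```

Because a single `StartInstances` call can name several instances and fails as a whole, the handler receives every requested ID and has to retry each one individually.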
Architecture Components
- CloudWatch Events Rules: Monitor CloudTrail and EC2 state changes
- Lambda Functions: Handle recovery and revert logic
- DynamoDB Table: Deduplication to prevent duplicate processing
- SSM Parameter Store: Dynamic configuration per instance or global defaults
- IAM Roles: Scoped permissions requiring the Flexible=true tag
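To illustrate the deduplication component, here is a minimal sketch of a conditional write that succeeds only the first time an event is seen. The attribute names `event_id` and `expires_at`, the TTL, and the function name are assumptions for illustration; the table schema used by the solution is not described in this article.

```python
import time
from typing import Optional


def dedup_put_request(table_name: str, event_id: str,
                      ttl_seconds: int = 3600,
                      now: Optional[float] = None) -> dict:
    """Build keyword arguments for a DynamoDB client put_item call.

    The condition makes the write fail if the event was already recorded,
    which is how a duplicate delivery would be detected.
    """
    timestamp = int(now if now is not None else time.time())
    return {
        "TableName": table_name,
        "Item": {
            "event_id": {"S": event_id},                         # hypothetical key name
            "expires_at": {"N": str(timestamp + ttl_seconds)},   # hypothetical TTL attribute
        },
        "ConditionExpression": "attribute_not_exists(event_id)",
    }
```

Passing these arguments to `put_item` on a boto3 DynamoDB client would raise `ConditionalCheckFailedException` for a repeated event, and the handler could then skip processing.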
How It Works
Start Workflow
- CloudWatch Events Rule monitors CloudTrail for StartInstances API calls that fail with Server.InsufficientInstanceCapacity
- Lambda function is triggered with the failed instance IDs
- For each instance tagged with Flexible=true:
  - Attempts to start with the current instance type
  - If that fails:
    - Queries compatible instance types based on configurable criteria
    - Queries prices for eligible instances
    - Modifies the instance to the cheapest compatible alternative
    - Tags the instance with OriginalType for later restoration
    - Retries the start operation
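The start workflow can be sketched as a small selection loop. This is an illustration under stated assumptions, not the solution's actual code: `start_with_fallback` and `try_start` are hypothetical names, and continuing to the next-cheapest candidate after a failed attempt is an assumption about the retry behavior. In a real handler, `try_start` would modify the instance type, tag the instance, and call StartInstances.

```python
from typing import Callable, Iterable, Optional, Tuple


def start_with_fallback(
    current_type: str,
    candidates: Iterable[Tuple[str, float]],
    try_start: Callable[[str], bool],
) -> Optional[str]:
    """Return the instance type that started, or None if every attempt failed.

    candidates holds (instance_type, hourly_price) pairs; try_start is a
    caller-supplied function that attempts a start with the given type.
    """
    # First retry with the type the instance already has.
    if try_start(current_type):
        return current_type
    # Then walk the alternatives from cheapest to most expensive.
    for instance_type, _price in sorted(candidates, key=lambda c: c[1]):
        if instance_type == current_type:
            continue
        if try_start(instance_type):
            return instance_type
    return None
```

Injecting `try_start` keeps the ordering logic testable without any AWS calls.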
Stop Workflow
- CloudWatch Events Rule monitors EC2 instance state changes to stopped
- Lambda function checks for instances with the OriginalType tag
- Waits for the instance to fully stop
- Reverts to the original instance type
- Removes the OriginalType tag
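A minimal sketch of the revert decision, assuming the instance is given in the shape returned by the EC2 DescribeInstances API. The function name `revert_target` is hypothetical.

```python
from typing import Optional

ORIGINAL_TYPE_TAG = "OriginalType"


def revert_target(instance: dict) -> Optional[str]:
    """Return the instance type to restore, or None if nothing should be done.

    instance is expected in DescribeInstances shape, with "State" and "Tags" keys.
    """
    # The instance type can only be changed while the instance is stopped.
    if (instance.get("State") or {}).get("Name") != "stopped":
        return None
    tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
    return tags.get(ORIGINAL_TYPE_TAG)
```

With boto3, the restore itself would then be `modify_instance_attribute(InstanceId=..., InstanceType={"Value": original})` followed by `delete_tags(Resources=[...], Tags=[{"Key": "OriginalType"}])`.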
Configuration Flexibility
The solution supports three levels of configuration:
- Instance-specific: Tag instances with FlexibleConfigurationArn pointing to a custom SSM parameter
- Global default: Use the /flexible-instance-starter/default SSM parameter
- Fallback: Embedded configuration in the Lambda function
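The precedence between the three levels might look like the following sketch. The function name, the `get_parameter` callback, and the embedded fallback values are assumptions for illustration; whether the real solution merges levels or picks exactly one is not stated here, and this sketch picks the first level that exists.

```python
import json
from typing import Callable, Optional

DEFAULT_PARAMETER = "/flexible-instance-starter/default"
# Placeholder fallback; the values embedded in the real Lambda function may differ.
EMBEDDED_DEFAULTS = {"memoryBufferPercentage": 5, "maxCpuMultiplier": 4}


def resolve_config(tags: dict, get_parameter: Callable[[str], Optional[str]]) -> dict:
    """Pick the first available configuration: instance tag, global default, fallback.

    get_parameter returns the parameter's string value, or None if it does not exist.
    """
    for name in (tags.get("FlexibleConfigurationArn"), DEFAULT_PARAMETER):
        if not name:
            continue
        raw = get_parameter(name)
        if raw is not None:
            return json.loads(raw)
    return dict(EMBEDDED_DEFAULTS)
```

In a Lambda function, `get_parameter` would typically wrap the SSM GetParameter call and return None when the parameter is not found.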
Key Configuration Parameters
Memory and Storage Buffers
{
"memoryBufferPercentage": 5,
"localStorageBufferPercentage": 5
}
Allows selecting instances with slightly less memory or local storage than the original (e.g., with a 5% buffer, an alternative for an 8 GB instance needs only 7.6 GB).
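The buffer arithmetic can be sketched as follows. The function name is hypothetical, and rounding down to whole units is an assumption.

```python
import math


def minimum_with_buffer(original: int, buffer_percentage: float) -> int:
    """Smallest acceptable size after applying a percentage buffer, rounded down."""
    return math.floor(original * (100 - buffer_percentage) / 100)
```

For example, `minimum_with_buffer(8192, 5)` gives the minimum memory in MiB for an alternative to an instance with 8192 MiB.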
CPU and Memory Multipliers
{
"maxCpuMultiplier": 4,
"maxMemoryMultiplier": 2
}
Controls how much larger alternative instances can be (e.g., 4x CPU means a 4 vCPU instance can scale up to 16 vCPU).
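As a sketch of how these upper bounds could be applied to a candidate, assuming a multiplier of 1 when a key is missing (that default and the function name are assumptions):

```python
def within_size_limits(original_vcpus: int, original_memory_mib: int,
                       candidate_vcpus: int, candidate_memory_mib: int,
                       config: dict) -> bool:
    """Check that a candidate does not exceed the configured size multipliers."""
    max_vcpus = original_vcpus * config.get("maxCpuMultiplier", 1)
    max_memory = original_memory_mib * config.get("maxMemoryMultiplier", 1)
    return candidate_vcpus <= max_vcpus and candidate_memory_mib <= max_memory
```

Capping both dimensions keeps a capacity fallback from silently moving a small instance onto a much larger, more expensive one.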
CPU Manufacturers
{
"cpuManufacturers": ["intel", "amazon-web-services"]
}
Restricts alternatives to specific CPU vendors for compatibility requirements.
Instance Type Exclusions
{
"excludedInstanceTypes": ["p*.*", "g*.*", "inf*.*", "trn*.*", "f*.*"]
}
Excludes GPU, inference, and specialized instance families using wildcard patterns.
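To see which instance types such patterns would match, here is a small local approximation using Python's `fnmatch` module. This is only an illustration: `fnmatch` also gives special meaning to `?` and `[]`, so it is not guaranteed to match the EC2 API's wildcard handling in every case.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_excluded(instance_type: str, patterns: Iterable[str]) -> bool:
    """Return True if the instance type matches any exclusion pattern."""
    return any(fnmatchcase(instance_type, pattern) for pattern in patterns)
```

Note that a pattern such as `p*.*` matches every family whose name starts with `p`, so broad patterns are worth checking against the instance types you expect to keep.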
Bare Metal Control
{
"bareMetal": "included"
}
Options: included, required, or excluded for bare metal instances.
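These parameters map closely onto the `InstanceRequirements` structure accepted by the EC2 GetInstanceTypesFromInstanceRequirements API. The sketch below shows one possible translation; the function name, the defaults for missing keys, and the choice of the original vCPU count as the minimum are assumptions, not the solution's confirmed behavior.

```python
import math


def build_instance_requirements(vcpus: int, memory_mib: int, config: dict) -> dict:
    """Translate the configuration into an EC2 InstanceRequirements structure."""
    memory_buffer = config.get("memoryBufferPercentage", 0)
    requirements = {
        "VCpuCount": {
            "Min": vcpus,  # assumption: never go below the original vCPU count
            "Max": int(vcpus * config.get("maxCpuMultiplier", 1)),
        },
        "MemoryMiB": {
            "Min": math.floor(memory_mib * (100 - memory_buffer) / 100),
            "Max": int(memory_mib * config.get("maxMemoryMultiplier", 1)),
        },
        "BareMetal": config.get("bareMetal", "excluded"),  # assumed default
    }
    # Optional filters are only sent when configured.
    if config.get("cpuManufacturers"):
        requirements["CpuManufacturers"] = list(config["cpuManufacturers"])
    if config.get("excludedInstanceTypes"):
        requirements["ExcludedInstanceTypes"] = list(config["excludedInstanceTypes"])
    return requirements
```

With boto3, the result could be passed as `InstanceRequirements=` to `get_instance_types_from_instance_requirements`, together with `ArchitectureTypes` and `VirtualizationTypes` taken from the original instance.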
Implementation
Prerequisites
- AWS CDK CLI installed
- Python 3.9 or later
- AWS credentials configured
- CloudTrail enabled in your region
Deployment Steps
- Clone and setup
git clone https://github.com/aws-samples/sample-flexible-instance-starter
cd sample-flexible-instance-starter
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Deploy the stack
cdk deploy
- Tag your instances
aws ec2 create-tags \
--resources i-1234567890abcdef0 \
--tags Key=Flexible,Value=true
- Optional: Create custom configuration
aws ssm put-parameter \
--name /flexible-instance-starter/my-workload \
--type String \
--value '{
"memoryBufferPercentage": 10,
"maxCpuMultiplier": 2,
"cpuManufacturers": ["intel"],
"excludedInstanceTypes": ["t*.*"]
}'
# Tag instance to use custom config
aws ec2 create-tags \
--resources i-1234567890abcdef0 \
--tags Key=FlexibleConfigurationArn,Value=arn:aws:ssm:us-east-1:123456789012:parameter/flexible-instance-starter/my-workload
Key Benefits
- Automatic recovery - No manual intervention required when capacity issues occur
- Cost optimization - Alternatives are sorted by on-demand price, selecting the cheapest option first
- Workload-specific configuration - Different flexibility rules per instance or workload type
- Transparent operation - Original instance types are automatically restored on stop
- Audit trail - All actions logged to CloudWatch Logs with detailed reasoning
Important Considerations
Compatibility
- Only processes instances tagged with Flexible=true
- Excludes GPU and specialized instance types by default
- Respects architecture (x86_64 vs ARM64) and generation constraints
- Attempts to maintain burstable performance characteristics when applicable
Limitations
- Does not guarantee capacity availability for alternatives
- Not suitable for workloads requiring specific hardware features
Costs
- Lambda execution costs (typically minimal)
- DynamoDB on-demand pricing for deduplication table
- Potential increased EC2 costs if larger instance types are used
- No additional cost for SSM Parameter Store (standard tier)
Security
- IAM policies are scoped to instances with the Flexible=true tag
- Separate permissions for tag creation/deletion (only OriginalType)
- CloudWatch Logs retention for audit trail
- No cross-account or cross-region operations
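A tag-scoped policy of the kind described above could look like the following sketch. It is an illustration built from standard IAM condition keys (`aws:ResourceTag` and `aws:TagKeys`), not the policy deployed by the stack; the function name and statement IDs are hypothetical, and the read-only Describe permissions a real handler needs are omitted.

```python
def scoped_policy(region: str, account_id: str) -> dict:
    """Build an IAM policy document limited to instances tagged Flexible=true."""
    instance_arn = f"arn:aws:ec2:{region}:{account_id}:instance/*"
    flexible_only = {"aws:ResourceTag/Flexible": "true"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageFlexibleInstances",
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:ModifyInstanceAttribute"],
                "Resource": instance_arn,
                "Condition": {"StringEquals": flexible_only},
            },
            {
                "Sid": "ManageOriginalTypeTagOnly",
                "Effect": "Allow",
                "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
                "Resource": instance_arn,
                "Condition": {
                    "StringEquals": flexible_only,
                    # Only the OriginalType tag key may be created or deleted.
                    "ForAllValues:StringEquals": {"aws:TagKeys": ["OriginalType"]},
                },
            },
        ],
    }
```

Keeping the tagging permission in its own statement means the function cannot add or remove the Flexible tag itself, so opting an instance in or out stays a deliberate operator action.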
Monitoring
Monitor the solution through CloudWatch Logs:
# View recovery attempts
aws logs tail /aws/lambda/InstanceRecoveryHandler --follow
# View revert operations
aws logs tail /aws/lambda/InstanceStopHandler --follow
# Check for instances with modified types
aws ec2 describe-instances \
--filters "Name=tag-key,Values=OriginalType" \
--query 'Reservations[].Instances[].[InstanceId,InstanceType,Tags[?Key==`OriginalType`].Value|[0]]' \
--output table
Advanced Use Cases
Per-Workload Configuration
Create different flexibility profiles for different workload types:
# Strict configuration for production databases
aws ssm put-parameter \
--name /flexible-instance-starter/production-db \
--type String \
--value '{
"memoryBufferPercentage": 0,
"maxCpuMultiplier": 1,
"cpuManufacturers": ["intel"]
}'
# Flexible configuration for batch processing
aws ssm put-parameter \
--name /flexible-instance-starter/batch-workers \
--type String \
--value '{
"memoryBufferPercentage": 20,
"maxCpuMultiplier": 8,
"cpuManufacturers": ["intel", "amd", "amazon-web-services"]
}'
Cleanup
Remove all resources:
cdk destroy
Conclusion
This solution lets EC2 workloads adapt to capacity constraints in real time. By automatically selecting compatible alternative instance types, it draws on the breadth of EC2's instance portfolio to improve workload availability. It handles transient capacity shortages while keeping costs in check through price-based selection and automatic restoration of original instance types.
The solution is particularly useful for:
- Development and test environments where capacity reservations aren't cost-effective
- Workloads that can tolerate instance type variations
This approach complements traditional capacity planning strategies, providing an additional layer of resilience that helps ensure your workloads remain operational even during periods of capacity constraint.
