Best Practices for AWS Glue Schema Registry in Large-Scale Application Restarts
Production streaming applications built on Apache Flink, Spark Streaming, Kafka Streams, the Kinesis Client Library, and AWS Lambda can experience throttling on the AWS Glue Schema Registry GetSchemaVersion API during simultaneous restarts if they use the registry for schema management. This throttling causes pipeline failures, data processing delays, and cascading system impacts. The issue manifests when 50-100+ applications restart at the same time during maintenance windows, deployments, disaster recovery scenarios, or auto-scaling events, creating a "thundering herd" that overwhelms default API quotas and triggers crash loops across the streaming infrastructure.
The Simultaneous Restart Issue
When many streaming applications restart at the same time, they all call the AWS Glue Schema Registry GetSchemaVersion API during startup. Because local caches are empty on a cold start, every application makes multiple API calls at once, easily exceeding the per-account, per-Region rate limits. The result is immediate throttling, failed startups, and retry storms, leading to cascading failures and pipeline delays.

The GetSchemaVersion API plays a central role during startup because streaming applications intentionally resolve schemas dynamically at runtime. Each application maintains its own independent cache by design, with no cross-application state sharing, so when many applications start simultaneously their schema lookups become time-aligned within a short window. This creates a predictable startup pattern: an initial surge of schema lookups, followed by rapid stabilization once caches are populated. With appropriate retry strategies and backoff mechanisms, systems quickly converge to steady-state operation, and schema resolution remains efficient and reliable across the streaming platform.
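A quick back-of-envelope calculation illustrates why a cold-start burst can breach the quota. The figures below (application count, calls per startup, and the quota itself) are illustrative assumptions for the example, not published AWS limits:

```python
# Illustrative estimate of cold-start load on GetSchemaVersion.
# All numbers below are assumptions for the sake of the example,
# not published AWS quotas.

apps_restarting = 100     # applications restarting together
calls_per_startup = 5     # schema lookups per app with a cold cache
startup_window_s = 10     # seconds over which the burst occurs
assumed_quota_tps = 25    # assumed GetSchemaVersion quota (requests/sec)

peak_tps = apps_restarting * calls_per_startup / startup_window_s
print(f"Peak request rate: {peak_tps:.0f} TPS "
      f"vs. assumed quota of {assumed_quota_tps} TPS "
      f"({peak_tps / assumed_quota_tps:.0f}x over)")
```

Even with conservative assumptions, the burst lands well above the quota, which is why the mitigations below focus on spreading the load out in time.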
Best Practices
Implement Staggered Application Restarts
Deploying applications in batches rather than simultaneously reduces peak API load (by roughly 80% in the five-batch example below) and gives caches time to warm up between batches. Configure deployments as 5 batches of 10 applications each, with 30-60 second delays between batches. For example, with Kubernetes deployments, use rolling update strategies. With Spark jobs on EMR, deploy in sequential batches using shell scripts with sleep delays between batch submissions. For Lambda functions, use alias-based gradual deployments, starting with 20% of traffic on the new version and increasing to 50%, then 100%, while monitoring for throttling.
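The batching pattern can be sketched in a few lines of Python. Here `restart_fn`, the batch size, and the delay are placeholders for whatever actually triggers a restart in your environment (a `kubectl rollout`, an EMR step submission, and so on):

```python
import time

def restart_in_batches(app_ids, batch_size=10, delay_s=45, restart_fn=print):
    """Restart applications in fixed-size batches, pausing between
    batches so schema caches can warm up before the next wave starts.

    restart_fn is a stand-in for whatever actually restarts an app
    (kubectl rollout, EMR step submission, Lambda alias shift, etc.).
    """
    batches = [app_ids[i:i + batch_size]
               for i in range(0, len(app_ids), batch_size)]
    for n, batch in enumerate(batches, start=1):
        for app in batch:
            restart_fn(app)
        if n < len(batches):  # no pause needed after the final batch
            time.sleep(delay_s)
    return batches

# Example: 50 apps restarted as 5 batches of 10, 45 seconds apart.
# restart_in_batches([f"app-{i}" for i in range(50)])
```

Because only one batch is cold-starting at any moment, the peak GetSchemaVersion rate drops to roughly one-fifth of the simultaneous-restart case.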
Add Exponential Backoff with Jitter
Implement retry logic with exponential backoff and jitter in all schema lookup code to handle transient throttling gracefully. The pattern should use a base delay of 1 second, double the wait time after each retry (1s, 2s, 4s, 8s, 16s), cap maximum delay at 32 seconds, and add random jitter of ±25% to prevent synchronized retries across applications. Retry only on ThrottlingException errors, not other error types, and fail fast after 5 retry attempts to avoid indefinite blocking. The jitter component is critical because without it, all throttled applications retry at exactly the same intervals, recreating the thundering herd problem.
Python example:

```python
import random
import time

from botocore.exceptions import ClientError


def get_schema_with_backoff(glue_client, schema_id, version_number, max_retries=5):
    """Retrieve a schema version with exponential backoff and jitter.

    Args:
        glue_client: Boto3 Glue client
        schema_id: Schema ARN
        version_number: Schema version number
        max_retries: Maximum number of retry attempts

    Returns:
        Schema version response
    """
    base_delay = 1   # seconds
    max_delay = 32   # seconds

    for attempt in range(max_retries):
        try:
            return glue_client.get_schema_version(
                SchemaId={'SchemaArn': schema_id},
                SchemaVersionNumber={'VersionNumber': version_number}
            )
        except ClientError as e:
            if e.response['Error']['Code'] == 'ThrottlingException':
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff: 1s, 2s, 4s, 8s, 16s (capped at 32s)
                delay = min(base_delay * (2 ** attempt), max_delay)
                # Add jitter: randomize +/-25% of the delay
                jitter = delay * 0.25 * (2 * random.random() - 1)
                sleep_time = delay + jitter
                print(f"Throttled on attempt {attempt + 1}, "
                      f"retrying in {sleep_time:.2f}s...")
                time.sleep(sleep_time)
            else:
                raise

    raise RuntimeError(f"Failed to get schema after {max_retries} retries")
```
Enable Multi-Layer Caching
Configure caching at multiple layers: the AWS Glue Schema Registry serializers and deserializers provide a built-in application-level cache, which can be supplemented with a shared distributed cache. Set the in-memory cache TTL to 24 hours to balance memory usage against cache effectiveness. For environments with many applications, deploy a centralized distributed cache, such as Amazon MemoryDB, that all applications share, so each schema is fetched from the registry only once rather than once per application.
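The application-level layer can be illustrated with a minimal TTL cache wrapped around the registry call. This is a sketch for clarity only; the AWS-provided serializer/deserializer libraries ship their own cache, and `fetch_fn` here is a placeholder for the actual GetSchemaVersion call:

```python
import time

class SchemaCache:
    """Minimal in-memory TTL cache for schema lookups (illustrative
    sketch; the AWS serializer/deserializer libraries provide their
    own built-in cache).

    fetch_fn stands in for the actual registry call, e.g. a wrapper
    around glue_client.get_schema_version.
    """

    def __init__(self, fetch_fn, ttl_seconds=24 * 3600):
        self._fetch_fn = fetch_fn
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (expiry_timestamp, cached_value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                       # cache hit: no API call
        value = self._fetch_fn(key)               # cache miss: call registry
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value
```

With a 24-hour TTL, only the first lookup of each schema within that window reaches the API; every subsequent lookup is served locally, which is what flattens the cold-start burst after the initial surge.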
Monitor and Alert on API Throttling
Implement comprehensive monitoring using CloudWatch to track GetSchemaVersion call counts, throttle rates, schema lookup latency, and cache hit/miss ratios. Create CloudWatch alarms that trigger when the throttle rate exceeds a defined threshold (for example, more than a handful of ThrottlingException errors within a five-minute window), so operators are alerted before failures cascade.
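As one possible setup, an application can publish a custom metric whenever it catches a `ThrottlingException`, and an alarm can watch that metric. The namespace, metric name, and threshold below are illustrative assumptions, not AWS-defined names:

```python
# Sketch of a CloudWatch alarm definition for schema-lookup throttling.
# Assumes the application publishes a custom metric ("ThrottleCount" in
# the "StreamingPlatform/SchemaRegistry" namespace) each time it catches
# a ThrottlingException; names and threshold here are illustrative.

throttle_alarm = {
    "AlarmName": "glue-schema-registry-throttling",
    "Namespace": "StreamingPlatform/SchemaRegistry",  # custom app metric
    "MetricName": "ThrottleCount",
    "Statistic": "Sum",
    "Period": 300,                # evaluate over 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 10,              # alert after 10 throttles in 5 minutes
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "notBreaching",
}

# To create the alarm (requires boto3 and AWS credentials):
# boto3.client("cloudwatch").put_metric_alarm(**throttle_alarm)
print(throttle_alarm["AlarmName"])
```

Setting `TreatMissingData` to `notBreaching` keeps the alarm quiet during periods with no throttling at all, when the application publishes no data points.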
Request Service Quota Increases Proactively
If required, submit quota increase requests through AWS Support before production deployment, document the approved limits in runbooks and disaster recovery procedures, and set up CloudWatch alarms that fire when utilization exceeds 80% of the quota to provide early warning of capacity issues.
Conclusion
Simultaneous restarts in large streaming environments naturally create short-lived bursts of schema resolution traffic, which can lead to throttling if not managed proactively. By applying simple architectural best practices—such as staggered deployments, exponential backoff with jitter, multi-layer caching, proactive monitoring, and quota planning—teams can ensure smooth startups, fast cache warm-up, and stable operation at scale.