All Content tagged with Reliability

The reliability pillar focuses on workloads performing their intended functions and how to recover quickly from failure to meet demands. Key topics include distributed system design, recovery planning, and adapting to changing requirements.

26 results
Recently we went through one of the worst incidents I have been a part of. Much of our infrastructure is supported by Kafka for the various event messages that the different applications create, among...
1 answer | 0 votes | 80 views | asked 25 days ago
Hello AWS Support Team, I am currently using an Amazon RDS PostgreSQL instance, and I need assistance in setting up a physical replica outside of AWS. The goal of this external replica is to have a lo...
1 answer | 0 votes | 256 views | asked 6 months ago
Hello, I have 10 EC2 instances, each mounted to the same EFS file system. For the "Client connections" metric, I expect a constant value of 10. However, when I view this metric with the "Sum" statist...
2 answers | 0 votes | 135 views | asked 7 months ago
We've been using OpenSearch (Elasticsearch 7.10) for almost three years now. Every time you issue a software update like Elasticsearch_7_10_R20230928 or Elasticsearch_7_10_R20240502, we're left confu...
1 answer | 0 votes | 320 views | asked a year ago
I use Textract to read tables that have been filled in with handwriting. In general it works great, but there is a recurring issue of Textract not recognizing '1' or interpreting it as a column separa...
1 answer | 0 votes | 299 views | asked a year ago
A company is migrating its legacy on-premises applications to the cloud. The applications are monolithic and tightly coupled, making it challenging to scale and manage them efficiently. As a cloud arc...
4 answers | 0 votes | 692 views | asked a year ago
Our Windows server instance i-0bb861ebbbcf0585a restarted suddenly (against all expectations of AWS services and failure control). Loss of ongoing unsaved work. Checking system later for logs later fo...
2 answers | 0 votes | 499 views | asked a year ago
Hello, I am facing an intermittent issue with a domain managed in AWS Route 53, where I occasionally receive the error: Error: getaddrinfo ENOTFOUND [my domain name]. This error seems to occur sporad...
2 answers | 0 votes | 2.4K views | asked a year ago
Hello AWS Community, I am experiencing an issue with an AWS node that is part of a Kubernetes cluster deployed using Kops. **Cluster Configuration:** Deployment: Kubernetes cluster via Kops. Node ...
2 answers | 0 votes | 709 views | asked a year ago
Hi Team, I need to fail over an EKS production cluster from one AZ to another AZ. Could you please guide me with instructions or some execution methods (any reference to a script, doc, or any other form)... ...
0 answers | 0 votes | 597 views | asked a year ago
Hello AWS Community, I'm experiencing an issue where the number of AWS Lambda function invocations is not aligning with the number of messages sent to my SQS queue. The invocation count is unexpected...
3 answers | 0 votes | 707 views | asked a year ago
I am assuming that when creating a step function with an Activity (Task and external Worker), the tasks get polled in order, though there seems to be no official documentation on this (https://docs.aw...
1 answer | 0 votes | 806 views | AWS | asked 2 years ago