One critical code component that many people overlook is retries in exception handling. A legacy assumption holds that an operation either succeeds or fails, and that if it fails, it will keep failing because some system, such as the database, is hard-down. In practice, many failures are transient: a DB lock, for example, or a time-out while resources are in the process of auto-scaling.
It is critical to assume a non-zero error rate for legacy as well as modern, complex systems.
When transitioning from on-premises to the cloud, the underlying infrastructure gets abstracted and therefore becomes even more complex. This complexity provides tremendous value, including vastly more scalability and resiliency, but the trade-off is a higher likelihood of transient errors. Keeping exception handling simple yet thorough, and pairing it with observability, is hard to get right but essential.
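As a rough illustration of the retry idea, here is a minimal Python sketch of retrying with exponential backoff and jitter. Everything named in it is an assumption made for the example, not part of any AWS SDK: `TransientError` is a hypothetical exception standing in for whatever your driver raises on a lock or time-out, and `retry`, `flaky_query` and the default delays are made up. Treat it as a starting point, not a definitive implementation.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a DB lock or time-out)."""


def retry(operation, max_attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call operation(); on TransientError, wait and try again.

    The upper bound of the wait doubles on each attempt (capped at max_delay),
    and the actual wait is a random value below that bound (jitter).
    Any other exception, or a TransientError on the last attempt, propagates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


# Usage: a made-up operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("lock wait timeout")
    return "rows"


# sleep is stubbed out here so the example runs instantly.
result = retry(flaky_query, sleep=lambda seconds: None)
```

Only errors you know to be transient should be retried; retrying everything can hide real bugs and pile load onto a system that is already struggling. Logging each failed attempt also gives you the observability mentioned above.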
Hi, you have to add tangible details to your question (metrics, error logs, etc.) if you want to obtain meaningful support from the re:Post community. "Very unstable" can mean a million things; describing exactly what is failing will definitely help. Thanks