One critical piece of code that many people overlook is retry logic in exception handling. A legacy assumption holds that an operation either succeeds or fails, and that once it fails it will keep failing because some system, e.g. the database, is hard-down. In practice, many errors are transient: a DB lock, for example, or a time-out while resources are still in the process of auto-scaling.
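As a rough illustration of retrying transient errors, here is a minimal Python sketch using exponential backoff with jitter. The names `TransientError` and `call_with_retries`, and the default attempt count and delays, are assumptions made for this example and do not come from any AWS SDK; in real code you would catch whatever exception types your database driver or client actually raises for lock and time-out conditions.

```python
import logging
import random
import time

logger = logging.getLogger("retry_sketch")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a lock or time-out)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.2,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with capped backoff.

    Any other exception propagates immediately, since retrying a
    non-transient failure usually only delays the inevitable.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                # Out of attempts: log for observability, then re-raise.
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # Exponential backoff capped at max_delay, with random jitter
            # so that many clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, ceiling)
            logger.warning("attempt %d failed (%s); retrying in %.2fs",
                           attempt, exc, delay)
            sleep(delay)
```

You would wrap the flaky call in a function or lambda, for instance `call_with_retries(lambda: run_query(sql))`, where `run_query` stands in for your own code. Logging every failed attempt keeps the retries visible in your metrics instead of silently hiding the error rate.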
It is critical to assume a non-zero error rate for legacy as well as modern, complex systems.
When transitioning from on-premises to the cloud, the underlying infrastructure gets abstracted away and therefore becomes even more complex. This complexity provides tremendous value, including vastly more scalability and resiliency, but the trade-off is an even higher likelihood of non-zero error rates. Keeping exception handling and observability simple yet thorough is hard, but essential.
Hi, you need to add concrete details to your question (metrics, error logs, etc.) if you want meaningful support from the re:Post community. "Very unstable" can mean millions of things; describing exactly what is failing will definitely help. Thanks