AWS Route 53 goes down intermittently for a few minutes, at around the same time, on some days


We have set up an Elastic Load Balancer in front of our EC2 instances, and AWS Route 53 functions correctly except for brief periods of downtime that occur at around the same time on some (random) days, usually Monday through Friday. We lowered the Time to Live (TTL) setting at our domain provider from 60 minutes to 1 minute, but the issue persists. The problem affects the US-East-1 region and does not correlate with any reported AWS outages.

During these short disruptions, which occur between 10:00 AM and 2:00 PM Pacific Time, our client users encounter intermittent 404 errors. Normal functionality is restored within 5-10 minutes, and outside of these periods the system operates without any issues. A thorough examination of the Elastic Load Balancer settings has not revealed any anomalies.

Our infrastructure consists of a Django application running on an Ubuntu OS within the EC2 instances, accompanied by a RabbitMQ cluster comprising three nodes. Although these downtimes coincide with peak usage hours, there is no evidence to suggest that network or traffic congestion is the root cause of the problem.

1 Answer

A 404 error indicates that the client reached the application and was told "the thing you asked for isn't here" - it's a "Not Found" error.

This means your application is running and responding, but it can't deliver what the client asked for. Either the client asked for the wrong thing, or a server component has a fault that stops it from finding or returning the correct "thing" to respond with.

You don't say whether the load balancer is doing anything else. If you're using an ALB and (maybe) have WAF enabled, I would check that the WAF rules don't deny traffic at particular times and don't have dependencies that could cause them to fail. In the simplest case, where the load balancer is only distributing traffic, that won't apply.

I'd also recommend enabling access logs on the load balancer to help debug the issue.
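Once access logs are enabled, a small script can show whether the 404s cluster in the reported window and whether they come from the application or from the load balancer itself. The following is a minimal sketch, assuming an Application Load Balancer and its documented space-delimited log layout (timestamp in the second field, `elb_status_code` in the ninth, `target_status_code` in the tenth); Classic Load Balancer logs use a different layout. The function name `count_404s_per_minute` is hypothetical, not part of any AWS tool.

```python
import shlex
from collections import Counter


def count_404s_per_minute(lines):
    """Count 404 responses per minute, split by who produced them.

    Assumes ALB access log lines: space-delimited fields, with quoted
    strings for the request, user agent, and similar values.
    """
    counts = Counter()
    for line in lines:
        try:
            # shlex keeps quoted fields such as the request line together
            fields = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if len(fields) < 13:
            continue  # not a complete ALB log entry
        timestamp = fields[1]       # ISO 8601, UTC
        elb_status = fields[8]      # status sent to the client
        target_status = fields[9]   # status from the target, or "-"
        if elb_status != "404":
            continue
        minute = timestamp[:16]     # e.g. "2024-01-15T18:05"
        # A target status of 404 means the application answered;
        # otherwise the load balancer generated the response.
        source = "target" if target_status == "404" else "load_balancer"
        counts[(minute, source)] += 1
    return counts
```

ALB log files are delivered to S3 gzip-compressed, so after downloading one, the lines could be fed in with `gzip.open(path, "rt")`. Timestamps in the logs are UTC, so 10:00 AM to 2:00 PM Pacific corresponds to 18:00-22:00 UTC during standard time and 17:00-21:00 UTC during daylight time. If nearly all entries are tagged `target`, the Django application is returning the 404s and the load balancer is only passing them through.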

After all this, if you definitely think it is a load balancer fault then please create a support case - the support team can look at the dates and times and see if there was any issue with the load balancer itself.

AWS
answered a year ago
