You mentioned that all the requests that led to a 502 response failed to resolve the DNS name to an IP address. Were you using the default Route 53 Resolver in your VPC? It's available at the link-local IP address 169.254.169.253 as well as at the .2 address of the VPC's primary CIDR. Both are completely internal to your VPC and do not use the Elastic IP in any way.
If you're using an external DNS service and the errors can be tracked down to DNS resolution failing, the restriction is likely with the external DNS service, which might be throttling DNS queries by source IP. Switching to the Route 53 Resolver local to your VPC would likely fix the issue, because it will cache results within the constraints imposed by the TTL (time to live) value set by the source DNS service.
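If you want to confirm where resolution actually breaks, you can query each resolver directly with dig while the 502s are occurring. This is just a sketch: api.example.com stands in for the client's real hostname, and 10.0.0.2 assumes a 10.0.0.0/16 primary CIDR (use your own VPC's +2 address).
dig +short api.example.com @169.254.169.253
dig +short api.example.com @10.0.0.2
dig +short api.example.com
If the first two keep answering while the third (whatever /etc/resolv.conf points to) starts failing, the problem is with the configured resolver rather than with Route 53 Resolver.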
Route 53 Resolver has its own query limit of 1,024 queries per second per instance (https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNSLimitations.html#limits-api-entities-resolver), but you can work around it if necessary by creating separate Route 53 Resolver inbound endpoints in your VPC and sending the DNS queries to them. Each ENI of such an endpoint can handle up to 10,000 queries per second.
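If you go that route, an inbound endpoint can be created with the AWS CLI along these lines (a rough sketch; the subnet, security-group, and IP values are placeholders, and the endpoint needs IP addresses in at least two subnets):
aws route53resolver create-resolver-endpoint \
  --creator-request-id my-inbound-endpoint-request \
  --name dns-inbound \
  --direction INBOUND \
  --security-group-ids sg-0123456789abcdef0 \
  --ip-addresses SubnetId=subnet-0aaa111,Ip=10.0.1.10 SubnetId=subnet-0bbb222,Ip=10.0.2.10
You would then point your server's resolver configuration at those endpoint IPs instead of the .2 address.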
If the limit of 1,024 queries per second is a problem, another approach would be to install a local caching-only BIND instance on your server and use it as the DNS resolver. It would cache the responses from the source (received via Route 53 Resolver), reducing the frequency at which queries are sent to Route 53 Resolver.
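As a rough sketch of that caching-only setup (assuming a 10.0.0.0/16 VPC whose Route 53 Resolver sits at 10.0.0.2; adjust to your CIDR), the named.conf options block only needs to forward to the VPC resolver and answer local queries, with /etc/resolv.conf then pointing at 127.0.0.1:
options {
    directory "/var/named";
    recursion yes;
    forward only;
    forwarders { 10.0.0.2; };     // the VPC's Route 53 Resolver (+2 address)
    listen-on { 127.0.0.1; };     // serve only this host
    allow-query { localhost; };
};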
Based on the information provided, it appears that you're experiencing an issue related to IP-based throttling or blocking, rather than a default AWS throttling mechanism for Elastic IPs or DNS queries. AWS does not typically impose default throttling on specific Elastic IPs or DNS queries per IP address.
The behavior you're describing suggests that the problem is likely occurring at the client's end (CloudFront or their origin server) rather than on your AWS EC2 instances or Elastic IPs. Here are some possible explanations and steps to investigate:
- IP Reputation: The Elastic IP you're using might have been flagged or blacklisted due to suspicious activity. This could happen if the IP was previously used by another customer who engaged in malicious behavior. Try requesting a new Elastic IP from AWS and see if the problem persists.
- Rate Limiting: Although the client claims they haven't enabled WAF or an ACL, there might be some form of rate limiting or DDoS protection in place at their CloudFront or origin server level. This could be triggered by the volume of requests coming from your single IP address.
- DNS Resolution Issues: The fact that host-to-IP resolution is failing for requests from the problematic IP suggests there might be an issue with how DNS queries are being handled for that specific IP. This could be due to DNS-based rate limiting or blocking at the client's DNS provider.
- Network Path: There could be an issue in the network path between your Elastic IP and the client's infrastructure. Try running traceroute or mtr to see if there are any consistent points of failure along the route (see the example after this list).
- CloudFront Configuration: Even if the client hasn't explicitly enabled WAF or an ACL, there might be other CloudFront settings affecting how it handles requests from your IP. Ask the client to check their CloudFront distribution settings, particularly any geographic restrictions or custom header requirements.
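For the network-path point above, a report like the following (run from the EC2 instance; d1234example.cloudfront.net is a placeholder for the client's real CloudFront hostname) shows per-hop loss and latency over 100 probes:
mtr -rwc 100 d1234example.cloudfront.net
traceroute -n d1234example.cloudfront.net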
To troubleshoot further:
- Monitor your EC2 instance's network performance metrics to ensure you're not hitting any AWS-imposed limits.
- Use tools like curl or wget from your EC2 instance to test the connection to the client's CloudFront distribution, capturing detailed headers and timing information (see the curl sketch after this list).
- Implement logging on your Nginx reverse proxy to capture more details about the failed requests, including any error messages received from the upstream server (see the log_format sketch after this list).
- Consider implementing a multi-IP solution, such as using multiple EC2 instances behind a Network Load Balancer, to distribute requests across different source IPs.
- Work closely with your client to have them investigate their CloudFront and origin server logs for any patterns or error messages related to requests from your IP address.
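For the curl test, the write-out variables split each request into DNS, connect, TLS, and transfer phases, which makes it obvious whether a 502 follows a failed name lookup or a later stage (the hostname is again a placeholder):
curl -sv -o /dev/null \
  -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total} http=%{http_code}\n' \
  https://d1234example.cloudfront.net/
And for the Nginx logging, a custom log_format in the http block built from the standard upstream variables records what the upstream actually returned and how long each phase took (a sketch; adjust the path and fields as needed):
log_format upstream_debug '$remote_addr [$time_local] "$request" '
                          'status=$status upstream_status=$upstream_status '
                          'upstream_addr=$upstream_addr '
                          'upstream_response_time=$upstream_response_time '
                          'request_time=$request_time';
access_log /var/log/nginx/upstream_debug.log upstream_debug;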
If the issue persists after these investigations, it may be worth engaging AWS support to help identify any potential issues at the network or infrastructure level.
We did ask the client for this information, but they confirmed that there are no restrictions on their end.
Hi Leo,
Thanks for your prompt response.
We did use the default Route 53 Resolver (the .2 IP), which had been in the /etc/resolv.conf of the Linux EC2 instance by default for the last couple of years, until we started getting this issue two weeks ago.
We also changed it to Google (8.8.8.8), Cloudflare (1.1.1.1), Quad9 (9.9.9.9), etc. just to check whether that resolves the issue, but unfortunately it did not help. As soon as we attach a new Elastic IP to this existing server, it starts resolving the name to an IP and there are no 502 responses; and as soon as we assign the existing Elastic IP to another instance, that instance starts getting 502 errors, as I explained in point 3 of the question. I hope that is clear.
Regards
Hi @Om. I get what you're saying, but the symptoms don't quite add up. The Elastic IP doesn't affect traffic to the R53 Resolver in your VPC. The likeliest explanation is that the DNS queries are going through a different DNS resolver. For example, your /etc/resolv.conf could be pointing to R53 Resolver, but if you had a local BIND name service on your server, it could be reaching out directly to the target name servers, which would easily explain why your high-volume requests get throttled and why the problem is tied to the EIP used to make those high-frequency DNS queries.
If possible, you might want to run tcpdump on the server to confirm which DNS service(s) it's actually using while the failures appear. Find the network interface name (like "en0") with
ip addr
and run
tcpdump -nnn -p -i en0 -c 10000 port 53
(replacing "en0" with the right device) to see the first 10k packets on port 53. Are they all going to the VPC's .2 address, or are some perhaps going elsewhere, like to the external public IPs of the target name service?
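It's also worth confirming what the OS itself thinks the resolver is, since some distributions run a local stub (e.g., systemd-resolved at 127.0.0.53) in front of the real resolver:
cat /etc/resolv.conf
resolvectl status
(the second command applies only if systemd-resolved is in use). If a local stub or BIND sits in the path, tcpdump will show where it forwards the queries.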