Timeouts on reverse proxy after enabling DNS hostnames

We are running an nginx reverse proxy on an EC2 instance in a public subnet with a public IP address, and multiple external DNS records such as api.example.com (our app) and elastic.example.com (Kibana) point at it. Nginx is configured to proxy_pass requests matching the appropriate server_name to various private IP addresses in private subnets. This all works fine.
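
For context, the relevant part of the nginx configuration looks roughly like this (a sketch; the TLS paths and the api.example.com upstream address are placeholders, while the Kibana upstream matches the error log below):

    # Sketch of the reverse proxy config; cert paths and api upstream are placeholders
    server {
        listen 443 ssl;
        server_name elastic.example.com;
        ssl_certificate     /etc/nginx/certs/example.com.crt;  # placeholder
        ssl_certificate_key /etc/nginx/certs/example.com.key;  # placeholder

        location / {
            # Kibana host in a private subnet, addressed directly by private IP
            proxy_pass http://10.0.1.137:5601;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    server {
        listen 443 ssl;
        server_name api.example.com;
        ssl_certificate     /etc/nginx/certs/example.com.crt;  # placeholder
        ssl_certificate_key /etc/nginx/certs/example.com.key;  # placeholder

        location / {
            proxy_pass http://10.0.2.10:8080;  # placeholder app server IP/port
            proxy_set_header Host $host;
        }
    }

Note that the proxy_pass targets are literal IP addresses, so nginx never needs to resolve these upstreams at request time.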

Yesterday, we turned on the “Enable DNS hostnames” setting on the VPC containing all of our EC2 instances. We also created a private Route 53 hosted zone and added a record so that an Elasticsearch cluster could be reached internally as “elastic.example.internal”, so we no longer need to maintain the Elasticsearch hosts’ IP addresses on every instance that uses the service. This internal access also worked fine. After around 24-48 hours, however, requests to api.example.com and elastic.example.com started failing intermittently with 504 Gateway Timeout errors. The experience was one of extremely slow browsing with frequent timeouts. The nginx access.log confirmed it was returning these 504s, and the error.log showed:

2024/02/08 10:43:04 [error] 417483#417483: *120 upstream timed out (110: Connection timed out) while connecting to upstream, client: 176.132.61.88, server: elastic.example.com, request: "GET /70088/bundles/kbn-ui-shared-deps-src/kbn-ui-shared-deps-src.css HTTP/1.1", upstream: "http://10.0.1.137:5601/70088/bundles/kbn-ui-shared-deps-src/kbn-ui-shared-deps-src.css", host: "elastic.example.com", referrer: "https://elastic.example.com/login?next=%2F"

We tried flushing the systemd-resolved cache, restarting nginx, and restarting Kibana, none of which helped. Disabling “Enable DNS hostnames” soon resolved the issue. Why did this occur when nginx was configured to pass requests to these hosts by IP address rather than hostname? Was the internal hosted zone somehow conflicting with the Amazon-provided private DNS hostnames? Is it because the EC2 instances already existed when the setting was enabled? How can we enable DNS hostnames without causing timeouts?

3 Answers
Accepted Answer

This problem was eventually traced to net.netfilter.nf_conntrack_max being set too low on the reverse proxy. It was identified by inspecting the dmesg logs, which showed netfilter dropping packets. Running sudo sysctl -w net.netfilter.nf_conntrack_max=131072 resolved the issue.
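
For anyone hitting the same symptom, a quick way to confirm conntrack-table exhaustion and make the fix permanent looks something like this (131072 is the value from the answer above; size it to your own traffic):

    # Compare the number of tracked connections against the limit
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

    # The kernel logs a telltale message when the table is full
    sudo dmesg | grep -i conntrack
    # e.g. "nf_conntrack: table full, dropping packet"

    # Raise the limit immediately (does not survive a reboot)
    sudo sysctl -w net.netfilter.nf_conntrack_max=131072

    # Persist the setting across reboots
    echo 'net.netfilter.nf_conntrack_max = 131072' | sudo tee /etc/sysctl.d/99-conntrack.conf
    sudo sysctl --system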

strophy
Answered 3 months ago

What private zone did you create? Was it an amazonaws PHZ (private hosted zone)?

Expert
Answered 3 months ago

The private hosted zone was named example.internal, where example is our company name. It contained the automatically generated SOA and NS records, plus two A records named elastic.example.internal that we created with a multivalue answer routing policy.
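
For reference, multivalue answer A records like these can be created with the AWS CLI along the following lines (the hosted zone ID and IP address are placeholders; repeat with a unique SetIdentifier for each Elasticsearch node):

    aws route53 change-resource-record-sets \
      --hosted-zone-id Z0123456789EXAMPLE \
      --change-batch '{
        "Changes": [{
          "Action": "CREATE",
          "ResourceRecordSet": {
            "Name": "elastic.example.internal",
            "Type": "A",
            "SetIdentifier": "es-node-1",
            "MultiValueAnswer": true,
            "TTL": 60,
            "ResourceRecords": [{"Value": "10.0.1.10"}]
          }
        }]
      }'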

strophy
Answered 3 months ago
