Timeouts on reverse proxy after enabling DNS hostnames

We are running an nginx reverse proxy on an EC2 instance in a public subnet with a public IP address, and multiple external DNS records such as api.example.com (our app) and elastic.example.com (Kibana) point at it. Nginx is configured to proxy_pass requests for each server_name to various private IP addresses in private subnets. This all works fine.
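For context, each vhost looks roughly like this (a minimal sketch; the listen directive and exact layout are from memory, and the upstream IP is the one from the error log below). Note that proxy_pass uses a literal private IP, so nginx performs no DNS lookup at request time:

    server {
        listen 443 ssl;
        server_name elastic.example.com;

        location / {
            # Upstream addressed by private IP, not hostname
            proxy_pass http://10.0.1.137:5601;
        }
    }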

Yesterday, we turned on the “Enable DNS hostnames” setting on the VPC containing all of our EC2 instances. We also created a private Route 53 hosted zone and added a record so that an Elasticsearch cluster could be reached internally under the name “elastic.example.internal”, so we no longer need to maintain the IP addresses of the Elasticsearch hosts on each instance that uses this service. This internal access also worked fine. After around 24-48 hours, however, requests to api.example.com and elastic.example.com began failing intermittently with 503 gateway timeout errors. The symptom was extremely slow browsing with frequent timeouts. The nginx access.log confirmed it was returning these 503 errors, and the error.log showed:

2024/02/08 10:43:04 [error] 417483#417483: *120 upstream timed out (110: Connection timed out) while connecting to upstream, client: 176.132.61.88, server: elastic.example.com, request: "GET /70088/bundles/kbn-ui-shared-deps-src/kbn-ui-shared-deps-src.css HTTP/1.1", upstream: "http://10.0.1.137:5601/70088/bundles/kbn-ui-shared-deps-src/kbn-ui-shared-deps-src.css", host: "elastic.example.com", referrer: "https://elastic.example.com/login?next=%2F"

We tried flushing the systemd-resolved cache, restarting nginx, and restarting Kibana (roughly the commands sketched at the end of this question), none of which helped. Disabling “Enable DNS hostnames” soon resolved the issue. Why was this issue occurring when nginx was configured to pass requests to these hosts by their IP addresses, not hostnames? Was the internal hosted zone somehow conflicting with the Amazon-provided private DNS hostnames? Is it because the EC2 instances already existed when the setting was enabled? How can we enable DNS hostnames without causing timeouts?
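For reference, the flush and restarts we tried were roughly the following (a sketch; we are assuming a systemd-based distro, and the kibana unit name may differ per install):

    sudo resolvectl flush-caches      # flush the systemd-resolved DNS cache
    sudo systemctl restart nginx      # restart the reverse proxy
    sudo systemctl restart kibana     # restart Kibana on the upstream host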

3 Answers

Accepted Answer

This problem was eventually traced to net.netfilter.nf_conntrack_max being set too low on the reverse proxy. We identified it by inspecting the dmesg log, which showed netfilter dropping packets because its connection-tracking table was full. Running sudo sysctl -w net.netfilter.nf_conntrack_max=131072 resolved the issue.
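To check whether you are hitting the same limit, look for the kernel's table-full message and compare the live conntrack entry count against the configured maximum (a sketch; the sysctl names are standard, but the persistence path assumes a distro that reads /etc/sysctl.d):

    # The telltale kernel message when the conntrack table overflows
    sudo dmesg | grep -i 'nf_conntrack: table full'

    # Live entry count vs. configured ceiling
    sysctl net.netfilter.nf_conntrack_count
    sysctl net.netfilter.nf_conntrack_max

    # Raise the limit for the running kernel...
    sudo sysctl -w net.netfilter.nf_conntrack_max=131072

    # ...and persist it across reboots
    echo 'net.netfilter.nf_conntrack_max = 131072' | sudo tee /etc/sysctl.d/90-conntrack.conf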

strophy
answered 3 months ago

What private zone did you create? Was it an amazonaws PHZ (private hosted zone)?

EXPERT
answered 3 months ago

The private hosted zone was named example.internal, where example is our company name. It contained the automatically generated SOA and NS records, plus two A records named elastic.example.internal that we created with a multivalue answer routing policy.
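For completeness, records like those can be created with something like the following (a sketch; the hosted zone ID, set identifier, and node IP are placeholders, and the second record would be an identical call with a different SetIdentifier and IP):

    aws route53 change-resource-record-sets \
        --hosted-zone-id Z0123456789EXAMPLE \
        --change-batch '{
          "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
              "Name": "elastic.example.internal",
              "Type": "A",
              "SetIdentifier": "es-node-1",
              "MultiValueAnswer": true,
              "TTL": 300,
              "ResourceRecords": [{"Value": "10.0.1.10"}]
            }
          }]
        }'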

strophy
answered 3 months ago
