Timeouts on reverse proxy after enabling DNS hostnames

We are running an nginx reverse proxy on an EC2 instance in a public subnet with a public IP address, and multiple external DNS records such as api.example.com (our app) and elastic.example.com (Kibana) point at it. Nginx is configured to proxy_pass requests for each server_name to various private IP addresses in private subnets. This all works fine.
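
For context, the proxying for each name looks roughly like this (a minimal sketch; the upstream IP and port match the log excerpt below, but the rest of our config is elided):

    server {
        listen 443 ssl;
        server_name elastic.example.com;

        location / {
            # Upstream is addressed by a fixed private IP, so nginx
            # should not be doing any DNS resolution at request time.
            proxy_pass http://10.0.1.137:5601;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }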

Yesterday, we turned on the “Enable DNS hostnames” setting on the VPC containing all of our EC2 instances. We also created a private Route 53 hosted zone and added a record so that an Elasticsearch cluster can be reached internally as elastic.example.internal, sparing us from maintaining the Elasticsearch hosts' IP addresses on every instance that uses the service. This internal access also worked fine. After around 24-48 hours, however, requests to api.example.com and elastic.example.com started failing intermittently with 503 errors; the experience was extremely slow browsing with frequent timeouts. The nginx access.log confirmed it was returning these 503 errors, and the error.log showed:

2024/02/08 10:43:04 [error] 417483#417483: *120 upstream timed out (110: Connection timed out) while connecting to upstream, client: 176.132.61.88, server: elastic.example.com, request: "GET /70088/bundles/kbn-ui-shared-deps-src/kbn-ui-shared-deps-src.css HTTP/1.1", upstream: "http://10.0.1.137:5601/70088/bundles/kbn-ui-shared-deps-src/kbn-ui-shared-deps-src.css", host: "elastic.example.com", referrer: "https://elastic.example.com/login?next=%2F"

We tried flushing systemd-resolved, restarting nginx, and restarting Kibana, none of which helped. Disabling “Enable DNS hostnames” resolved the issue shortly afterwards. Why was this issue occurring when nginx was configured to pass requests to these hosts by their IP addresses, not hostnames? Was the internal hosted zone somehow conflicting with the Amazon-provided private DNS hostnames? Is it because the EC2 instances already existed when the setting was enabled? How can we enable DNS hostnames without causing timeouts?
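
For reference, the VPC setting and the internal zone were created with the equivalent of the following (a sketch for reproducibility; the VPC ID and region are placeholders, not our real values):

    # Enable DNS hostnames on the VPC (DNS support must also be enabled)
    aws ec2 modify-vpc-attribute --vpc-id vpc-0123456789abcdef0 \
        --enable-dns-hostnames '{"Value": true}'

    # Create the private hosted zone, associated with the same VPC
    aws route53 create-hosted-zone --name example.internal \
        --caller-reference "$(date +%s)" \
        --vpc VPCRegion=eu-west-1,VPCId=vpc-0123456789abcdef0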

strophy
asked 3 months ago · 192 views
3 Answers
Accepted Answer

This problem was eventually traced to net.netfilter.nf_conntrack_max being set too low on the reverse proxy. We identified it by inspecting the dmesg logs, which showed netfilter dropping packets once its connection-tracking table filled up. Running sudo sysctl -w net.netfilter.nf_conntrack_max=131072 resolved the issue.
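
For anyone hitting the same symptoms, a rough outline of how to confirm and persist the fix (assuming a systemd-based distro; the exact kernel message wording can vary):

    # Kernel log shows conntrack exhaustion when the table is full:
    dmesg | grep conntrack
    #   nf_conntrack: table full, dropping packet

    # Compare current tracked connections against the limit:
    sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

    # Raise the limit immediately:
    sudo sysctl -w net.netfilter.nf_conntrack_max=131072

    # Persist the setting across reboots:
    echo 'net.netfilter.nf_conntrack_max = 131072' | \
        sudo tee /etc/sysctl.d/99-conntrack.conf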

strophy
answered 3 months ago

What private zone did you create? Was it an amazonaws private hosted zone (PHZ)?

EXPERT
answered 3 months ago

The private hosted zone was named example.internal, where example is our company name. It contained the automatically generated SOA and NS records, plus two A records we created, both named elastic.example.internal, using a multivalue answer routing policy.
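
For completeness, the two multivalue A records were created with the equivalent of the following (a sketch; the zone ID, set identifiers, and IP addresses are placeholders):

    aws route53 change-resource-record-sets --hosted-zone-id Z0123456789EXAMPLE \
        --change-batch '{
      "Changes": [{
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "elastic.example.internal",
          "Type": "A",
          "TTL": 300,
          "SetIdentifier": "es-node-1",
          "MultiValueAnswer": true,
          "ResourceRecords": [{"Value": "10.0.1.10"}]
        }
      }]
    }'
    # Repeat with SetIdentifier "es-node-2" and the second node's IP.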

strophy
answered 3 months ago
