EKS Node Not Ready

I have an EKS cluster with four t3.large nodes running roughly 50 small pods. Often, when I update the application from version x to y, a few nodes go into the NotReady state. I then have to clean up some resources and reboot the worker nodes before things return to normal. Any suggestions?

Logs from kube-proxy

I0927 16:12:05.785853 1 proxier.go:790] "SyncProxyRules complete" elapsed="104.231873ms"
I0927 16:18:27.078985 1 trace.go:205] Trace[1094698301]: "iptables ChainExists" (27-Sep-2022 16:16:36.489) (total time: 66869ms):
Trace[1094698301]: [1m6.869976178s] [1m6.869976178s] END
I0927 16:18:27.087821 1 trace.go:205] Trace[1957650533]: "iptables ChainExists" (27-Sep-2022 16:16:36.466) (total time: 67555ms):
Trace[1957650533]: [1m7.555663612s] [1m7.555663612s] END
I0927 16:18:27.124923 1 trace.go:205] Trace[460012371]: "DeltaFIFO Pop Process" ID:monitoring/prometheus-prometheus-node-exporter-gslfb,Depth:36,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:26.836) (total time: 186ms):
Trace[460012371]: [186.190275ms] [186.190275ms] END
W0927 16:18:27.248231 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0927 16:18:27.272469 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I0927 16:18:31.339045 1 trace.go:205] Trace[1140734081]: "DeltaFIFO Pop Process" ID:cuberun/cuberun-cuberun,Depth:42,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:30.696) (total time: 116ms):
Trace[1140734081]: [116.029921ms] [116.029921ms] END
I0927 16:18:32.403993 1 trace.go:205] Trace[903972463]: "DeltaFIFO Pop Process" ID:cuberundemo/cuberun-cuberundemo,Depth:41,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:31.657) (total time: 196ms):
Trace[903972463]: [196.24798ms] [196.24798ms] END
I0927 16:18:33.233172 1 trace.go:205] Trace[1265312678]: "DeltaFIFO Pop Process" ID:argocd/argocd-metrics,Depth:40,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:32.738) (total time: 359ms):
Trace[1265312678]: [359.090093ms] [359.090093ms] END
I0927 16:18:33.261077 1 proxier.go:823] "Syncing iptables rules"
I0927 16:18:35.474678 1 proxier.go:790] "SyncProxyRules complete" elapsed="2.867637015s"
I0927 16:18:35.587939 1 proxier.go:823] "Syncing iptables rules"
I0927 16:18:37.014157 1 proxier.go:790] "SyncProxyRules complete" elapsed="1.45321438s"
I0927 16:19:08.904513 1 trace.go:205] Trace[1753182031]: "iptables ChainExists" (27-Sep-2022 16:19:06.254) (total time: 2266ms):
Trace[1753182031]: [2.266311394s] [2.266311394s] END
I0927 16:19:08.904456 1 trace.go:205] Trace[228375231]: "iptables ChainExists" (27-Sep-2022 16:19:06.299) (total time: 2255ms):
Trace[228375231]: [2.255433291s] [2.255433291s] END
I0927 16:19:40.540864 1 trace.go:205] Trace[2069259157]: "iptables ChainExists" (27-Sep-2022 16:19:36.494) (total time: 3430ms):
Trace[2069259157]: [3.430008597s] [3.430008597s] END
I0927 16:19:40.540873 1 trace.go:205] Trace[757252858]: "iptables ChainExists" (27-Sep-2022 16:19:36.304) (total time: 3619ms):
Trace[757252858]: [3.61980147s] [3.61980147s] END
I0927 16:20:09.976580 1 trace.go:205] Trace[2070318544]: "iptables ChainExists" (27-Sep-2022 16:20:06.285) (total time: 3182ms):
Trace[2070318544]: [3.182449365s] [3.182449365s] END
I0927 16:20:09.976592 1 trace.go:205] Trace[852062251]: "iptables ChainExists" (27-Sep-2022 16:20:06.313) (total time: 3154ms):
Trace[852062251]: [3.154369999s] [3.154369999s] END

asked a year ago, 218 views
1 Answer

A t3.large has only 2 vCPUs, and with about 50 pods on 4 nodes you are running more than 10 pods on each 2-vCPU instance. That can max out the instance's CPU and cause slowdowns and problems like the ones you describe. Check that CPU and memory usage on your instances isn't too high; if it is, use larger instances or add more of them.
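
To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It uses the figures from the question (about 50 pods, 4 nodes) and the t3.large size of 2 vCPUs and 8 GiB of memory. The function name `node_pressure` is made up for this illustration, and the model is deliberately crude: it spreads pods evenly and ignores what the kubelet, system daemons, and DaemonSet pods consume, so treat the results as an upper bound on what each pod can get.

```python
def node_pressure(total_pods, node_count, vcpus_per_node, mem_gib_per_node):
    """Average pods per node and the average CPU/memory share left per pod.

    Hypothetical helper for illustration only; assumes pods are spread evenly
    and that the whole node is available to pods (it is not in practice).
    """
    pods_per_node = total_pods / node_count
    return {
        "pods_per_node": pods_per_node,
        "millicores_per_pod": vcpus_per_node * 1000 / pods_per_node,
        "mem_mib_per_pod": mem_gib_per_node * 1024 / pods_per_node,
    }


# Figures from the question: ~50 pods on 4 t3.large nodes (2 vCPU, 8 GiB each).
steady = node_pressure(50, 4, 2, 8)

# Assumption: one node has gone NotReady and its pods land on the other three.
degraded = node_pressure(50, 3, 2, 8)

for label, result in (("4 nodes", steady), ("3 nodes", degraded)):
    print(
        f"{label}: {result['pods_per_node']:.1f} pods/node, "
        f"{result['millicores_per_pod']:.0f}m CPU and "
        f"{result['mem_mib_per_pod']:.0f} MiB per pod on average"
    )
```

With all four nodes up, that works out to about 12.5 pods per node and roughly 160 millicores per pod; losing one node drops it to about 120 millicores. T3 instances are also burstable, so sustained CPU can be lower than the nominal 2 vCPUs. For real numbers rather than estimates, `kubectl top nodes` shows current usage if metrics-server is installed in the cluster.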

AWS
dov
answered a year ago
