EKS Node Not Ready

0

I have EKS cluster with 4 t.3large which has approx ~50 pods(small size). Often whenever I tried to update the application version from x to y and then few nodes goes into not ready state. Then I have to clean few resources and reboot the worker node and then situation back to normal. Any suggestions ?

Logs from kube-proxy

I0927 16:12:05.785853 1 proxier.go:790] "SyncProxyRules complete" elapsed="104.231873ms" I0927 16:18:27.078985 1 trace.go:205] Trace[1094698301]: "iptables ChainExists" (27-Sep-2022 16:16:36.489) (total time: 66869ms): Trace[1094698301]: [1m6.869976178s] [1m6.869976178s] END I0927 16:18:27.087821 1 trace.go:205] Trace[1957650533]: "iptables ChainExists" (27-Sep-2022 16:16:36.466) (total time: 67555ms): Trace[1957650533]: [1m7.555663612s] [1m7.555663612s] END I0927 16:18:27.124923 1 trace.go:205] Trace[460012371]: "DeltaFIFO Pop Process" ID:monitoring/prometheus-prometheus-node-exporter-gslfb,Depth:36,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:26.836) (total time: 186ms): Trace[460012371]: [186.190275ms] [186.190275ms] END W0927 16:18:27.248231 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding W0927 16:18:27.272469 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding I0927 16:18:31.339045 1 trace.go:205] Trace[1140734081]: "DeltaFIFO Pop Process" ID:cuberun/cuberun-cuberun,Depth:42,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:30.696) (total time: 116ms): Trace[1140734081]: [116.029921ms] [116.029921ms] END I0927 16:18:32.403993 1 trace.go:205] Trace[903972463]: "DeltaFIFO Pop Process" ID:cuberundemo/cuberun-cuberundemo,Depth:41,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:31.657) (total time: 196ms): Trace[903972463]: [196.24798ms] [196.24798ms] END I0927 16:18:33.233172 1 trace.go:205] Trace[1265312678]: "DeltaFIFO Pop Process" ID:argocd/argocd-metrics,Depth:40,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:32.738) (total time: 359ms): Trace[1265312678]: [359.090093ms] [359.090093ms] END I0927 16:18:33.261077 1 proxier.go:823] "Syncing iptables rules" I0927 16:18:35.474678 1 proxier.go:790] "SyncProxyRules complete" elapsed="2.867637015s" I0927 16:18:35.587939 1 proxier.go:823] "Syncing iptables rules" I0927 16:18:37.014157 1 proxier.go:790] "SyncProxyRules complete" elapsed="1.45321438s" I0927 16:19:08.904513 1 trace.go:205] Trace[1753182031]: "iptables ChainExists" (27-Sep-2022 16:19:06.254) (total time: 2266ms): Trace[1753182031]: [2.266311394s] [2.266311394s] END I0927 16:19:08.904456 1 trace.go:205] Trace[228375231]: "iptables ChainExists" (27-Sep-2022 16:19:06.299) (total time: 2255ms): Trace[228375231]: [2.255433291s] [2.255433291s] END I0927 16:19:40.540864 1 trace.go:205] Trace[2069259157]: "iptables ChainExists" (27-Sep-2022 16:19:36.494) (total time: 3430ms): Trace[2069259157]: [3.430008597s] [3.430008597s] END I0927 16:19:40.540873 1 trace.go:205] Trace[757252858]: "iptables ChainExists" (27-Sep-2022 16:19:36.304) (total time: 3619ms): Trace[757252858]: [3.61980147s] [3.61980147s] END I0927 16:20:09.976580 1 trace.go:205] Trace[2070318544]: "iptables ChainExists" (27-Sep-2022 16:20:06.285) (total time: 3182ms): Trace[2070318544]: [3.182449365s] [3.182449365s] END I0927 16:20:09.976592 1 trace.go:205] Trace[852062251]: "iptables ChainExists" (27-Sep-2022 16:20:06.313) (total time: 3154ms): Trace[852062251]: [3.154369999s] [3.154369999s] END

feita há um ano213 visualizações
1 Resposta
0

t3.large has only 2 cpu, sounds like you are running over 10 pods on 2 cpu instances. this might cause the instance CPU to max out and cause slowdowns and problems. Check that your instance CPU and memory usage isnt to high and if it is use larger or more instances

AWS
dov
respondido há um ano

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas