EKS Node Not Ready

I have an EKS cluster with 4 t3.large nodes running approximately 50 small pods. Often, when I update the application from version x to y, a few nodes go into the NotReady state. I then have to clean up some resources and reboot the worker node before things return to normal. Any suggestions?

Logs from kube-proxy

I0927 16:12:05.785853 1 proxier.go:790] "SyncProxyRules complete" elapsed="104.231873ms"
I0927 16:18:27.078985 1 trace.go:205] Trace[1094698301]: "iptables ChainExists" (27-Sep-2022 16:16:36.489) (total time: 66869ms):
Trace[1094698301]: [1m6.869976178s] [1m6.869976178s] END
I0927 16:18:27.087821 1 trace.go:205] Trace[1957650533]: "iptables ChainExists" (27-Sep-2022 16:16:36.466) (total time: 67555ms):
Trace[1957650533]: [1m7.555663612s] [1m7.555663612s] END
I0927 16:18:27.124923 1 trace.go:205] Trace[460012371]: "DeltaFIFO Pop Process" ID:monitoring/prometheus-prometheus-node-exporter-gslfb,Depth:36,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:26.836) (total time: 186ms):
Trace[460012371]: [186.190275ms] [186.190275ms] END
W0927 16:18:27.248231 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W0927 16:18:27.272469 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
I0927 16:18:31.339045 1 trace.go:205] Trace[1140734081]: "DeltaFIFO Pop Process" ID:cuberun/cuberun-cuberun,Depth:42,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:30.696) (total time: 116ms):
Trace[1140734081]: [116.029921ms] [116.029921ms] END
I0927 16:18:32.403993 1 trace.go:205] Trace[903972463]: "DeltaFIFO Pop Process" ID:cuberundemo/cuberun-cuberundemo,Depth:41,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:31.657) (total time: 196ms):
Trace[903972463]: [196.24798ms] [196.24798ms] END
I0927 16:18:33.233172 1 trace.go:205] Trace[1265312678]: "DeltaFIFO Pop Process" ID:argocd/argocd-metrics,Depth:40,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:32.738) (total time: 359ms):
Trace[1265312678]: [359.090093ms] [359.090093ms] END
I0927 16:18:33.261077 1 proxier.go:823] "Syncing iptables rules"
I0927 16:18:35.474678 1 proxier.go:790] "SyncProxyRules complete" elapsed="2.867637015s"
I0927 16:18:35.587939 1 proxier.go:823] "Syncing iptables rules"
I0927 16:18:37.014157 1 proxier.go:790] "SyncProxyRules complete" elapsed="1.45321438s"
I0927 16:19:08.904513 1 trace.go:205] Trace[1753182031]: "iptables ChainExists" (27-Sep-2022 16:19:06.254) (total time: 2266ms):
Trace[1753182031]: [2.266311394s] [2.266311394s] END
I0927 16:19:08.904456 1 trace.go:205] Trace[228375231]: "iptables ChainExists" (27-Sep-2022 16:19:06.299) (total time: 2255ms):
Trace[228375231]: [2.255433291s] [2.255433291s] END
I0927 16:19:40.540864 1 trace.go:205] Trace[2069259157]: "iptables ChainExists" (27-Sep-2022 16:19:36.494) (total time: 3430ms):
Trace[2069259157]: [3.430008597s] [3.430008597s] END
I0927 16:19:40.540873 1 trace.go:205] Trace[757252858]: "iptables ChainExists" (27-Sep-2022 16:19:36.304) (total time: 3619ms):
Trace[757252858]: [3.61980147s] [3.61980147s] END
I0927 16:20:09.976580 1 trace.go:205] Trace[2070318544]: "iptables ChainExists" (27-Sep-2022 16:20:06.285) (total time: 3182ms):
Trace[2070318544]: [3.182449365s] [3.182449365s] END
I0927 16:20:09.976592 1 trace.go:205] Trace[852062251]: "iptables ChainExists" (27-Sep-2022 16:20:06.313) (total time: 3154ms):
Trace[852062251]: [3.154369999s] [3.154369999s] END
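
The slow calls in a log like this can be picked out with a short script. The following is only a sketch: the function name `slow_traces`, the 1000 ms threshold, and the regular expression are illustrative assumptions based on the trace lines shown here, not part of kube-proxy or any official tooling.

```python
import re

# Matches the header of a trace entry as it appears in the log above, e.g.
#   Trace[1094698301]: "iptables ChainExists" (27-Sep-2022 16:16:36.489) (total time: 66869ms):
# The pattern is an assumption derived from these sample lines only.
TRACE_RE = re.compile(
    r'Trace\[(?P<id>\d+)\]: "(?P<op>[^"]+)".*?\(total time: (?P<ms>\d+)ms\)'
)


def slow_traces(log_text, op="iptables ChainExists", threshold_ms=1000):
    """Return (trace_id, duration_ms) for traces of `op` at or above the threshold."""
    hits = []
    for m in TRACE_RE.finditer(log_text):
        duration_ms = int(m.group("ms"))
        if m.group("op") == op and duration_ms >= threshold_ms:
            hits.append((m.group("id"), duration_ms))
    return hits


# Two entries copied from the log above.
sample = (
    'I0927 16:18:27.078985 1 trace.go:205] Trace[1094698301]: "iptables ChainExists" '
    '(27-Sep-2022 16:16:36.489) (total time: 66869ms):\n'
    'Trace[1094698301]: [1m6.869976178s] [1m6.869976178s] END\n'
    'I0927 16:18:27.124923 1 trace.go:205] Trace[460012371]: "DeltaFIFO Pop Process" '
    'ID:monitoring/prometheus-prometheus-node-exporter-gslfb,Depth:36,'
    'Reason:slow event handlers blocking the queue '
    '(27-Sep-2022 16:18:26.836) (total time: 186ms):\n'
    'Trace[460012371]: [186.190275ms] [186.190275ms] END\n'
)

print(slow_traces(sample))
```

In this excerpt the two "iptables ChainExists" calls that started around 16:16:36 took over a minute each, while the later ones took 2 to 4 seconds, so the node appears to have been stalled for roughly that minute.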

Asked a year ago, 218 views
1 Answer
A t3.large has only 2 vCPUs, and it sounds like you are running more than 10 pods on each 2-vCPU instance. That can max out the instance CPU and cause slowdowns and other problems. Check that CPU and memory usage on your instances isn't too high; if it is, use larger instances or add more of them.
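
As a rough back-of-the-envelope check of that estimate, here is a minimal sketch using the numbers from the question (4 nodes, 2 vCPUs each, about 50 pods). The helper name `per_pod_cpu_millicores` is hypothetical, and the calculation assumes pods are spread evenly and ignores the CPU reserved for the kubelet and system daemons, so the real headroom is lower.

```python
def per_pod_cpu_millicores(nodes, vcpus_per_node, pods):
    """Average CPU share per pod in millicores, assuming an even spread of pods."""
    total_millicores = nodes * vcpus_per_node * 1000
    return total_millicores / pods


# Figures taken from the question: 4 x t3.large (2 vCPUs each), ~50 pods.
nodes, vcpus_per_node, pods = 4, 2, 50

pods_per_node = pods / nodes
cpu_share = per_pod_cpu_millicores(nodes, vcpus_per_node, pods)

print(f"pods per node: {pods_per_node}")       # 12.5
print(f"millicores per pod: {cpu_share}")      # 160.0
```

During an application update the pod count can rise temporarily while old and new pods run side by side, which would shrink that per-pod share further.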

AWS
dov
answered a year ago