EKS Node Not Ready

0

I have EKS cluster with 4 t.3large which has approx ~50 pods(small size). Often whenever I tried to update the application version from x to y and then few nodes goes into not ready state. Then I have to clean few resources and reboot the worker node and then situation back to normal. Any suggestions ?

Logs from kube-proxy

I0927 16:12:05.785853 1 proxier.go:790] "SyncProxyRules complete" elapsed="104.231873ms" I0927 16:18:27.078985 1 trace.go:205] Trace[1094698301]: "iptables ChainExists" (27-Sep-2022 16:16:36.489) (total time: 66869ms): Trace[1094698301]: [1m6.869976178s] [1m6.869976178s] END I0927 16:18:27.087821 1 trace.go:205] Trace[1957650533]: "iptables ChainExists" (27-Sep-2022 16:16:36.466) (total time: 67555ms): Trace[1957650533]: [1m7.555663612s] [1m7.555663612s] END I0927 16:18:27.124923 1 trace.go:205] Trace[460012371]: "DeltaFIFO Pop Process" ID:monitoring/prometheus-prometheus-node-exporter-gslfb,Depth:36,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:26.836) (total time: 186ms): Trace[460012371]: [186.190275ms] [186.190275ms] END W0927 16:18:27.248231 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding W0927 16:18:27.272469 1 reflector.go:442] k8s.io/client-go/informers/factory.go:134: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding I0927 16:18:31.339045 1 trace.go:205] Trace[1140734081]: "DeltaFIFO Pop Process" ID:cuberun/cuberun-cuberun,Depth:42,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:30.696) (total time: 116ms): Trace[1140734081]: [116.029921ms] [116.029921ms] END I0927 16:18:32.403993 1 trace.go:205] Trace[903972463]: "DeltaFIFO Pop Process" ID:cuberundemo/cuberun-cuberundemo,Depth:41,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:31.657) (total time: 196ms): Trace[903972463]: [196.24798ms] [196.24798ms] END I0927 16:18:33.233172 1 trace.go:205] Trace[1265312678]: "DeltaFIFO Pop Process" ID:argocd/argocd-metrics,Depth:40,Reason:slow event handlers blocking the queue (27-Sep-2022 16:18:32.738) (total time: 359ms): Trace[1265312678]: [359.090093ms] [359.090093ms] END I0927 16:18:33.261077 1 proxier.go:823] "Syncing iptables rules" I0927 16:18:35.474678 1 proxier.go:790] "SyncProxyRules complete" elapsed="2.867637015s" I0927 16:18:35.587939 1 proxier.go:823] "Syncing iptables rules" I0927 16:18:37.014157 1 proxier.go:790] "SyncProxyRules complete" elapsed="1.45321438s" I0927 16:19:08.904513 1 trace.go:205] Trace[1753182031]: "iptables ChainExists" (27-Sep-2022 16:19:06.254) (total time: 2266ms): Trace[1753182031]: [2.266311394s] [2.266311394s] END I0927 16:19:08.904456 1 trace.go:205] Trace[228375231]: "iptables ChainExists" (27-Sep-2022 16:19:06.299) (total time: 2255ms): Trace[228375231]: [2.255433291s] [2.255433291s] END I0927 16:19:40.540864 1 trace.go:205] Trace[2069259157]: "iptables ChainExists" (27-Sep-2022 16:19:36.494) (total time: 3430ms): Trace[2069259157]: [3.430008597s] [3.430008597s] END I0927 16:19:40.540873 1 trace.go:205] Trace[757252858]: "iptables ChainExists" (27-Sep-2022 16:19:36.304) (total time: 3619ms): Trace[757252858]: [3.61980147s] [3.61980147s] END I0927 16:20:09.976580 1 trace.go:205] Trace[2070318544]: "iptables ChainExists" (27-Sep-2022 16:20:06.285) (total time: 3182ms): Trace[2070318544]: [3.182449365s] [3.182449365s] END I0927 16:20:09.976592 1 trace.go:205] Trace[852062251]: "iptables ChainExists" (27-Sep-2022 16:20:06.313) (total time: 3154ms): Trace[852062251]: [3.154369999s] [3.154369999s] END

已提問 2 年前檢視次數 244 次
1 個回答
0

t3.large has only 2 cpu, sounds like you are running over 10 pods on 2 cpu instances. this might cause the instance CPU to max out and cause slowdowns and problems. Check that your instance CPU and memory usage isnt to high and if it is use larger or more instances

AWS
dov
已回答 1 年前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南