By using AWS re:Post, you agree to the Terms of Use

Questions tagged with Amazon Elastic Container Service

Sort by most recent

Browse through the questions and answers listed below or filter and sort to narrow down your results.

Fully private eks cluster

Hi, I have a fully private VPC named HSCN without any internet access containing 2 public and 2 private subnets. This VPC is peered with another VPC let's say internet-vpc. I want to deploy my fully private eks cluster in the private subnet of HSCN-VPC. I have followed the [private cluster requirements](https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html). I am not deploying any pod so I don't need the repository yet. For the 2nd and 3rd requirement, eksctl takes care of it by itself. The problem is when I deploy the cluster my node instances are failing to join. Secondly, my kubectl and eksctl commands time out. Which means I am not able to get cluster info or any node information. Blow is my cluster config ``` apiVersion: eksctl.io/v1alpha5 kind: ClusterConfig metadata: name: test-cluster region: eu-west-2 version: "1.23" privateCluster: enabled: true additionalEndpointServices: - "autoscaling" vpc: id: vpc-id subnets: private: hscn-1-subnet: id: subnet-id hscn-2-subnet: id: subnet-id managedNodeGroups: - name: serv-test-1 instanceType: m5.xlarge desiredCapacity: 1 volumeType: gp2 volumeSize: 50 privateNetworking: true amiFamily: Ubuntu2004 subnets: - hscn-2-subnet ssh: allow: true labels: role: role tags: nodegroup-role: testing ``` It is clear that my nodes and kubectl commands are not able to communicate to kubernetes api server endpoints. Is there even a way to deploy a cluster in the setup like mentioned above? If yes, then please someone guide me how can I deploy fully functional cluster in this setup? Thanks
2
answers
0
votes
106
views
asked a month ago

Intermittent health check timeouts causing ECS to kill tasks

We have an ECS service running our API. Normally this service runs with ~12 tasks. The service is configured with an HTTP health check that returns a 200 if certain conditions are met - usually this returns within ~200ms. We have a scaling policy that starts new tasks based on the average CPU of our tasks. Recently we have seen that ECS is terminating a large chunk of tasks at a time (often ~50% of the tasks) and then our service drops requests as we don't have the capacity to handle the inbound requests. I have noticed, at least on the most recent occurrence, that we had a spike in traffic of about 40% of our current traffic around the time that ECS terminated a bunch of tasks, however, the capacity should be there in our API to handle this without any issue. This issue has happened ~5 times in the past week or so but is very intermittent and doesn't seem to affect the entire service - only certain tasks. I have checked all of our monitoring and logging and I can't see anything as to why the health check would be failing. The application logs for the tasks are completely normal. All I have to go on are the following messages in the ECS event log: ``` service my-service (port 8000) is unhealthy in target-group my-target-group due to (reason Request timed out). ``` Is there any further troubleshooting I can do to understand what is causing this? Also, if the issue is somehow triggered by an increase in load, is there a way we can prevent the ECS service from immediately terminating the tasks (which inevitably compounds the issue).
0
answers
0
votes
24
views
asked a month ago