
Questions tagged with Amazon Elastic Kubernetes Service



0 answers · 0 votes · 2 views · asked 10 days ago

Ingress annotations only for a specific path

Hi, I have this Ingress configuration:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "oidc-ingress"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=300
    external-dns.alpha.kubernetes.io/hostname: example.com
    # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    alb.ingress.kubernetes.io/auth-type: oidc
    alb.ingress.kubernetes.io/auth-on-unauthenticated-request: authenticate
    alb.ingress.kubernetes.io/auth-idp-oidc: '{"issuer":"https://login.microsoftonline.com/some-id/v2.0","authorizationEndpoint":"https://login.microsoftonline.com/some-id/oauth2/v2.0/authorize","tokenEndpoint":"https://login.microsoftonline.com/some-id/oauth2/v2.0/token","userInfoEndpoint":"https://graph.microsoft.com/oidc/userinfo","secretName":"aws-alb-secret"}'
    # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
spec:
  rules:
    - http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: ssl-redirect
                port:
                  name: use-annotation
          - pathType: Prefix
            path: /jenkins
            backend:
              service:
                name: jenkins
                port:
                  number: 8080
          - pathType: Prefix
            path: /
            backend:
              service:
                name: apache
                port:
                  number: 80
```

If I `kubectl apply` this `Ingress` config, the annotations are applied to all routing rules, which means:

```
/*
/jenkins
/jenkins/*
```

I would like to apply the OIDC annotations (marked with `!!!` above) only to the Jenkins rules, meaning:

1. If I open `https://example.com`, it is available to everyone.
2. If I open `https://example.com/jenkins`, it redirects me to the OIDC auth page.

I can do this manually through the AWS console by removing the authenticate rule from `/*` and leaving it only on `/jenkins/*`. However, I would like to achieve this through Ingress annotations so the process can be automated. How can I do this? Thanks for your help.
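One pattern worth considering (a sketch, not a definitive answer): since the ALB auth annotations apply per Ingress, the rules can be split into two Ingress resources that share a single ALB via the AWS Load Balancer Controller's IngressGroup feature. This assumes the cluster runs the AWS Load Balancer Controller v2.x, which supports `alb.ingress.kubernetes.io/group.name`; the resource names `oidc-ingress-public`, `oidc-ingress-jenkins` and the group name `oidc-group` are illustrative.

```yaml
# Public Ingress: no auth annotations; carries the / rules and the ssl-redirect action.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: oidc-ingress-public   # illustrative name
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/group.name: oidc-group   # both Ingresses merge into one ALB
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
spec:
  rules:
    - http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: ssl-redirect
                port:
                  name: use-annotation
          - pathType: Prefix
            path: /
            backend:
              service:
                name: apache
                port:
                  number: 80
---
# Jenkins Ingress: carries the OIDC annotations, so they only apply to its own /jenkins rule.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: oidc-ingress-jenkins  # illustrative name
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/group.name: oidc-group
    alb.ingress.kubernetes.io/auth-type: oidc
    alb.ingress.kubernetes.io/auth-on-unauthenticated-request: authenticate
    alb.ingress.kubernetes.io/auth-idp-oidc: '<same auth-idp-oidc JSON as in the original Ingress>'
spec:
  rules:
    - http:
        paths:
          - pathType: Prefix
            path: /jenkins
            backend:
              service:
                name: jenkins
                port:
                  number: 8080
```

Both Ingresses resolve to the same ALB, so the resulting listener rules match what the manual console change produces: authentication only on the `/jenkins*` rules.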
2 answers · 0 votes · 9 views · asked a month ago

Problem adding a nodegroup to an EKS cluster behind a NAT Gateway

Hello, I am having difficulties bringing an EKS cluster back into compliance.

**Cluster:** I have an EKS cluster with:

- 6 EKS control plane networks (networks 1-6)
  1. Networks 1/2/3 are in routing table RA, with a 0.0.0.0/0 route to an Internet Gateway
  2. Networks 4/5/6 are in routing table RB, with a 0.0.0.0/0 route to a NAT Gateway (plus other routes to my company network)
- 4 cluster nodegroups, with networks 4/5/6 used for worker nodes
- My EKS cluster has a public and private API (from a node, DNS resolution of the endpoint returns a private IP)

**Target:** an EKS cluster with:

- 6 EKS control plane networks (networks 1-6)
  1. Networks 1/2/3 in routing table RA, with a 0.0.0.0/0 route to an Internet Gateway
  2. Networks 4/5/6 also in routing table RA
- 4 cluster nodegroups
  1. Nodegroup 1: uses network 10 and should be in routing table RC, with 0.0.0.0/0 pointing to a new NAT Gateway (plus other routes to my company network)
  2. Nodegroup 2: uses network 11, same routing table RC and NAT Gateway
  3. Nodegroup 3: uses network 12, same routing table RC and NAT Gateway
  4. Nodegroup 4: uses network 13, same routing table RC and NAT Gateway

**Problem:** When creating a new nodegroup to replace an existing one, I specify network 10, 11, 12 or 13. The RC routing table is correct, with the NAT Gateway. The problem: the nodes can't join the cluster (error message: **Instances failed to join the kubernetes cluster**). I can see the EC2 instances being created in the right network (10/11/12/13). I don't understand the problem: why can't the nodes in networks 10-13 reach the cluster API through the ENIs in networks 1-6? When I create a new nodegroup in one of networks 1-6 (route table RA or RB), it works without problem.

Sincerely
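A sketch that may help narrow this down (not a diagnosis): confirm which route table is actually associated with the new subnets, and confirm that an instance in one of them can resolve and reach the cluster's private endpoint on port 443 through the cluster security group. The cluster name and subnet ID below are placeholders.

```bash
CLUSTER=my-cluster                       # placeholder
SUBNET=subnet-0123456789abcdef0          # one of "networks 10-13", placeholder

# Which route table is really associated with the new subnet, and what routes does it hold?
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=${SUBNET}" \
  --query 'RouteTables[].Routes'

# Cluster endpoint, cluster security group, and whether private endpoint access is on
aws eks describe-cluster --name "${CLUSTER}" \
  --query 'cluster.{endpoint:endpoint,clusterSG:resourcesVpcConfig.clusterSecurityGroupId,privateAccess:resourcesVpcConfig.endpointPrivateAccess}'

# Then, from a shell on an instance in the new subnet (SSM Session Manager or SSH):
#   nslookup <cluster endpoint hostname>   # should return the private ENI IPs in networks 1-6
#   curl -vk https://<cluster endpoint>    # should at least complete a TLS handshake on 443
```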
0 answers · 0 votes · 1 view · asked a month ago

Can't get EventBridge rule to create a message in SQS

I am trying to set up the [AWS node termination handler](https://github.com/aws/aws-node-termination-handler) and am running into an issue where the EventBridge rules are invoked but no messages show up in the SQS queue. I have tested that the termination handler can communicate with the SQS queue. I have also tested spinning instances up and down, and I see the rule invocations for the EventBridge rules. However, no messages appear in the queue.

NOTE: I tried adding a screenshot from CloudWatch showing rule invocations but no messages in the queue; it seems pictures are not supported here yet.

Below are my configs.

SQS policy:

```hcl
resource "aws_sqs_queue_policy" "termination_handler_queue_policy" {
  queue_url = module.termination_handler_queue.sqs_queue_id
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Id" : "sqspolicy",
    "Statement" : [
      {
        "Sid" : "TermEventsToHandlerQueue",
        "Effect" : "Allow",
        "Principal" : {
          "Service" : ["events.amazonaws.com", "sqs.amazonaws.com"]
        },
        "Action" : "sqs:*",
        "Resource" : "${module.termination_handler_queue.sqs_queue_name}",
        "Condition" : {
          "ArnEquals" : {
            "aws:SourceArn" : [
              "arn:aws:events:us-east-2:${local.account_id}:rule/node-termination-asg-lifecycle-rule",
              "arn:aws:events:us-east-2:${local.account_id}:rule/node-termination-ec2-status-rule",
              "arn:aws:events:us-east-2:${local.account_id}:rule/node-termination-ec2-spot-interruption-rule",
              "arn:aws:events:us-east-2:${local.account_id}:rule/node-termination-ec2-rebalance-rule"
            ]
          }
        }
      }
    ]
  })
}
```

EventBridge config:

```hcl
module "termination_handler_eventbridge" {
  source  = "terraform-aws-modules/eventbridge/aws"
  version = "~> 1.14.0"

  create_bus = false

  rules = {
    node-termination-asg-lifecycle = {
      description = "Capture eks asg lifecycle events."
      event_pattern = jsonencode({
        "source" : ["aws.autoscaling"],
        "detail-type" : ["EC2 Instance Launch Successful", "EC2 Instance Terminate Successful", "EC2 Instance Launch Unsuccessful", "EC2 Instance Terminate Unsuccessful", "EC2 Instance-launch Lifecycle Action", "EC2 Instance-terminate Lifecycle Action"],
        "detail" : {
          "AutoScalingGroupName" : ["eks-Group_A", "eks-Group_B"]
        }
      })
      enabled = true
    }
    node-termination-ec2-status = {
      description = "Capture ec2 status events"
      event_pattern = jsonencode({
        "source" : ["aws.ec2"],
        "detail-type" : ["EC2 Instance State-change Notification"]
      })
      enabled = true
    }
    node-termination-ec2-spot-interruption = {
      description = "Capture spot interruption events"
      event_pattern = jsonencode({
        "source" : ["aws.ec2"],
        "detail-type" : ["EC2 Spot Instance Interruption Warning"]
      })
      enabled = true
    }
    node-termination-ec2-rebalance = {
      description = "Capture ec2 rebalance events"
      event_pattern = jsonencode({
        "source" : ["aws.ec2"],
        "detail-type" : ["EC2 Instance Rebalance Recommendation"]
      })
      enabled = true
    }
  }

  targets = {
    node-termination-asg-lifecycle = [
      {
        name = "termination_handler-sqs-life"
        arn  = module.termination_handler_queue.sqs_queue_arn
      },
    ]
    node-termination-ec2-status = [
      {
        name = "termination_handler-sqs-status"
        arn  = module.termination_handler_queue.sqs_queue_arn
      },
    ]
    node-termination-ec2-spot-interruption = [
      {
        name = "termination_handler-sqs-int"
        arn  = module.termination_handler_queue.sqs_queue_arn
      },
    ]
    node-termination-ec2-rebalance = [
      {
        name = "termination_handler-sqs-rebalance"
        arn  = module.termination_handler_queue.sqs_queue_arn
      },
    ]
  }

  tags = {
    Name    = "node-termination-handler-bus"
    Service = "aws-node-termination-handler"
  }
}
```
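One thing that stands out in the queue policy above: the `Resource` field is set to `sqs_queue_name`, while SQS queue policies expect the queue ARN (the targets block already uses `sqs_queue_arn`). A sketch of the policy with that changed, assuming the module exposes the same `sqs_queue_arn` output used elsewhere in the config; narrowing `Action` to `sqs:SendMessage` is enough for EventBridge delivery and is illustrative rather than required.

```hcl
resource "aws_sqs_queue_policy" "termination_handler_queue_policy" {
  queue_url = module.termination_handler_queue.sqs_queue_id
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Id" : "sqspolicy",
    "Statement" : [
      {
        "Sid"       : "TermEventsToHandlerQueue",
        "Effect"    : "Allow",
        "Principal" : { "Service" : ["events.amazonaws.com"] },
        "Action"    : "sqs:SendMessage",                               # EventBridge only needs SendMessage
        "Resource"  : module.termination_handler_queue.sqs_queue_arn,  # queue ARN, not queue name
        "Condition" : {
          "ArnEquals" : {
            # same four rule ARNs as in the original policy
            "aws:SourceArn" : [
              "arn:aws:events:us-east-2:${local.account_id}:rule/node-termination-asg-lifecycle-rule"
            ]
          }
        }
      }
    ]
  })
}
```

If the `aws:SourceArn` condition still blocks delivery, comparing the ARNs against the rule names the EventBridge module actually created (`aws events list-rules`) is a cheap next check.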
1 answer · 0 votes · 8 views · asked 2 months ago

How to invoke a private REST API (created with AWS Gateway) endpoint from an EventBusRule?

I have set up the following workflow:

- a private REST API with resources `POST /event` and `POST /process`
- a `VPCLink` to an `NLB` (which points to an `ALB` pointing to a microservice running on `EKS`)
- a `VPC endpoint` with DNS name `vpce-<id>-<id>.execute-api.eu-central-1.vpce.amazonaws.com` and `Private DNS enabled`
- an EventBridge `EventBus` with a rule that has two targets: one `API Destination` for debugging/testing and one `AWS Service` target which points to my private REST API at `POST /process`
- all required `Resource Policies` and `Roles`
- all resources are defined within the same AWS account

The **designed** workflow is as follows:

- invoke `POST /event` on the VPC endpoint (any other invocation is prohibited by the `Resource Policy`) with an `event` payload
- the API puts the `event` payload onto the `EventBus`
- the `EventBusRule` is triggered and sends the `event` payload to the `POST /process` endpoint of the private REST API
- the `POST /process` endpoint proxies the payload to a microservice running on EKS (via `VPCLink` > `NLB` > `ALB` > `k8s Service`)

**What does work** so far:

- invoking `POST /event` on the VPC endpoint
- putting the `event` payload onto the `EventBus`
- forwarding the `event` payload to the `API Destination` set up for testing/debugging (a temporary endpoint on https://webhook.site)
- testing the `POST /event` and `POST /process` integrations in the AWS Console (the latter verified by checking that the `event` payload reaches the microservice on EKS successfully)

That is, all individual steps in the workflow seem to work, and all permissions seem to be set properly.

**What does not work** is invoking the `POST /process` endpoint from the `EventBusRule`: invoking `POST /event` does not invoke `POST /process` via the `EventBus`, _although_ the `EventBusRule` is triggered.

So my **question** is: **how do I invoke a private REST API endpoint from an EventBusRule?**

**What I have already tried:**

- changing the order of the `EventBusRule` targets
- creating a Route 53 record pointing to the `VPC endpoint` and treating it as an (external) `API Destination`
- allowing access from _anywhere_ by _anyone_ to the REST API (temporarily only, of course)

**Remark on the design:** I created _two_ endpoints (one for receiving an `event`, one for processing it) with an EventBus in between because:

- I have to expect a delay of several minutes between the `Event Creation/Notification` and the successful `Event Processing`
- I expect several hundred `event sources`, which are different AWS and Azure accounts
- I want to keep track of all events that _reach_ our API and of their successful _processing_ in one central EventBus, and _not_ inside each AWS account the event originates from
- I want to keep track of each _failed_ event processing in the same central EventBus, with only one central DeadLetterQueue
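A sketch for gathering more signal on why the API Gateway target never fires (rule and bus names are placeholders): inspect the target definition attached to the rule (its ARN, invocation role and any dead-letter config) and check whether EventBridge is recording failed deliveries for that rule.

```bash
RULE=my-event-rule        # placeholder
BUS=my-event-bus          # placeholder (omit --event-bus-name for the default bus)

# What exactly is attached as the "AWS Service" target: ARN, RoleArn, HttpParameters, DeadLetterConfig?
aws events list-targets-by-rule --rule "${RULE}" --event-bus-name "${BUS}"

# Did deliveries to the target fail? (adjust the time window to your test)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Events --metric-name FailedInvocations \
  --dimensions Name=RuleName,Value="${RULE}" \
  --statistics Sum --period 300 \
  --start-time 2022-05-01T00:00:00Z --end-time 2022-05-01T01:00:00Z
```

Attaching an SQS dead-letter queue to the target is also a convenient way to capture the concrete error message for each failed invocation instead of only a count.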
1 answer · 0 votes · 14 views · asked 2 months ago

EKS Managed Nodegroup with Capacity Reservation in Launch Template through CloudFormation does not use Capacity Reservation.

I am creating a managed nodegroup for EKS using CloudFormation. I have an EC2 launch template with a `CapacityReservationSpecification` defined, and the launch template is linked to the managed nodegroup in CloudFormation. When the managed nodegroup is initialised, the launch template is copied with an `eks-***` prefix in the name, but the `CapacityReservationSpecification` is not copied into the newly generated launch template.

CloudFormation template example:

LaunchTemplate:

```yaml
Resources:
  LaunchTemplateAux:
    Type: 'AWS::EC2::LaunchTemplate'
    Properties:
      LaunchTemplateData:
        InstanceType: t3.medium
        CapacityReservationSpecification:
          CapacityReservationTarget:
            CapacityReservationResourceGroupArn: {{reservation_group_arn}}
        MetadataOptions:
          HttpPutResponseHopLimit: 2
          HttpTokens: optional
        SecurityGroupIds:
          - xxxxx
      LaunchTemplateName: !Sub '${AWS::StackName}Aux'
```

NodeGroup:

```yaml
  ManagedNodeGroupAux:
    Type: 'AWS::EKS::Nodegroup'
    Properties:
      AmiType: AL2_x86_64
      ClusterName: test-cluster
      Labels:
        alpha.eksctl.io/cluster-name: test-cluster
        alpha.eksctl.io/nodegroup-name: test-ng-aux
      LaunchTemplate:
        Id: !Ref LaunchTemplateAux
      NodeRole: node-instance-role::NodeInstanceRole'
      NodegroupName: test-nodegroup
      ScalingConfig:
        DesiredSize: 1
        MaxSize: 2
        MinSize: 1
      Subnets:
        - xxx
```

The resulting launch templates are below, obtained with `aws ec2 describe-launch-template-versions --launch-template-id <template-id>`.

My launch template:

```json
{
    "LaunchTemplateVersions": [
        {
            "LaunchTemplateId": "lt-xx",
            "LaunchTemplateName": "test-cluster-ngAux",
            "VersionNumber": 1,
            "CreateTime": "2022-03-24T12:35:05+00:00",
            "CreatedBy": "xxx:user/xxx",
            "DefaultVersion": true,
            "LaunchTemplateData": {
                "InstanceType": "t3.medium",
                "SecurityGroupIds": [
                    "sg-xxx"
                ],
                "CapacityReservationSpecification": {
                    "CapacityReservationTarget": {
                        "CapacityReservationResourceGroupArn": "arn:aws:resource-groups:xxxxx:group/my-group"
                    }
                },
                "MetadataOptions": {
                    "HttpTokens": "optional",
                    "HttpPutResponseHopLimit": 2
                }
            }
        }
    ]
}
```

Launch template copied by the EKS API:

```json
{
    "LaunchTemplateVersions": [
        {
            "LaunchTemplateId": "lt-xxx",
            "LaunchTemplateName": "eks-xxx",
            "VersionNumber": 1,
            "CreateTime": "2022-03-24T12:35:46+00:00",
            "CreatedBy": "xxx:assumed-role/AWSServiceRoleForAmazonEKSNodegroup/EKS",
            "DefaultVersion": true,
            "LaunchTemplateData": {
                "IamInstanceProfile": {
                    "Name": "xxx"
                },
                "ImageId": "ami-0c37e3f6cdf6a9007",
                "InstanceType": "t3.medium",
                "UserData": "xxx",
                "TagSpecifications": [
                    {
                        "ResourceType": "volume",
                        "Tags": [
                            { "Key": "eks:cluster-name", "Value": "test-cluster" },
                            { "Key": "eks:nodegroup-name", "Value": "test-cluster-ng-aux" }
                        ]
                    },
                    {
                        "ResourceType": "instance"
                    }
                ],
                "SecurityGroupIds": [
                    "xxx"
                ],
                "MetadataOptions": {
                    "HttpTokens": "optional",
                    "HttpPutResponseHopLimit": 2
                }
            }
        }
    ]
}
```
1 answer · 1 vote · 8 views · asked 2 months ago

Mount failed error in EKS for database pods

While deploying from the Helm chart, the three stateless pods come up fine, but the other pods are unable to mount the volumes for the corresponding Postgres and Redis pods. I used the same chart to deploy into another EKS cluster and it works as expected there.

**Error logs:**

```
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0b8a7301623f89c7f --scope -- mount -t ext4 -o defaults /dev/xvdbo /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0b8a7301623f89c7f
Output: Running scope as unit run-21761.scope.
mount: /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0b8a7301623f89c7f: wrong fs type, bad option, bad superblock on /dev/xvdbo, missing codepage or helper program, or other error.

Warning  FailedMount  2s  kubelet  MountVolume.MountDevice failed for volume "pvc-cec820c5-e53a-41c1-9a8b-c8b14218f990" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0b8a7301623f89c7f --scope -- mount -t ext4 -o defaults /dev/xvdbo /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0b8a7301623f89c7f
Output: Running scope as unit run-21972.scope.
```

**Pod status:**

```
saleor-588b7c95cd-j8k4k              0/1   Running             12   26m
saleor-dashboard-78bdcf6ff9-tlfgg    1/1   Running             0    26m
saleor-postgresql-0                  0/1   ContainerCreating   0    26m
saleor-redis-master-0                0/1   ContainerCreating   0    26m
saleor-redis-slave-0                 0/1   ContainerCreating   0    26m
saleor-storefront-84ff9f4967-l9l4m   1/1   Running             0    26m
saleor-worker-6b9887fc47-4rtch       1/1   Running             0    26m
```
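The "wrong fs type, bad option, bad superblock" message usually means the block device does not carry the ext4 filesystem the kubelet is trying to mount: an empty volume that was never formatted, a volume carrying a different filesystem, or a damaged superblock. A sketch of how to check, on the worker node in eu-west-2b that the pod landed on (device name taken from the log above):

```bash
# On the node that attached the volume (via SSM Session Manager or SSH):
lsblk -f                  # lists block devices and the filesystem detected on each
sudo file -s /dev/xvdbo   # prints the filesystem signature, or "data" for an unformatted device

# If the device is genuinely empty and brand new, formatting it lets the mount succeed.
# WARNING: this destroys any data on the device - only for a volume you know is empty.
# sudo mkfs.ext4 /dev/xvdbo
```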
0 answers · 0 votes · 2 views · asked 2 months ago

CoreDNS with ETCD backend on EKS

CoreDNS has an etcd plugin (https://coredns.io/plugins/etcd/), which essentially allows for dynamic DNS by reading values from etcd. Since EKS is managed, we can't access the etcd instance on the control plane; that's fine, as I can create my own etcd cluster (and I did). Below is my CoreDNS configuration:

```yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        log
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        etcd {
          path /skydns
          endpoint http://etcd-cluster-ip.default.svc.cluster.local:2379
          fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations: {}
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
```

The issue I face now is that the name `etcd-cluster-ip.default.svc.cluster.local`, the ClusterIP service of my etcd cluster, cannot be resolved. If I replace that DNS name with the actual ClusterIP, name resolution works as expected and CoreDNS is able to reach etcd. How can this in-cluster DNS name be resolved? I see the line below in the CoreDNS logs:

```
{"level":"warn","ts":"2022-03-16T20:44:42.352Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-fd406ba0-cc21-4132-bfef-ca14e3fd4eb3/etcd-cluster-ip.default.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-cluster-ip.default.svc.cluster.local on 10.0.0.2:53: no such host\""}
```
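The log shows the lookup going to 10.0.0.2 (the VPC resolver), which knows nothing about `cluster.local`. Since using the literal ClusterIP already works, one workaround (a sketch; the Service name, namespace, port, selector and the chosen IP are illustrative, and the IP must be a free address inside the cluster's service CIDR, set at Service creation time) is to create the etcd Service with a fixed `spec.clusterIP` and point the `etcd` plugin's `endpoint` at that stable address:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-cluster-ip        # illustrative name
  namespace: default
spec:
  clusterIP: 172.20.200.10     # pick a free IP from your service CIDR; it then never changes
  selector:
    app: etcd                  # must match your etcd pods' labels
  ports:
    - name: client
      port: 2379
      targetPort: 2379
# The Corefile would then use:  endpoint http://172.20.200.10:2379
```

This keeps the etcd backend reachable even though CoreDNS cannot rely on itself to resolve cluster-internal names at startup.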
1 answer · 0 votes · 9 views · asked 2 months ago

Getting an AccessDeniedException when trying to access (read) a DynamoDB table from a completely different AWS account

Hello, I have an application deployed in an EKS cluster in `Account A` that is trying to read an item from a DynamoDB table in `Account B`. I have done the following:

* I created a role in `Account B` called `DynDBReadAccess` with a policy that allows the `dynamodb:GetItem` action on the table `arn:aws:dynamodb:us-east-1:<Account B>:table/myTable`.
* I then created the role `CrossAccountDynDBAccess` in `Account A` with permission to perform the `sts:AssumeRole` action and assume the role `arn:aws:iam::<Account B>:role/DynDBReadAccess`.
* I updated the trust policy of the `DynDBReadAccess` role to trust the principal `arn:aws:iam::<Account A>:role/CrossAccountDynDBAccess`.

I have a pod with the AWS CLI deployed in the cluster for debugging purposes. Now I do the following:

* I exec into the pod and run `aws sts get-caller-identity`; I see the correct assumed role `CrossAccountDynDBAccess`.
* Then I run `aws sts assume-role --role-arn "arn:aws:iam::<Account B>:role/DynDBReadAccess" --role-session-name crossAccountSession` and get the temporary credentials.
* I set the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` with the temporary credentials I just received.
* I run `aws dynamodb get-item --table-name myTable --key 'somekey'` and get the following error:

```
An error occurred (AccessDeniedException) when calling the GetItem operation: User: arn:aws:sts::<Account B>:assumed-role/DynDBReadAccess/crossAccountSession is not authorized to perform: dynamodb:GetItem on resource: arn:aws:dynamodb:ap-southeast-1:<Account B>:table/myTable
```

I thought that once the roles, permissions and trust policies were set, cross-account access should be possible. **Can someone tell me what is missing?**

Some other points to note:

* The OIDC endpoint and IRSA role have been enabled in the EKS cluster, and the service account for the cluster has been created.
* The AWS CLI pod has been deployed with the service account mapped to the IRSA role.
* I tried the same thing with a Lambda function: I created a Lambda function in `Account A` that reads an item from the same DynamoDB table. The execution role of the Lambda function assumes `arn:aws:iam::<Account B>:role/DynDBReadAccess` and reads an item from the same DynamoDB table. **This works**.
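One detail visible in the output above that may be worth ruling out: the policy grants `GetItem` on the table ARN in `us-east-1`, but the access-denied message references the table ARN in `ap-southeast-1`, so the CLI call from the pod appears to target a different region than the one the policy covers. A sketch of the same test with the region pinned, assuming the table really lives in `us-east-1` as in the policy ARN; the key structure is illustrative and should match your table's key schema:

```bash
# After exporting AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN
# from the `aws sts assume-role` call:
aws dynamodb get-item \
  --region us-east-1 \
  --table-name myTable \
  --key '{"id": {"S": "somekey"}}'   # illustrative key; use the table's real key attribute
```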
1 answer · 0 votes · 18 views · asked 2 months ago

EKS HPA's apiVersion fails to stay at v2beta2

When I deploy my HPAs I specify `apiVersion: autoscaling/v2beta2`, but Kubernetes gives them back as `autoscaling/v2beta1`. For example, if I deploy this:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: surething-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: surething
  minReplicas: 2
  maxReplicas: 4
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100
```

I will get this back:

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: surething-hpa
  namespace: dispatch-dev
  uid: 189cee35-c000-410b-954e-c164a08809e1
  resourceVersion: '404150989'
  creationTimestamp: '2021-04-04T17:30:48Z'
  labels:
    app: dispatch
    deployment: dev
    microservice: surething
  annotations: ...
  selfLink: ...
status: ...
spec:
  scaleTargetRef:
    kind: Deployment
    name: surething
    apiVersion: apps/v1
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 100
```

All the documentation I can find on EKS and HPAs says that I should be able to use `apiVersion: autoscaling/v2beta2`. My cluster is version 1.21, and so is my nodegroup. When I run `kubectl api-versions` I can find `autoscaling/v2beta2` in the list. I'm at wits' end on this one. Can someone tell me what I am doing wrong?
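For context: the API server stores one internal representation of the HPA and can serve it back as any supported `autoscaling` version, so reading the object in `v2beta1` does not by itself mean the `v2beta2`-only fields (such as `behavior`) were dropped; it usually just reflects the version the client asked for. A quick way to compare representations, using the names from the manifests above:

```bash
# Request the v2beta2 representation explicitly (resource.version.group syntax)
kubectl -n dispatch-dev get hpa.v2beta2.autoscaling surething-hpa -o yaml

# Compare with whatever version the client picks by default
kubectl -n dispatch-dev get hpa surething-hpa -o yaml
```

If the `behavior` block shows up in the `v2beta2` view, the object was stored correctly and only the default read version differs.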
0 answers · 0 votes · 3 views · asked 2 months ago

AWS EKS - CloudFormation Script fails (just the documented tutorial with no changes)

**Summary:**

1. I have successfully deployed EKS via the AWS CloudFormation template in the past (about a year ago).
2. Now when I try to deploy EKS via AWS CloudFormation, it fails.
3. The error message is not clear enough for me to go and fix the cause of the failure. Any tips on how to interpret it?

**Documentation and steps used**

1. Page: https://aws.amazon.com/quickstart/architecture/amazon-eks/
2. Deploy using AWS CloudFormation with a new VPC

**Error message**

| Stack name | Status |
| --- | --- |
| eks-quickstart-RegionalSharedResources | DELETE_FAILED |
| eks-quickstart-AccountSharedResources | CREATE_COMPLETE |
| Amazon-EKS | ROLLBACK_COMPLETE |

Amazon-EKS (ROLLBACK_COMPLETE) has the following failed events:

* AutoDetectSharedResources > CREATE_FAILED, with log: Embedded stack arn:aws:cloudformation:us-east-2:SOME_ID:stack/Amazon-EKS-AutoDetectSharedResources-SOME_UUID was not successfully created: The following resource(s) failed to create: [ PreReqs ].
* Amazon-EKS > ROLLBACK_IN_PROGRESS, with log: The following resource(s) failed to create: [AutoDetectSharedResources]. Rollback requested by user.

**One more log seems to be important (but the CloudFormation template is from AWS, so I doubt it is the root cause):**

```
RegisterHelmType CREATE_FAILED CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [SOME_UUID]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
```
1 answer · 0 votes · 16 views · asked 2 months ago

AppMesh mTLS - Unable to verify SSL encryption is established using SPIRE

I'm in the process of setting up a prototype mesh with mTLS. I've gotten to the point where my services are coupled with Envoy sidecars and the sidecars are receiving certificates from SPIRE. I've been following along with this [article](https://aws.amazon.com/blogs/containers/using-mtls-with-spiffe-spire-in-app-mesh-on-eks/) and am now running into an issue. In their steps, they perform a curl command from a container outside of the mesh and get some TLS negotiation messages. When I try to do the same thing, I get the following:

```
bash-4.2# curl -v -k https://grpc-client-service.grpc.svc.cluster.local:80/
*   Trying 10.100.152.100:80...
* Connected to grpc-client-service.grpc.svc.cluster.local (10.100.152.100) port 80 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
*   CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
* Closing connection 0
curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
```

Any advice on where I should start troubleshooting this issue?

Here's a rough overview of my setup: there are two pods that represent a client and a server service. The client has a web interface that allows the user to input text. The client takes the input text and submits it to the server service. The server service responds with an echo message that has some extra formatting so you know it came from the server. I've got both pods wrapped in virtual services that connect directly to virtual nodes. I was able to successfully test this with a basic mesh setup prior to adding the SPIRE workload parameters to the services. Within the Envoy sidecars, I can see that the SPIRE server is indeed issuing certificates.
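A sketch of one way to confirm what the sidecar is actually serving, using Envoy's admin interface (assumed to be on its default port 9901 for the App Mesh Envoy image); the namespace `grpc` comes from the curl above, while the deployment name `grpc-server` and the availability of `curl` in the application container are assumptions to adapt to your setup:

```bash
# Any container in the pod shares localhost with the sidecar, so it can query Envoy's admin API.
# Which certificates (SPIRE-issued SVIDs) does Envoy currently hold?
kubectl -n grpc exec deploy/grpc-server -- curl -s http://localhost:9901/certs

# Is a SPIFFE/SDS-provided TLS certificate actually attached to the inbound listener?
kubectl -n grpc exec deploy/grpc-server -- curl -s http://localhost:9901/config_dump | grep -m5 -i spiffe
```

If the certificates are present but the listener config shows no TLS transport socket, the mesh's virtual node TLS settings are the next place to look.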
1 answer · 0 votes · 6 views · asked 2 months ago

Installing Calico on my EKS (k8s version 1.21)

Hi, I tried to install Calico on my EKS cluster (Kubernetes 1.21) following this URL: https://docs.aws.amazon.com/eks/latest/userguide/calico.html

The resources in the `tigera-operator` namespace deployed successfully, but in `calico-system` the deployment `calico-kube-controllers` fails:

```
NAME                                       READY   STATUS                       RESTARTS   AGE
calico-kube-controllers-6b5d45f4dc-n9tjn   0/1     CreateContainerConfigError   0          3m42s
```

`kubectl logs calico-kube-controllers-6b5d45f4dc-n9tjn -n calico-system` returns:

```
Error from server (BadRequest): container "calico-kube-controllers" in pod "calico-kube-controllers-6b5d45f4dc-n9tjn" is waiting to start: CreateContainerConfigError
```

`kubectl describe deployment.app/calico-kube-controllers -n calico-system` returns:

```
Name:                   calico-kube-controllers
Namespace:              calico-system
CreationTimestamp:      Fri, 04 Mar 2022 11:45:52 +0900
Labels:                 k8s-app=calico-kube-controllers
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               k8s-app=calico-kube-controllers
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           Recreate
MinReadySeconds:        0
Pod Template:
  Labels:           k8s-app=calico-kube-controllers
  Service Account:  calico-kube-controllers
  Containers:
   calico-kube-controllers:
    Image:      docker.io/calico/kube-controllers:v3.21.4
    Port:       <none>
    Host Port:  <none>
    Liveness:   exec [/usr/bin/check-status -l] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/usr/bin/check-status -r] delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment:
      KUBE_CONTROLLERS_CONFIG_NAME:  default
      DATASTORE_TYPE:                kubernetes
      ENABLED_CONTROLLERS:           node
      KUBERNETES_SERVICE_HOST:       ########
      KUBERNETES_SERVICE_PORT:       443
    Mounts:       <none>
  Volumes:        <none>
  Priority Class Name:  system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   calico-kube-controllers-6b5d45f4dc (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  5m41s  deployment-controller  Scaled up replica set calico-kube-controllers-6b5d45f4dc to 1
```

Please help me finish the installation.
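The concrete reason behind a `CreateContainerConfigError` is normally recorded in the pod's events (typically a ConfigMap or Secret key the container references that doesn't exist yet), not in `kubectl logs`, because the container never started. A sketch for surfacing it:

```bash
# Show only the Events section of the failing pod
kubectl -n calico-system describe pod -l k8s-app=calico-kube-controllers | sed -n '/Events:/,$p'

# Or list recent warnings in the namespace, newest last
kubectl -n calico-system get events --field-selector type=Warning --sort-by=.lastTimestamp
```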
0 answers · 0 votes · 2 views · asked 3 months ago

Access to Secrets Manager from pod in EKS

Hi, I'm trying to access secrets in Secrets Manager from a pod deployed in an EKS cluster. The cluster was created with the *eksctl* command.

* I attached an IAM policy with the following grants to the IAM role attached to the EC2 nodes:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": "arn:aws:secretsmanager:eu-west-1:[masked]:*"
        },
        {
            "Effect": "Allow",
            "Action": "secretsmanager:ListSecrets",
            "Resource": "*"
        }
    ]
}
```

* This IAM role was created by *eksctl*, and I see that it has this trust relationship:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

When I try to retrieve a secret with the AWS CLI from a running pod in the EKS cluster, I get this error:

```
# aws secretsmanager get-secret-value --secret-id arn:aws:secretsmanager:eu-west-1:[masked]

An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::[masked] is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[masked]
```

The AWS CLI has this in its config file:

```
[default]
region = eu-west-1
output = json
role_arn = arn:aws:iam::[masked]
credential_source = Ec2InstanceMetadata
```

What's wrong? Kind regards
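Worth noting about the error above: it is the `sts:AssumeRole` call that is denied, not the Secrets Manager call. With `role_arn` plus `credential_source = Ec2InstanceMetadata` in the config, the CLI first tries to assume that role from the node's instance role, which only works if the node role is allowed to call `sts:AssumeRole` on it and the target role trusts the node role. A sketch of the simpler variant that relies directly on the node role (which already carries the Secrets Manager policy shown above):

```
# ~/.aws/config inside the pod - no role_arn, so the CLI uses the
# instance (node) role credentials from IMDS directly.
[default]
region = eu-west-1
output = json
```

For per-pod rather than per-node permissions, IRSA (an IAM role bound to the pod's Kubernetes service account) is the usual alternative to granting the policy to the node role.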
2 answers · 0 votes · 20 views · asked 3 months ago

What could be causing EKS nodes to fail to join a cluster on a specific account?

I'm running into an unusual issue where, on one specific new AWS account, I cannot create any nodegroups whatsoever. I've tried on two other AWS accounts and I can make nodegroups without any problems. The nodegroup creation always fails with something like the following:

```
NodeCreationFailure  Instances failed to join the kubernetes cluster
DUMMY_2f2298a2-f492-439a-b7bb-ff931c539d78
DUMMY_5651ecbb-690e-4f3e-bc28-c52dc0d95bca
DUMMY_6db1e73c-a1c7-4258-b10d-f6994864c3ef
DUMMY_93f8d481-afd5-4811-ae28-aa2c50bd3ef5
DUMMY_950c3c89-d7ef-489d-8023-bc88a3b8a99c
DUMMY_a5e09b94-4c86-4d0b-bb12-b9630ee544de
DUMMY_bab43e87-11f8-4747-908a-06ae3741c612
DUMMY_c3f7c48a-4138-48d4-ba15-894a33f2d90a
DUMMY_cccca0c7-98ae-4bf7-8441-8124971e8a78
DUMMY_d9909a43-ebf5-4340-99f0-47281499b2e2
DUMMY_daa1703a-8032-4fa5-9eae-c8a0b04fc1dd
DUMMY_f3d0c7e8-b265-4927-98d4-33f7d4cd5ace
```

This occurs whether I use eksctl to create a new cluster from scratch with nodegroups (both when I specify how the nodegroup should be configured and when I let it use the defaults for the initial nodegroup), when I use eksctl to create a nodegroup on an existing cluster, or when I try to create a nodegroup on an existing cluster through the AWS web console. I've tried all of these things on other accounts and had success every time. I've tried both us-west-1 and us-west-2: no success on the affected account, and nothing but success on the other accounts.

I looked up common sources of this issue (https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html) and haven't had success with the suggested solutions. The IAM roles created with each nodegroup (before they're deleted when the creation fails) look identical to the ones on working accounts, and they have the AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, and AmazonEKS_CNI_Policy policies attached. I even tried making an IAM role with those three policies myself and using it to create a nodegroup through the web console, and it still failed.

The VPCs these clusters are on are configured for IPv4, not IPv6. The VPCs' main security groups allow all outbound traffic, and since they were set up via eksctl, they have two public and two private subnets, with the public subnets auto-assigning IP addresses, so they should have public internet access. The managed nodegroups created when I spin up a new cluster with eksctl seem to use only the public subnets, so they should definitely have public access. The identity I'm using has AdministratorAccess on the account.

I'm running out of ideas as to how to solve this. It really seems to be tied to this account, but I can't figure out what's causing this very specific problem.
0 answers · 0 votes · 8 views · asked 3 months ago

Selectively exposing a REST endpoint publicly in an AWS EKS cluster in a private VPC

**Cluster information:**

- **Kubernetes version: 1.19**
- **Cloud being used: AWS EKS**

So here is my configuration. I have a private VPC on AWS within which an AWS EKS cluster is hosted. This VPC has public-facing load balancers which are accessible only from specific IP addresses. On this EKS cluster a number of microservices run in their own pods, and each of these pods exposes a REST endpoint.

Here is my requirement: out of all the REST endpoints we have, I would like to make only one publicly available from the internet. The remainder of our REST endpoints should remain private, accessible only from certain IP addresses.

What would be the best approach to achieve this? From what I have researched so far, here are my options:

1. Run another instance of the Ingress controller which deploys a public-facing load balancer to handle requests to this public REST endpoint. This will work; however, I am concerned with the security aspects here. An attacker might get into our VPC and create havoc.
2. Create a completely new, public-facing EKS cluster where I deploy this single REST endpoint. This is something I would like to avoid.
3. Use something like AWS API Gateway to achieve this. I am not sure if this is possible, as I still have to research it.

Does anyone have ideas on how this could be achieved securely? Any advice would be very much appreciated.

Regards,
Kiran Hegde
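For reference on option 1: with the AWS Load Balancer Controller, the load balancer scheme is chosen per Ingress, so a single internet-facing Ingress can carry just the one public endpoint while everything else stays behind internal load balancers; nothing beyond that one Service becomes reachable from the internet. A minimal sketch (resource, path and service names are illustrative, and it assumes the controller is installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-api                                       # illustrative
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing    # this ALB is the only public entry point
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - http:
        paths:
          - path: /public-endpoint                       # only this path/service is exposed
            pathType: Prefix
            backend:
              service:
                name: public-service                     # illustrative service name
                port:
                  number: 80
```

Restricting the ALB's security group to HTTPS and keeping every other Service on internal load balancers limits the public surface to this one route; API Gateway with a VPC link (option 3) is a reasonable alternative when throttling or authentication at the edge is also needed.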
5 answers · 0 votes · 23 views · asked 4 months ago

Assume a service account role in EKS

I have created an EKS cluster using `eksctl`. I am following these steps to establish connectivity to AWS services like S3 and CloudWatch from a Spring Boot application:

1. Create the cluster with `eksctl` — this includes my service account details and has OIDC enabled.
2. List the service accounts to confirm they were created correctly.
3. Create a deployment that uses the service account name.
4. Create a service.

I am seeing a 403 in the logs:

```
User: arn:aws:sts:account_id/nodegroup_rule_created_by_eks is not authorized to perform: cloudformation:DescribeStackResources because no identity-based policy allows the cloudformation:DescribeStackResources action (Service: AmazonCloudFormation; Status Code: 403; Error Code: AccessDenied; Request ID: xxxx)
```

Can I get some help troubleshooting this issue, please?

---

What I have figured out after posting this is that my node, provisioned by `eksctl`, has a role applied to it, and that is the role my app picks up via the default credential chain. What I haven't figured out yet is how to make the apps in the pod assume the service account role.

---

Here are the relevant snippets from the YAML.

#### cluster-config.yaml

```yaml
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: backend-stage-iam-role
        namespace: backend-stage
        labels: { aws-usage: "all-backend-allow" }
      attachPolicyARNs:
        - "arn:aws:iam::MY_CUSTOM_RULE_WHICH_ALLOWS_S3_LIST_GET_PUT"
```

#### deployment.yaml

```yaml
spec:
  replicas: 8
  selector:
    matchLabels:
      app: my-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: backend-stage-iam-role
```

When describing the pod, I see that this environment variable exists:

```
AWS_ROLE_ARN: arn:aws:iam::MY_CUSTOM_RULE_WHICH_ALLOWS_S3_LIST_GET_PUT
```

I still have to figure out how to apply this role to the pod.
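A quick way to see whether the pod is really using the IRSA role or falling back to the node role (a sketch; the namespace comes from the snippets above, while the deployment name `my-app` and the presence of the AWS CLI in the image are assumptions): check that both IRSA variables are injected and what identity STS reports from inside the pod. If `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` are both present but the application still acts as the node role, a common culprit is an AWS SDK version too old to support web-identity credentials in its default chain.

```bash
# Both variables should be present when IRSA is wired up correctly
kubectl -n backend-stage exec deploy/my-app -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'

# The effective identity as seen from inside the pod (requires the AWS CLI in the image)
kubectl -n backend-stage exec deploy/my-app -- aws sts get-caller-identity
```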
3 answers · 0 votes · 62 views · asked 4 months ago

EKS Network Load Balancer Service

Hello, I have an EKS cluster (Terraform code below) and followed the guide to set up the Load Balancer Controller (https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html). But when I deploy the service (Terraform code below) and want to expose it via "LoadBalancer", it stays in a pending state and no external address becomes available. The Load Balancer Controller logs the following error (`kubectl logs pod/aws-load-balancer-controller-5b57cdc6cc-dtjbg -n kube-system`):

```
{"level":"error","ts":1640857282.2362676,"logger":"controller-runtime.manager.controller.service","msg":"Reconciler error","name":"terraform-example","namespace":"default","error":"AccessDenied: User: arn:aws:sts::009661972061:assumed-role/my-cluster2021123008214425030000000b/i-0a40de3c4e8541004 is not authorized to perform: elasticloadbalancing:CreateTargetGroup on resource: arn:aws:elasticloadbalancing:eu-central-1:009661972061:targetgroup/k8s-default-terrafor-630f67813d/* because no identity-based policy allows the elasticloadbalancing:CreateTargetGroup action\n\tstatus code: 403, request id: 2491099a-a6fd-4e6f-bab8-3c758eda0d0b"}
```

If I add the AWSLoadBalancerControllerIAMPolicy to the my-cluster2021123008214425030000000b role manually, it works. But as far as I read the documentation, the AWSLoadBalancerControllerIAMPolicy is meant for the controller in the kube-system namespace, not for the worker nodes. Is something missing from the documentation, or what is the intended way of solving this?

best regards
rene

Terraform EKS:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.27"
    }
  }
  required_version = ">= 0.14.9"
}

provider "aws" {
  profile = "default"
  region  = "eu-central-1"
}

data "aws_eks_cluster" "eks" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "eks" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks.token
}

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_version = "1.21"
  cluster_name    = "my-cluster"
  vpc_id          = "vpc-xx"
  subnets         = ["subnet-xx", "subnet-xx", "subnet-xx"]
  worker_groups = [
    {
      instance_type = "t3.medium"
      asg_max_size  = 5
      role_arn      = "arn:aws:iam::xxx:role/worker-node-example"
    }
  ]
}
```

Terraform service:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.27"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.0.1"
    }
  }
  required_version = ">= 0.14.9"
}

provider "kubernetes" {
  host                   = "xxx"
  cluster_ca_certificate = base64decode("xxx")
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", "my-cluster"]
  }
}

provider "aws" {
  profile = "default"
  region  = "eu-central-1"
}

resource "aws_sqs_queue" "gdpr_queue" {
  name                        = "terraform-example-queue.fifo"
  fifo_queue                  = true
  content_based_deduplication = true
  sqs_managed_sse_enabled     = true
}

resource "aws_sqs_queue" "private_data_queue" {
  name                        = "terraform-example-queue.fifo"
  fifo_queue                  = true
  content_based_deduplication = true
  sqs_managed_sse_enabled     = true
}

resource "aws_db_instance" "database" {
  allocated_storage      = 10
  engine                 = "postgres"
  engine_version         = "13.3"
  instance_class         = "db.t3.micro"
  name                   = "mydb"
  username               = "foo"
  password               = "foobarbaz"
  skip_final_snapshot    = true
  vpc_security_group_ids = [aws_security_group.basic_security_group.id]
}

resource "aws_security_group" "basic_security_group" {
  name        = "allow rds connection"
  description = "Allow rds traffic"
  vpc_id      = "vpc-xxx"

  ingress {
    description      = "postgres"
    from_port        = 5432
    to_port          = 5432
    protocol         = "all"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}

resource "kubernetes_service" "gdpr-hub-service" {
  metadata {
    name = "terraform-example"
    annotations = {
      "service.beta.kubernetes.io/aws-load-balancer-type"            = "external"
      "service.beta.kubernetes.io/aws-load-balancer-nlb-target-type" = "ip"
      "service.beta.kubernetes.io/aws-load-balancer-scheme"          = "internet-facing"
    }
  }
  spec {
    selector = {
      App = kubernetes_deployment.gdpr-hub-service-deployment.spec.0.template.0.metadata.0.labels.App
    }
    session_affinity = "ClientIP"
    port {
      port        = 80
      target_port = 8080
    }
    type = "LoadBalancer"
  }
}

resource "kubernetes_deployment" "gdpr-hub-service-deployment" {
  depends_on = [
    aws_db_instance.database,
    aws_sqs_queue.gdpr_queue,
    aws_sqs_queue.private_data_queue
  ]

  metadata {
    name = "gdpr-hub-service"
    labels = {
      App = "gdpr-hub-service"
    }
  }

  spec {
    replicas = 2
    selector {
      match_labels = {
        App = "gdpr-hub-service"
      }
    }
    template {
      metadata {
        labels = {
          App = "gdpr-hub-service"
        }
      }
      spec {
        container {
          image = "xxxx"
          name  = "gdpr-hub-service"
          port {
            container_port = 8080
          }
          resources {
            limits = {
              cpu    = "2"
              memory = "1024Mi"
            }
            requests = {
              cpu    = "250m"
              memory = "50Mi"
            }
          }
        }
      }
    }
  }
}
```
2 answers · 0 votes · 10 views · asked 5 months ago

kube-proxy failing after update to 1.16+

Hi all, I've recently updated one of our clusters from version 1.15 to 1.16 and then to 1.17. Before this one, I updated 8 other clusters with no issues whatsoever. However, for some reason, when I update _kube-proxy_ to bring it in line with the new Kubernetes version, the pods fail. The logs are empty, and the reason the pods are terminated is "Error", which isn't informative at all.

In short:

- Server version: v1.17.17-eks-087e67
- Nodes version: v1.17.17-eks-ac51f2
- Working kube-proxy version: v1.15.11-eksbuild.1
- kube-proxy versions that cause the problem: v1.16.13-eksbuild.1, v1.17.9-eksbuild.1

At first I thought it could just be that particular version of kube-proxy, so I decided to keep updating. Now I assume it's something else. Running `kubectl logs -f podName` doesn't help; it doesn't return anything.

```
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Mon, 18 Oct 2021 12:32:56 +0200
  Finished:     Mon, 18 Oct 2021 12:32:56 +0200

Events:
  Type     Reason     Age                From                                           Message
  ----     ------     ----               ----                                           -------
  Normal   Scheduled  66s                default-scheduler                              Successfully assigned kube-system/kube-proxy-bh7r4 to ip-xxxxxx.eu-west-1.compute.internal
  Normal   Pulling    65s                kubelet, ip-xxxxxx.eu-west-1.compute.internal  Pulling image "602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/kube-proxy:v1.17.9-eksbuild.1"
  Normal   Pulled     63s                kubelet, ip-xxxxxx.eu-west-1.compute.internal  Successfully pulled image "602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/kube-proxy:v1.17.9-eksbuild.1"
  Normal   Created    23s (x4 over 62s)  kubelet, ip-xxxxxx.eu-west-1.compute.internal  Created container kube-proxy
  Normal   Started    23s (x4 over 62s)  kubelet, ip-xxxxxx.eu-west-1.compute.internal  Started container kube-proxy
  Normal   Pulled     23s (x3 over 62s)  kubelet, ip-xxxxxx.eu-west-1.compute.internal  Container image "602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/kube-proxy:v1.17.9-eksbuild.1" already present on machine
  Warning  BackOff    8s (x6 over 61s)   kubelet, ip-xxxxxx.eu-west-1.compute.internal  Back-off restarting failed container
```

Can you please advise? I'm quite confused here. I'm comparing what I did with our other clusters and I took the exact same steps; I'm not sure why this one is not working. Thanks a lot!

Edited by: twgdavef on Oct 18, 2021 4:24 AM
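One known incompatibility when crossing from 1.15 that is worth ruling out (not necessarily the cause here): Kubernetes 1.16 removed the `--resource-container` flag, and a kube-proxy DaemonSet spec carried over from older clusters that still passes it makes the 1.16+ container exit immediately. A sketch for checking and, if needed, cleaning the DaemonSet:

```bash
# Does the DaemonSet still pass the removed flag?
kubectl -n kube-system get daemonset kube-proxy \
  -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}'

# If "--resource-container=" appears in the output, remove that argument, e.g. interactively:
kubectl -n kube-system edit daemonset kube-proxy
```

Comparing the full DaemonSet spec against one of the clusters where the update worked (`kubectl -n kube-system get ds kube-proxy -o yaml`) is another quick way to spot a lingering difference.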
1 answer · 0 votes · 1 view · asked 7 months ago

Kubernetes projected service account token expiry time issue

I'm using AWS EKS 1.21 with service account issuer discovery enabled. I created an OIDC provider, and the `.well-known/openid-configuration` endpoint returns a correct configuration:

```json
{
  "issuer": "https://oidc.eks.eu-west-1.amazonaws.com/id/***",
  "jwks_uri": "https://ip-***.eu-west-1.compute.internal:443/openid/v1/jwks",
  "response_types_supported": ["id_token"],
  "subject_types_supported": ["public"],
  "id_token_signing_alg_values_supported": ["RS256"]
}
```

I created a ServiceAccount for one of my deployments, and the pod gets this as a projected volume:

```yaml
volumes:
  - name: kube-api-access-b4xt9
    projected:
      defaultMode: 420
      sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
              - key: ca.crt
                path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
```

The secret created for the ServiceAccount contains this token:

```json
{
  "iss": "kubernetes/serviceaccount",
  "kubernetes.io/serviceaccount/namespace": "sbx",
  "kubernetes.io/serviceaccount/secret.name": "dliver-site-config-service-token-kz874",
  "kubernetes.io/serviceaccount/service-account.name": "dliver-site-config-service",
  "kubernetes.io/serviceaccount/service-account.uid": "c26ad760-9067-4d90-a327-b3d6e32bce42",
  "sub": "system:serviceaccount:sbx:dliver-site-config-service"
}
```

The projected token mounted into the pod contains this:

```json
{
  "aud": ["https://kubernetes.default.svc"],
  "exp": 1664448004,
  "iat": 1632912004,
  "iss": "https://oidc.eks.eu-west-1.amazonaws.com/id/***",
  "kubernetes.io": {
    "namespace": "sbx",
    "pod": {
      "name": "dliver-site-config-service-77494b8fdd-45pxw",
      "uid": "0dd440a6-1213-4faa-a69e-398b83d2dd6b"
    },
    "serviceaccount": {
      "name": "dliver-site-config-service",
      "uid": "c26ad760-9067-4d90-a327-b3d6e32bce42"
    },
    "warnafter": 1632915611
  },
  "nbf": 1632912004,
  "sub": "system:serviceaccount:sbx:dliver-site-config-service"
}
```

Kubernetes renews the projected token every hour, so everything looks fine except the projected token's `exp` field:

- `"iat": 1632912004`, which is Wednesday, September 29, 2021 10:40:04 AM
- `"exp": 1664448004`, which is Thursday, September 29, 2022 10:40:04 AM

So the problem is that the projected token's expiry time is one year instead of around one hour, which makes Kubernetes' effort to renew the token basically useless. I searched for hours but could not figure out where this comes from. The expiration flag passed to the kube-apiserver is `--service-account-max-token-expiration="24h0m0s"`, so my assumption is that this should be configured on the OIDC provider somehow, but I cannot find any related documentation. Any idea how to make the projected token's expiry roughly the same as the `expirationSeconds` in the pod's projected volume?
2 answers · 0 votes · 4 views · asked 8 months ago

PVCs provisioned using the CSI driver are stuck in Pending state

I can see the logs below when describing the PVC:

```
Warning  ProvisioningFailed    93s (x2 over 93s)  persistentvolume-controller  storageclass.storage.k8s.io "ebs-sc" not found
Normal   Provisioning          9s (x6 over 90s)   ebs.csi.aws.com_ebs-csi-controller-f5d9c9475-wh2t9_e2eea260-a1f6-4b74-9250-baf43ba03780  External provisioner is provisioning volume for claim "default/ebs-claim"
Normal   ExternalProvisioning  3s (x8 over 90s)   persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
Warning  ProvisioningFailed    0s (x6 over 80s)   ebs.csi.aws.com_ebs-csi-controller-f5d9c9475-wh2t9_e2eea260-a1f6-4b74-9250-baf43ba03780  failed to provision volume with StorageClass "ebs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
```

`kubectl logs ebs-csi-controller-f5d9c9475-wh2t9 -c csi-provisioner -n kube-system`:

```
CreateVolume failed, supports topology = true, node selected true => may reschedule = true => state = Background: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0624 09:55:16.287191       1 controller.go:1106] Temporary error received, adding PVC 3f6ec7b5-2a25-4f42-babb-d808dd464535 to claims in progress
W0624 09:55:16.287200       1 controller.go:958] Retrying syncing claim "3f6ec7b5-2a25-4f42-babb-d808dd464535", failure 8
E0624 09:55:16.287222       1 controller.go:981] error syncing claim "3f6ec7b5-2a25-4f42-babb-d808dd464535": failed to provision volume with StorageClass "ebs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0624 09:55:16.287251       1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ebs-claim", UID:"3f6ec7b5-2a25-4f42-babb-d808dd464535", APIVersion:"v1", ResourceVersion:"6828149", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ebs-sc": rpc error: code = DeadlineExceeded desc = context deadline exceeded
```

I can't see the actual error behind this issue to troubleshoot it.
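For reference, a minimal `ebs-sc` StorageClass for the EBS CSI driver looks like the sketch below (the `type` parameter is illustrative). If the class exists and provisioning still times out with `DeadlineExceeded`, a common cause is the CSI controller lacking IAM permissions to call EC2 (for example, no IRSA role with the EBS CSI policy attached); the underlying AWS error usually appears in the controller's `ebs-plugin` container logs rather than in `csi-provisioner`.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # provision in the AZ where the pod is scheduled
parameters:
  type: gp3   # illustrative; gp2, io1, etc. are also valid
```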
1 answer · 0 votes · 3 views · asked a year ago

Timeout when updating/reading CRDs

Cluster information:

- Kubernetes version: 1.18
- AWS EKS

We have a small (3-node) AWS cluster that we've been working on for a few months. It has been running Argo CD 1.7.10, and we recently tried to upgrade that to 1.8.7. The upgrade failed: the command didn't return, it just hung. Since then, attempting to apply CRDs causes a timeout.

To try to work this out, we extracted the application CRD from https://raw.githubusercontent.com/argoproj/argo-cd/v1.8.7/manifests/install.yaml, and trying to apply it goes like this:

```
$ kubectl -n argocd apply -f application-crd.yaml
Error from server (Timeout): error when creating "application-crd.yaml": the server was unable to return a response in the time allotted, but may still be processing the request (post customresourcedefinitions.apiextensions.k8s.io)
```

We can't find anything in any logs, and we suspect the problem lies in etcd, which is invisible to us, but we cannot be sure. We have tried applying other CRDs, some more basic to prove the point and some just as complex. We have trimmed down the Argo CRD YAML and removed the schema and other parts to see which parts are failing, but there appears to be no rhyme or reason to what causes the problem: sometimes it fails and sometimes it succeeds. We have also tried the sample CRD from https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/ and seen it both work and fail.

In short, our cluster is behaving erratically, and we suspect a bug or a fundamental problem that we cannot diagnose. We are stuck, this cluster is headed for production, and we are losing confidence in our AWS EKS setup. Where do we go from here?

Edited by: kophones77 on Mar 25, 2021 2:31 AM
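Intermittent timeouts on writes like this are often caused by an admission webhook registered in the cluster whose backing service no longer responds, since the API server must call it before persisting the object. A sketch of a cheap first check (the `<name>` placeholder stands for whichever configuration looks suspicious, for example one left behind by the failed upgrade):

```bash
# Webhooks the API server consults on writes - a dead backend here shows up as apply timeouts.
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations

# For anything suspicious, see which service it points to and whether that service/pod still exists.
kubectl get validatingwebhookconfigurations <name> -o yaml   # <name> is a placeholder
```

If the webhook list is clean, the EKS control plane logs (audit and API server logs, if enabled for the cluster) are the next place where slow or failed CRD writes would be visible.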