EKS ADOT to AWS Managed Prometheus remote write in another account
I followed the tutorial here (https://docs.aws.amazon.com/eks/latest/userguide/deploy-deployment.html) and it works well when the Prometheus remote write endpoint is in the same account as the EKS cluster. Now EKS and Managed Prometheus are in separate accounts, so I need to assume a role to be able to write to the Prometheus remote write endpoint. I used this YAML: https://raw.githubusercontent.com/aws-observability/aws-otel-community/master/sample-configs/operator/collector-config-amp.yaml and modified the extensions/sigv4auth section like this:

```
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-amp
spec:
  mode: deployment
  serviceAccount: adot-collector
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  config: |
    extensions:
      sigv4auth:
        assume_role:
          arn: "arn:aws:iam::1234567890:role/prometheus_remote_write_assumerole"
        region: "us-west-2"
        service: "aps"
```

I got this error:

```
Error: failed to get config: invalid configuration: extension "sigv4auth" has invalid configuration: bad AWS credentials
2022/09/08 13:57:32 application run finished with error: failed to get config: invalid configuration: extension "sigv4auth" has invalid configuration: bad AWS credentials
```
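For anyone answering: below is a minimal sketch of what a cross-account setup can look like, assuming the collector's base credentials come from IRSA (all names, ARNs, and account IDs are placeholders, and `sts_region` is taken from the sigv4auth extension's documented options). The "bad AWS credentials" message appears to come from the extension validating its base credentials at startup, before any role is assumed.

```
# Sketch only: names, ARNs, and account IDs are placeholders.
# 1) The collector's service account gets base credentials via IRSA:
#    a role in the EKS account that is allowed to call sts:AssumeRole
#    on the cross-account role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: adot-collector
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::EKS_ACCOUNT_ID:role/adot-collector-irsa
---
# 2) sigv4auth then assumes the role in the Prometheus account; that
#    role's trust policy must allow the IRSA role above to assume it.
extensions:
  sigv4auth:
    assume_role:
      arn: "arn:aws:iam::PROMETHEUS_ACCOUNT_ID:role/prometheus_remote_write_assumerole"
      sts_region: "us-west-2"
    region: "us-west-2"
    service: "aps"
```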
What happens to EFS-based PVs when a node crashes?
I have some applications which use dynamically provisioned PVs backed by EFS (EFS CSI dynamic provisioning). I have an EKS cluster with managed node groups in different AZs. My question: what happens to these PVs when a node crashes for some reason and the pods are restarted on other nodes? Will EKS or Kubernetes automatically remount these EFS-based PVs to the proper pods?
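For reference, the setup in question looks roughly like the sketch below (the filesystem ID and names are placeholders). Since EFS volumes are network filesystems exposed with `ReadWriteMany`, they are not pinned to a particular node, which is the crux of the question.

```
# Sketch of a typical EFS CSI dynamic-provisioning setup; the filesystem
# ID (fs-0123456789abcdef0) and names are placeholders.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap          # dynamic provisioning via access points
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteMany                 # EFS allows many nodes to mount at once
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi                  # required by the API; EFS itself is elastic
```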
Link RoboMaker ROS nodes with EKS
I'm developing a simulated multi-robot fleet-management algorithm. The robots are simulated via different RoboMaker instances, while the multi-robot management algorithm is deployed in a Kubernetes pod in EKS. Is there a service available to link the ROS nodes in RoboMaker to a cluster's pod in EKS?
aws-node DaemonSet (AWS EKS v1.21) with strange readiness timeoutSeconds
We have 2 EKS clusters here, and both sometimes show readiness probe failure events from aws-node pods. Looking at the DaemonSet manifest we have:

```
livenessProbe:
  exec:
    command:
      - /app/grpc-health-probe
      - '-addr=:50051'
      - '-connect-timeout=2s'
      - '-rpc-timeout=2s'
  initialDelaySeconds: 60
  timeoutSeconds: 5
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
readinessProbe:
  exec:
    command:
      - /app/grpc-health-probe
      - '-addr=:50051'
      - '-connect-timeout=2s'
      - '-rpc-timeout=2s'
  initialDelaySeconds: 1
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
```

As you can see, the livenessProbe exec command has two timeouts, one for the connection and one for the RPC, and the probe's timeoutSeconds is 5 (sensibly, the sum of the exec command's timeouts plus 1 second). The readinessProbe has the same exec command as the livenessProbe, but its timeoutSeconds is only 1 second. If you check the EKS services you probably won't find a Service linked to these pods, so it doesn't affect any Service. Still, these are error messages that shouldn't be in our logs and that don't make sense, for now, to have. I suspect a simple fix, setting the readiness timeoutSeconds to 5 (like liveness) in this DaemonSet, would be enough. Did anyone have this problem or think of resolving it this way?
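If it helps, here is a minimal sketch of the fix being proposed, written as a strategic-merge patch (the file name is hypothetical; note that EKS may revert manual edits to aws-node when the VPC CNI add-on is updated):

```
# aws-node-readiness-patch.yaml (hypothetical file name): align the
# readinessProbe timeout with the livenessProbe. Apply with:
#   kubectl -n kube-system patch daemonset aws-node --patch-file aws-node-readiness-patch.yaml
# Caveat: VPC CNI add-on updates may overwrite manual changes like this.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          readinessProbe:
            timeoutSeconds: 5
```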
Rancher - Unable to import clusters
I have installed Rancher with the Helm chart approach on an EKS cluster using this link: https://rancher.com/docs/rancher/v2.6/en/installation/install-rancher-on-k8s/#install-the-rancher-helm-chart with the Rancher-generated TLS certificate, and I manually had to edit the Ingress to add IngressClass: nginx. As a prerequisite, I also deployed the ingress controller using Step 5 of https://rancher.com/docs/rancher/v2.6/en/installation/resources/k8s-tutorials/amazon-eks/. I did not set up Route 53; I am using the load balancer domain as-is to log in to Rancher. I am able to log in to Rancher, but unable to create/import existing clusters. It goes into provisioning mode, and the Conditions tab shows: "[Disconnected] Cluster agent is not connected", "Waiting For API to be available", "[Error] Error while applying agent YAML, it will be retried automatically: exit status 1, Unable to connect to the server: remote error: tls: internal error".
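For reference, the manual Ingress edit mentioned above can be expressed as a patch along these lines (a sketch assuming Rancher's default Helm install, i.e. an Ingress named `rancher` in the `cattle-system` namespace; the patch file name is hypothetical):

```
# rancher-ingressclass-patch.yaml (hypothetical file name). Apply with:
#   kubectl -n cattle-system patch ingress rancher --patch-file rancher-ingressclass-patch.yaml
spec:
  ingressClassName: nginx   # route the Rancher Ingress through the nginx controller
```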
EKS cluster creates a Security Group and doesn't clean it up after destroy
About two weeks ago we found that, after deleting a CloudFormation stack, the VPC could not be removed. I checked, and it turned out that the EKS cluster does not remove the security group it created itself. The security group is named "eks-cluster-sg-EKS-*" with the description "EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads." How can I fix that? To reproduce, deploy a VPC with EKS by CFN or using the AWS QSS solution. Thanks
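To make the reproduction concrete, a minimal CloudFormation sketch along these lines should do (all names, CIDRs, and sizes are placeholders): on stack deletion, the `eks-cluster-sg-*` group that EKS creates outside the template can be left behind and block the VPC delete, as described above.

```
# repro.yaml: minimal sketch (placeholders throughout); deploying and then
# deleting this stack reproduces the scenario described above.
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  SubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref Vpc
      CidrBlock: 10.0.0.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
  SubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref Vpc
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
  ClusterRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: { Service: eks.amazonaws.com }
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
  Cluster:
    Type: AWS::EKS::Cluster
    Properties:
      RoleArn: !GetAtt ClusterRole.Arn
      ResourcesVpcConfig:
        SubnetIds:
          - !Ref SubnetA
          - !Ref SubnetB
```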
Amazon EKS node group Spot instance
I've updated a managed node group from On-Demand to Spot, and since then I'm getting some network issues:

```
MountVolume.SetUp failed for volume "web-krups-config": failed to sync configmap cache: timed out waiting for the condition
MountVolume.SetUp failed for volume "nri-integrations-cfg-volume": object "newrelic"/"newrelic-bundle-nrk8s-integrations-cfg" not registered
network is not ready: container runtime network not ready: NetworkReady=false reason:docker: network plugin is not ready: cni config uninitialized
```

Any ideas or best practices to follow to understand the root cause and fix this issue? Pods on this node group seem to stay in Running state for ~15 minutes and then move to Pending state.
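For context, the kind of managed Spot node group described here can be declared like the eksctl sketch below (cluster name, region, instance types, and sizes are placeholders; this assumes eksctl, which the question does not state). With Spot capacity, node interruption and replacement is routine, so errors seen during node churn can look quite different from a steady On-Demand group.

```
# eksctl sketch of a managed Spot node group (all names/sizes are placeholders).
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: eu-west-1
managedNodeGroups:
  - name: spot-ng
    spot: true                                            # Spot instead of On-Demand
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]  # diversify capacity pools
    minSize: 2
    desiredCapacity: 3
    maxSize: 6
```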
OKTA EKS Atlassian suite integration
Hello, we currently have the Atlassian suite (Jira, Confluence, Crowd, etc.) on EC2 instances and RDS, and we use Okta's Single Sign-On (SSO) to enable access to Atlassian products. We have now migrated the Atlassian products to an AWS EKS cluster. My question is: how do we configure things to be able to add every Atlassian product from EKS to Okta? We are wondering if we can automate the integration using Terraform/Helm, because the EKS cluster has been designed using Terraform/Helm. Many thanks in advance
Unable to use HDFS on EMR on EKS virtual cluster
Hi, I am trying EMR on EKS to run our ETL workloads. One of the workloads failed because of HDFS. I am using the EMR 6.2.0 Spark container image. When I try to access HDFS on this image I see this error:

```
/usr/bin/hdfs: line 8: /usr/lib/hadoop-hdfs/bin/hdfs: No such file or directory
```

Basically the executable file is missing from that location. I copied the executable from an EMR instance (with the same release label), but now I am seeing:

```
ls: Incomplete HDFS URI, no host: hdfs:///
```
EKS control plane didn't recover after AWS outage
We had EC2 instances there, and it seems our control plane was impacted by a euw2-az1 outage. AWS indicated the issue was resolved, but our control plane carried on giving intermittent access-denied errors and failing to connect to worker nodes. We're in 3 zones and the control plane should be highly available, but it seems at least 1 node in the control plane was behaving badly. Is there a way to restart the services in the control plane? We ended up fixing this by upgrading the Kubernetes version, but that won't always be an available option.