Questions tagged with Containers
I am using SageMaker Notebooks and have added a lifecycle configuration to shut it down after inactivity. However, after a period of inactivity, it takes a long time (20 minutes) to boot back up. What can I do to reduce the launch time?
I have considered creating [a custom container image](https://aws.amazon.com/blogs/machine-learning/bringing-your-own-custom-container-image-to-amazon-sagemaker-studio-notebooks/) with pre-installed packages and dependencies but I am not sure if this would help. I also wonder if using a larger instance size would reduce launch time.
What are the strategies and best practices to reduce launch time for SageMaker Notebooks?
Hello,
We have been running workloads on ECS using EC2 with capacity providers as the backing. All of the tasks are given the same amount of CPU and memory, and all of them run on the same EC2 instance type, which has enough capacity for these tasks. Each task reserves enough memory and CPU that no other task should be placed on the same EC2 instance. A few CPUs and a few GB of memory are left over, so we aren't completely maxing out the instances' resources.
Occasionally, we get an error (example below) when trying to start a task. It has only happened a handful of times out of hundreds of tasks that have been run. Based on [the example in this documentation that shows "RESOURCE:CPU"](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html) it seems like the EC2 instance that the task is being placed on doesn't have enough CPU. But given our current setup of one task per machine how would that be possible? Does anyone have ideas as to what might be going on or things we could change on our end to fix or mitigate this?
Example error:
```
[{"Arn":"arn:aws:ecs:REGION:ACCOUNT-ID:container-instance/CONTAINER-INSTANCE-ID","Reason":"RESOURCE:CPU"}] (Service: AmazonECS; Status Code: 400; Error Code: AmazonECS.Unknown; Request ID: UUID; Proxy: null)
```
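For anyone who wants to log which container instance rejected the task, here is a minimal Python sketch that pulls the ARN and reason out of an error string shaped like the example above. The function name `parse_failure_reasons` is hypothetical, and the sketch assumes the message always starts with a JSON array followed by the SDK details in parentheses.

```python
import json
from typing import List, Tuple


def parse_failure_reasons(error_message: str) -> List[Tuple[str, str]]:
    """Return (container instance ARN, reason) pairs from the error text.

    Assumes the message begins with a JSON array, as in the example above,
    and ignores whatever follows it (the "(Service: ...)" suffix).
    """
    decoder = json.JSONDecoder()
    # raw_decode parses the leading JSON document and reports where it ended.
    failures, _end = decoder.raw_decode(error_message.lstrip())
    return [(f.get("Arn", ""), f.get("Reason", "")) for f in failures]


example = (
    '[{"Arn":"arn:aws:ecs:REGION:ACCOUNT-ID:container-instance/CONTAINER-INSTANCE-ID",'
    '"Reason":"RESOURCE:CPU"}] (Service: AmazonECS; Status Code: 400; '
    'Error Code: AmazonECS.Unknown; Request ID: UUID; Proxy: null)'
)

for arn, reason in parse_failure_reasons(example):
    print(arn, reason)
```

Recording the ARN next to the timestamp would make it possible to check what else was running or reserved on that instance at the moment of the failure.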
I have a CodeBuild job that tries to build a Docker container image based on amazonlinux:2022. It works great locally but fails in CodeBuild with the following error:
FROM public.ecr.aws/amazonlinux/amazonlinux:2022
...do a yum -y install then
Amazon Linux 2022 repository 0.0 B/s | 0 B 00:00
Errors during downloading metadata for repository 'amazonlinux':
- Curl error (6): Couldn't resolve host name for https://cdn.amazonlinux.com/al2022/core/mirrors/2022.0.20221207/x86_64/mirror.list [getaddrinfo() thread failed to start]
Error: Failed to download metadata for repo 'amazonlinux': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for https://cdn.amazonlinux.com/al2022/core/mirrors/2022.0.20221207/x86_64/mirror.list [getaddrinfo() thread failed to start]
Ignoring repositories: amazonlinux
How can I get DNS to work in CodeBuild so that it can access the amazonlinux:2022 repositories on cdn.amazonlinux.com?
I have a Fargate task on which I'm trying to mount an EFS filesystem for a WordPress stack. I've set up an IAM role for the task and declared it in the task definition[1] for both `taskRoleArn` and `executionRoleArn`. The role defines several allowed actions[2] that I've gathered from various pieces of documentation. Also in the task definition, I tried to define (via Terraform) the `Volumes:[]` and `mountPoints:[]`, but the task was not able to mount the EFS volume and failed. I removed the mount points and volumes from the task definition so that it would start, and then shelled into the running container (via SSM agent) to debug.
The EFS filesystem has a File System Policy[3] applied and two Mount Targets[4] configured for my `efs-security-group`[5] (allows TCP 2049 ingress, and all egress).
When I attempt[6] to mount the EFS filesystem in the Fargate container, I get `Operation not permitted`. I got the mount command from the [Attach] button in the EFS console but omitted `sudo` since I'm already running the command as `root`.
I should mention my container uses an init wrapper to start a couple services before launching Apache in the foreground. The `efs` mount command (again, from the [Attach] button in the EFS console) may suggest this is a problem[7]?
Any idea why the NFS mount is failing with Operation not permitted or how to get the `efs` mount to work with an init wrapper script?
[1]
```
taskRoleArn : arn:aws:iam::123123123123:role/webhost-iam-role
executionRoleArn: arn:aws:iam::123123123123:role/webhost-iam-role
```
[2]
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"elasticfilesystem:ClientMount",
"elasticfilesystem:ClientRootAccess",
"elasticfilesystem:ClientWrite",
"elasticfilesystem:DescribeMountTargets"
],
"Resource": "*"
}
]
}
```
[3]
```
{
"Version": "2012-10-17",
"Id": "ExamplePolicy01",
"Statement": [
{
"Sid": "ExampleStatement01",
"Effect": "Allow",
"Principal": {
"AWS": "*"
},
"Action": [
"elasticfilesystem:ClientMount",
"elasticfilesystem:ClientWrite",
"elasticfilesystem:ClientRootAccess",
"elasticfilesystem:DescribeMountTargets"
],
"Resource": "*"
}
]
}
```
[4]
```
us-east-1a, 10.100.1.63, efs-security-group
us-east-1b, 10.100.2.171, efs-security-group
```
[5]
```
NFS, TCP, [10.100.1.0/24, 10.100.2.0/24], 2049
```
[6]
```
# mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 10.100.1.63:/ /mnt/webfiles/
mount.nfs4: Operation not permitted
```
[7]
```
# mount -t efs -o tls fs-0328b5ef212381290:/ /mnt/webfiles/
Could not start amazon-efs-mount-watchdog, unrecognized init system "init_wrapper.sh"
b'mount.nfs4: Operation not permitted'
```
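As far as I know, Fargate does not grant containers the capability needed to call `mount` themselves, so the volume normally has to be declared in the task definition rather than mounted from a shell. Below is a minimal Python sketch of the `volumes` and `mountPoints` entries mentioned in the question above, in the shape the ECS task definition JSON uses. The helper names `efs_volume` and `mount_point` and the volume name `webfiles` are hypothetical; the file system ID and path are taken from the post, and nothing here calls AWS.

```python
import json
from typing import Any, Dict


def efs_volume(name: str, file_system_id: str, use_iam: bool = True) -> Dict[str, Any]:
    """Build one task-level volume entry backed by EFS."""
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "rootDirectory": "/",
            # IAM authorization requires transit encryption to be enabled.
            "transitEncryption": "ENABLED",
            "authorizationConfig": {"iam": "ENABLED" if use_iam else "DISABLED"},
        },
    }


def mount_point(volume_name: str, container_path: str) -> Dict[str, Any]:
    """Build one container-level mount point that refers to a task volume by name."""
    return {"sourceVolume": volume_name, "containerPath": container_path, "readOnly": False}


volume = efs_volume("webfiles", "fs-0328b5ef212381290")
mount = mount_point(volume["name"], "/mnt/webfiles")
print(json.dumps({"volumes": [volume], "mountPoints": [mount]}, indent=2))
```

The printed structure can be compared against what Terraform renders for the task definition; the `sourceVolume` value has to match the volume `name` exactly.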
I am getting `exec /bin/sh: exec format error` while trying to deploy a basic Hello World container as an ECS service using Fargate. These are the steps I followed:
1. Built the Docker image using Docker Desktop for Mac
2. Created the ECR repository using the AWS console
3. Pushed the Docker image to ECR
4. Created a task definition in ECS with Fargate as the launch type
5. Tried to deploy the ECS task as a service
Here is the Dockerfile I used to build the image:
FROM ubuntu:18.04
#Install dependencies
RUN apt-get update && \
apt-get -y install apache2
#Install apache and write hello world message
RUN echo 'Hello World!' > /var/www/html/index.html
#Configure apache
RUN echo '. /etc/apache2/envvars' > /root/run_apache.sh && \
echo 'mkdir -p /var/run/apache2' >> /root/run_apache.sh && \
echo 'mkdir -p /var/lock/apache2' >> /root/run_apache.sh && \
echo '/usr/sbin/apache2 -D FOREGROUND' >> /root/run_apache.sh && \
chmod 755 /root/run_apache.sh
EXPOSE 80
While troubleshooting the error, I tried putting `#!/bin/sh` as the first line of the Dockerfile, but that did not work either. I also tried changing the image from Apache to NGINX and used a different Dockerfile, shown below:
FROM nginx
RUN rm /etc/nginx/conf.d/*
COPY hello.conf /etc/nginx/conf.d/
COPY index.html /usr/share/nginx/html/
Using this image, I get `exec /docker-entrypoint.sh: exec format error`.
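An `exec format error` usually means the binaries in the image were built for a different CPU architecture than the host that runs them. One possibility, assuming the Mac is an Apple silicon machine (the post does not say), is that Docker Desktop produced a `linux/arm64` image while the Fargate task runs on x86_64. The Python sketch below only illustrates that comparison; the helper names are hypothetical and the default target of `linux/amd64` is an assumption about the task.

```python
import platform

# Common platform.machine() values mapped to Docker platform strings.
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform_for(machine: str) -> str:
    """Translate a machine name into the platform Docker builds for by default."""
    return _DOCKER_PLATFORMS.get(machine.lower(), "unknown")


def needs_explicit_platform(build_machine: str, target_platform: str = "linux/amd64") -> bool:
    """True when a default build on build_machine would not match the target."""
    return docker_platform_for(build_machine) != target_platform


if __name__ == "__main__":
    host = platform.machine()
    print(host, docker_platform_for(host), needs_explicit_platform(host))
```

If the check reports a mismatch, building with `docker build --platform linux/amd64 .` or setting `runtimePlatform.cpuArchitecture` to `ARM64` in the task definition are the usual ways to line the two up.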
I'm currently trying to deploy a Vue app to ECS (on EC2).
However, when I run the task, it stops with the reason "Essential container in task exited".
The details also show exit code 1.
To get more detailed error logs, I looked up the running container ID with `docker ps` and checked the logs with `docker logs <container-id> | head`.
The log contains msg="CREDENTIALS_FETCHER_HOST_DIR not found, err: stat /var/credentials-fetcher/socket/credentials_fetcher.sock: no such file or directory" module=parse_linux.go.
Can you identify the cause of the problem and how to solve it?
Hi team,
I'm trying to set up an AWS CodeBuild project using this example:
https://aws.amazon.com/blogs/containers/creating-container-images-with-cloud-native-buildpacks-using-aws-codebuild-and-aws-codepipeline/
It always fails on this command at the end of the buildspec file:
```
./pack build --no-color --builder $builder \
--tag $IMAGE_TAG $ECR_REPOSITORY:latest \
--cache-image $ECR_REPOSITORY:cache \
--publish
```
I get this error:
> ERROR: failed to : ensure registry read access to 111111111.dkr.ecr.region.amazonaws.com/myrepo:latest
> ERROR: failed to build: executing lifecycle: failed with status code: 1
I'm not sure what I did wrong. I tried to follow the blog's buildspec as is, and I had already added the required ECR permissions to the CodeBuild service role.
Update:
I added admin access to the CodeBuild service role to see whether it's a permissions issue.
Now I get this error:
```
===> ANALYZING
Restoring data for SBOM from previous image
===> DETECTING
ERROR: No buildpack groups passed detection.
ERROR: Please check that you are running against the correct path.
ERROR: failed to detect: no buildpacks participating
ERROR: failed to build: executing lifecycle: failed with status code: 20
[Container] Command did not exit successfully ./pack build --no-color --builder $builder \
--tag $IMAGE_TAG $REPOSITORY_URI:latest \
--cache-image $REPOSITORY_URI:cache \
--publish
exit status 1
```
Basically, I just want to generate the Docker image of my Spring Boot application in buildspec.yml without using a Dockerfile.
Is there any method other than using pack builders?
I appreciate your help on this.
Cheers,
I have trained a few models in SageMaker, but I am unable to load them for prediction.
I am taking the model details from SageMaker > Inference > Models > Container 1 section:
Image_uri = value in Image
model_data = value in Model data location
I then pass these values into the SageMaker Model function.
When I deploy this model, it gives the error: ping health check failed for AllTraffic production variant. This error doesn't occur when I train a new model and deploy it.
Hi,
I'm trying to run Greengrass v2 2.9.0 in a Docker container that was set up using [aws-greengrass-docker](https://github.com/aws-greengrass/aws-greengrass-docker).
While setting up `aws.iot.SiteWiseEdgeCollectorOpcua` version `2.1.3`, I get an error in the logs that says `Unable to unpack AWS CRT lib: java.io.IOException: Unable to open library in jar for AWS CRT: /linux/armv8/libaws-crt-jni.so.` (The full logs are at the end.)
Here is the **version** information:
- `aws.iot.SiteWiseEdgeCollectorOpcua`=> `2.1.3`
- `Greengrass V2`=> `2.9.0`
- `Docker` =>
```
Client:
Cloud integration: v1.0.29
Version: 20.10.21
API version: 1.41
OS/Arch: darwin/arm64
Server: Docker Desktop 4.15.0 (93002)
Engine:
Version: 20.10.21
API version: 1.41 (minimum version 1.12)
OS/Arch: linux/arm64
containerd:
Version: 1.6.10
runc:
Version: 1.1.4
docker-init:
Version: 0.19.0
```
- `Host OS`: `MacOS 13.1`
Following are the **Logs** available for `aws.iot.SiteWiseEdgeCollectorOpcua`
```
2023-01-18T13:53:32.246Z [INFO] (pool-2-thread-17) aws.iot.SiteWiseEdgeCollectorOpcua: shell-runner-start. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING, command=["java -cp /greengrass/v2/packages/artifacts/aws.iot.SiteWiseEdgeCollectorOpcua/..."]}
2023-01-18T13:53:32.249Z [DEBUG] (pool-2-thread-17) aws.iot.SiteWiseEdgeCollectorOpcua: Created process with pid 464. {serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.407Z [INFO] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stdout. WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.659Z [INFO] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stdout. [INFO ] 2023-01-18 13:53:32.658 [main] OpcUaCollector - {"message":"Initializing OPC-UA Collector Component."}. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. Unable to unpack AWS CRT lib: java.io.IOException: Unable to open library in jar for AWS CRT: /linux/armv8/libaws-crt-jni.so. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. java.io.IOException: Unable to open library in jar for AWS CRT: /linux/armv8/libaws-crt-jni.so. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CRT.extractAndLoadLibrary(CRT.java:155). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CRT.loadLibraryFromJar(CRT.java:220). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CRT.<clinit>(CRT.java:33). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CrtResource.<clinit>(CrtResource.java:104). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.common.factory.GreengrassFactory.getSocketOptionsForIPC(GreengrassFactory.java:105). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.common.factory.GreengrassFactory.provideEventStreamRpcConnection(GreengrassFactory.java:47). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.common.factory.GreengrassFactory.provideGreengrassCoreIPCClient(GreengrassFactory.java:38). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.component.ComponentService.<init>(ComponentService.java:57). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.665Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.component.ComponentService$Builder.build(ComponentService.java:394). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.collector.OpcUaCollector.main(OpcUaCollector.java:23). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. Exception in thread "main" java.lang.ExceptionInInitializerError. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CrtResource.<clinit>(CrtResource.java:104). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.common.factory.GreengrassFactory.getSocketOptionsForIPC(GreengrassFactory.java:105). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.common.factory.GreengrassFactory.provideEventStreamRpcConnection(GreengrassFactory.java:47). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.common.factory.GreengrassFactory.provideGreengrassCoreIPCClient(GreengrassFactory.java:38). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.component.ComponentService.<init>(ComponentService.java:57). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.sdk.component.ComponentService$Builder.build(ComponentService.java:394). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at com.amazon.iot.sitewise.component.collector.OpcUaCollector.main(OpcUaCollector.java:23). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. Caused by: software.amazon.awssdk.crt.CrtRuntimeException: software.amazon.awssdk.crt.CrtRuntimeException: Unable to unpack AWS CRT library UNKNOWN(-1) UNKNOWN(-1). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.666Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CRT.loadLibraryFromJar(CRT.java:230). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.667Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. at software.amazon.awssdk.crt.CRT.<clinit>(CRT.java:33). {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.667Z [WARN] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: stderr. ... 7 more. {scriptName=services.aws.iot.SiteWiseEdgeCollectorOpcua.lifecycle.Startup.Script, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
2023-01-18T13:53:32.690Z [INFO] (Copier) aws.iot.SiteWiseEdgeCollectorOpcua: Startup script exited. {exitCode=1, serviceName=aws.iot.SiteWiseEdgeCollectorOpcua, currentState=STARTING}
```
Hello!
# Short summary of context and issue
I am using EFS to mount a PV (ReadWriteMany [access-mode](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes)) via a PVC into EKS pods. The issue I'm having is that write updates propagate with big delays across pods: one pod may successfully write a file to the shared directory, but other pods see it some 10-60 seconds later (this delay varies across experiments seemingly at random).
## Experiment & Concrete results
I run two simple pods. [Pod1](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/debugging_pods/pod1.yaml) runs first and continuously checks if `/workdir/share_point/example.txt` exists via the `stat` command. [Pod2](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/debugging_pods/pod2.yaml) runs second and writes the file, then does the same checks. As can be seen from the logs below, the file created at `16:52:36.544` is visible in Pod1 only at ~`16:52:57.694`
Logs of Pod1: [pod1.log](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/logs/pod1.log)
Logs of Pod2: [pod2.log](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/logs/pod2.log)
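For readers who want to reproduce the measurement without the linked manifests, here is a minimal Python sketch of the kind of polling Pod1 does. It is an analogue of the `stat` loop described above, not the code from the repository; the function name, timeout, and polling interval are assumptions.

```python
import os
import time
from typing import Optional


def wait_until_visible(path: str, timeout_s: float = 120.0, interval_s: float = 0.1) -> Optional[float]:
    """Poll with os.stat until path exists.

    Returns the number of seconds waited, or None if the timeout elapsed.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        try:
            os.stat(path)
            return time.monotonic() - start
        except FileNotFoundError:
            # Not visible yet from this client; wait and retry.
            time.sleep(interval_s)
    return None


if __name__ == "__main__":
    waited = wait_until_visible("/workdir/share_point/example.txt")
    print("not visible before timeout" if waited is None else f"visible after {waited:.3f}s")
```

Comparing the returned wait time against the writer's timestamp gives the propagation delay discussed here.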
## Expected results
I expected that Pod1 sees the file as soon as it is successfully written, as is the case for Pod2. As far as I understand, this would fit the [consistency](https://docs.aws.amazon.com/efs/latest/ug/how-it-works.html#consistency) model described in the docs.
## Worth Mentioning
If I manually `kubectl exec` into the pods and attempt something similar, the problem seems to not be there, see [manual_test.log](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/logs/manual_test.log)
1. Pod2: `echo "Manual test" > /workdir/share_point/manual_test.txt`
2. Pod1: `date +\"%T.%3N\" && stat /workdir/share_point/manual_test.txt`
## Steps to Reproduce
In what follows, I provide the simplest setup that reproduces the issue that I have. Following AWS docs, I set up a VPC, an EKS cluster and an EFS as a storage provider for the cluster. Each section below refers to the documentation I've followed and provides the commands used.
### VPC
Follows [creating-a-vpc](https://docs.aws.amazon.com/eks/latest/userguide/creating-a-vpc.html). This creates a VPC from a template with 2 private and 2 public subnets, suitably configured to host an EKS cluster.
```sh
aws cloudformation create-stack --stack-name public-private-subnets \
--template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml
```
### EKS cluster
Follows [create-cluster](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html). I specify the cluster name, region, and for simplicity manually copy the subnet IDs of the above VPC.
```sh
eksctl create cluster --name my-demo-cluster --region eu-central-1 \
--with-oidc --version 1.24 --node-ami-family Ubuntu2004 \
--vpc-private-subnets private_subnet1_id,private_subnet2_id \
--vpc-public-subnets public_subnet1_id,public_subnet2_id \
--node-private-networking --managed
```
### EFS setup
Follows the [efs-csi-page](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html)
#### Create a Policy
`curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json`
```sh
aws iam create-policy \
--policy-name AmazonEKS_EFS_CSI_Driver_Policy \
--policy-document file://iam-policy-example.json
```
#### Create a ServiceAccount
Replace account-id accordingly in the command below.
```sh
eksctl create iamserviceaccount \
--cluster my-demo-cluster \
--namespace kube-system \
--name efs-csi-controller-sa \
--attach-policy-arn arn:aws:iam::account-id:policy/AmazonEKS_EFS_CSI_Driver_Policy \
--approve \
--region eu-central-1
```
#### Install the EFS CSI Driver
```sh
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
```
```sh
helm upgrade -i aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
--namespace kube-system \
--set image.repository=602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/aws-efs-csi-driver \
--set controller.serviceAccount.create=false \
--set controller.serviceAccount.name=efs-csi-controller-sa
```
#### Creating the EFS, SG and mount points
For simplicity, manually copy the subnet IDs of the VPC.
`./complete_efs_setup.sh private_subnet1_id private_subnet2_id`
### Kubernetes StorageClass and PVC
Replace the Filesystem ID in [kubernetes_storage/efs-storageclass.yaml](https://github.com/RonaldGalea/my-eks-issue/blob/main/issue_preparation/kubernetes_storage/efs-storageclass.yaml):
`kubectl apply -f kubernetes_storage/efs-storageclass.yaml`
`kubectl apply -f kubernetes_storage/efs-pvc.yaml`
### Deploy pods
`kubectl apply -f debugging_pods/pod1.yaml`
After the first one is running:
`kubectl apply -f debugging_pods/pod2.yaml`
### Exec in pods
`kubectl exec --stdin --tty pod1 -- /bin/bash`
`kubectl exec --stdin --tty pod2 -- /bin/bash`
### Relevant system information
Output of `aws --version`
```
aws-cli/2.8.7 Python/3.9.11 Linux/5.15.0-58-generic exe/x86_64.ubuntu.22 prompt/off
```
Output of `eksctl version`
```
0.125.0
```
Output of `helm version`
```
version.BuildInfo{Version:"v3.10.3", GitCommit:"835b7334cfe2e5e27870ab3ed4135f136eecc704", GitTreeState:"clean", GoVersion:"go1.18.9"}
```
### Thank you
I'd be very thankful for any hint/pointer as to where the issue may lie. Thank you in advance.
I have a Redis cluster. When I use it from my Node.js backend code, it gives an 'ENOTFOUND' error. My backend is deployed as a Docker container on ECS.
Can anyone please help me understand why I am getting this error?
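In Node.js, `ENOTFOUND` is reported when the DNS lookup for the hostname fails, so a first step is to check whether the Redis endpoint resolves from inside the container. Here is a minimal Python sketch of such a check; the function name `resolve_host` and the default port 6379 are assumptions, and the real endpoint name has to be substituted in.

```python
import socket
from typing import List


def resolve_host(hostname: str, port: int = 6379) -> List[str]:
    """Return the IP addresses hostname resolves to, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Same class of failure that Node.js surfaces as ENOTFOUND.
        return []
    # Each entry ends with a sockaddr tuple whose first element is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_host("127.0.0.1"))
```

An empty list for the Redis endpoint would point to a hostname typo or to the task's VPC DNS settings rather than to Redis itself.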
Hi, everyone!
I want to run my app in the cloud (I'm thinking about AWS) but don't have any DevOps experience to manage the whole thing by myself.
During my research I stumbled upon AWS App Runner, which seems like it could do the trick for me, but maybe there are other tools and clouds that can do this too.
I want to focus on the app rather than on cloud management. I use Docker for the app, but could theoretically switch to Kubernetes, as it seems to be far more popular.
I would appreciate your thoughts and suggestions. Are there any drawbacks of AWS App Runner I should know about before I start using it?