Questions tagged with Containers
I deploy ECS Fargate through the AWS CLI: only the task definition is created in the console, and the rest (cluster, service, container, and deployment) through the CLI. One day I noticed that the task definitions were being created as stacks in CloudFormation (failure records were included as well). Searching and reading the official documentation both say that no stack should be created in CloudFormation. What is the cause, and how can I prevent these stacks from being created? I set this up by referring to the following document: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-blue-green.html
Hi :) I am currently trying to resolve an issue with our ECS EC2-based cluster. Our task definitions use the container-level soft CPU limit (set to 50 CPU units), but don't use the task-level hard CPU limit. In our case we have more than 2000 services running, each with a single task. New revisions of these tasks are redeployed at a very high rate and at the same time, and on startup they often reach CPU usage above 1000%. This causes whole EC2 instances to become unresponsive, which then need to be restarted. We have worked around this with an alarm and a Lambda that quickly reboots failing, unresponsive instances, but that is only a temporary fix, not a solution. What we would like to achieve is to limit each task's CPU usage so it cannot exceed the soft limit by a factor of 10 or more. I have found a way to do this using the hard CPU limit, but that solution is also not great, mainly for the following reasons:

1. Even with the soft limit, our tasks use at most 50% of this reservation, but the hard limit's minimum value for ECS on EC2 is 128 units (compared to the current 50).
2. The hard limit automatically increases the reservation value for the chosen task, meaning that setting the limit to 128 for all 2000+ services/tasks would require us to host more than double the number of EC2 machines, without any actual gain, as our cluster's utilization currently hovers around 5-10%.

So my question is: is there a way to limit the maximum CPU usage of each task/container without using the task-level hard limit? Our EC2 machines are running on Ubuntu.
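For context on where the two limits live: in an ECS task definition, the container-level `cpu` is the soft share value and the top-level task `cpu` is the hard limit with the 128-unit minimum mentioned above. A minimal sketch (the family name, image, and memory value are hypothetical):

```json
{
  "family": "example-service",
  "cpu": "128",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "example-image:latest",
      "cpu": 50,
      "memory": 256
    }
  ]
}
```

There is no container-level hard CPU limit in the task definition itself, which is why the task-level `cpu` (top-level key above) is usually the only in-ECS option for capping usage.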
I want to read data from Databricks output and format the data for SageMaker training
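In case a concrete starting point helps: Databricks jobs commonly write CSV or Parquet to S3, and many SageMaker built-in algorithms expect CSV input with the label in the first column and no header row. A minimal sketch of that reshaping with only the standard library (the column names and rows below are made up for illustration):

```python
import csv
import io

def to_sagemaker_csv(rows, header, label_col):
    """Move the label column first and drop the header row,
    matching the CSV layout SageMaker built-in algorithms expect."""
    li = header.index(label_col)
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([row[li]] + [v for i, v in enumerate(row) if i != li])
    return buf.getvalue()

# Hypothetical rows as they might come out of a Databricks export
header = ["feature_a", "feature_b", "label"]
rows = [["1.0", "2.0", "0"], ["3.5", "4.1", "1"]]
print(to_sagemaker_csv(rows, header, "label"))
```

For Parquet output you would first load the data (e.g. with pandas/pyarrow) and then apply the same label-first, headerless formatting before uploading to S3 for the training job.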
I am trying to cut down the cost of Container Insights, so I want to delete some metrics that I am not using. Please let me know if there is any way to delete the default metrics.
Are there any native options, similar to AWS Backup, for creating backups of an EKS cluster?
Hi all, I'm having an issue running enhanced scanning in ECR for my Docker image. To replicate the issue, I tested some sample base images that I'm using from NVIDIA's container registry. When uploading the base NVIDIA TensorRT image for CUDA 11.6, I receive a vulnerability report. This is the tag: `nvcr.io/nvidia/tensorrt:21.07-py3`. However, a newer CUDA version variant (which is still Ubuntu 20 based) shows `UNSUPPORTED_IMAGE` in the vulnerability report: `nvcr.io/nvidia/tensorrt:22.12-py3`. According to the AWS docs, Ubuntu 20 images should still be supported. Is there any way to remediate this?
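One thing worth checking: the scan-status description in ECR sometimes states why Amazon Inspector marked an image as unsupported. Something like the following (the repository name and tag are placeholders for wherever the image was pushed):

```
aws ecr describe-image-scan-findings \
  --repository-name tensorrt \
  --image-id imageTag=22.12-py3 \
  --query imageScanStatus
```

The `description` field in the returned status can narrow down whether the OS release detection or the media type is the problem.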
For EC2 there are clear explanations of network bandwidth for the different instance types. What about ECS Fargate? So far I have managed to find only this article with some benchmarks: https://www.stormforge.io/blog/aws-fargate-network-performance/. What are the guaranteed and maximum network bandwidth for Fargate tasks? Do they depend on the number of vCPUs/memory?
Is it possible to extend an EKS cluster (on EC2) with on-prem nodes? The on-prem nodes would ideally be connected securely to the VPC to avoid going over the public internet. The motivation behind this is to utilize existing on-prem servers for some of the workload, and during peak hours extend the capabilities of the cluster via autoscaling EKS on demand. Ideally everything would be centrally managed under AWS, so some EKS nodes would always be active for the control plane, data redundancy, etc. In researching this topic so far I've only found resources on EKS via AWS Outposts, EKS Anywhere, joining federated clusters, etc. -- but it seems these solutions involve managing our own infrastructure, losing the benefits of fully managed EKS on AWS. I can't find any information about extending AWS-managed EKS clusters with on-prem hardware (effectively allowing AWS to take ownership of the node/system and integrate it into the cluster). Has anyone accomplished this, or is it not viable/supported? I appreciate any feedback, thanks!
Hi. Is it possible to set up routing rules for pods in EKS using standard mesh plugins? I’m not able to install plugins like Calico.
I know how to release a host. I am not trying to release a host. I am trying to DELETE a host. Why can't I DELETE a host? Are they designed to just stack up and accumulate ad nauseam or is there a way to get rid of them? https://us-west-2.console.aws.amazon.com/ec2/home?region=us-west-2#Hosts:
I read that I can remote-debug an application in a Docker container by starting the container like:

```
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it <image_name>
```

However, I don't think I can run a Docker component with the `-it` (interactive) flag. Without the `-it` flag, if I try to connect to a running process in the container I receive this error:

```
Unable to start debugging. Attaching to process 29966 with GDB failed because of insufficient privileges with error message 'ptrace: Operation not permitted.'.
```

How does anyone else debug inside a Greengrass container?
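For what it's worth, a container does not need `-it` to keep the ptrace capabilities; one common pattern is to run detached and then exec into the running container to attach the debugger. A sketch (the image name, container name, and PID are placeholders):

```
# Run detached with ptrace allowed, instead of interactively
docker run -d --name app --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <image_name>

# Open a debugger inside the already-running container, attached to the target PID
docker exec -it app gdb -p <pid>
```

Whether this maps onto a Greengrass-managed container depends on how Greengrass launches it; the ptrace error suggests the managed container is started without `SYS_PTRACE` and with the default seccomp profile.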
I am new to Lightsail, I'm trying to debug a failed deployment, and I'm at the point of shooting in the dark. Any ideas would be appreciated! I have two images: a Flask/Gunicorn image built from Python-alpine and an Nginx image. Locally, I can spin them up with `docker-compose` and they work beautifully. But in Lightsail, all I know is that my Flask image "took too long":

```
[17/Mar/2023:24:11:33] [deployment:14] Creating your deployment
[17/Mar/2023:24:13:05] [deployment:14] Started 1 new node
[17/Mar/2023:24:14:39] [deployment:14] Started 1 new node
[17/Mar/2023:24:15:54] [deployment:14] Started 1 new node
[17/Mar/2023:24:16:14] [deployment:14] Took too long
```

Things I've tried that haven't worked:

From https://repost.aws/questions/QUrqo_fzNTQ5i1E08tT1uM7g/lightsail-container-took-too-long-to-deploy-all-of-a-sudden-nothing-in-logs:

- Set Gunicorn's logging to DEBUG. Sometimes I can see the Gunicorn process being killed by SIGTERM, but the "took too long" entry above has no additional information.
- Set the health check to 300 seconds in case that was the source of the SIGTERM. No effect.
- Increased capacity from "nano" to "micro" to "small". No effect.

From https://repost.aws/questions/QU8i3bF2BZQZiwKfxGw5CfgQ/how-to-deploy-amazon-linux-on-a-lightsail-container-service:

- Made sure I pasted my launch command into the appropriate "launch command" form input. No effect.

Perhaps I missed something obvious.

**Update**: I have Nginx configured to proxy requests to Gunicorn and to serve static content. Below are the Dockerfiles and docker-compose file.

Flask/Gunicorn Dockerfile:

```
FROM python:3.10-alpine

ENV POETRY_VERSION=1.2.2 \
    POETRY_VIRTUALENVS_IN_PROJECT=true \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

RUN apk add --no-cache curl \
    && curl -sSL https://install.python-poetry.org | POETRY_VERSION=$POETRY_VERSION python3 -

WORKDIR /src

# TODO: build wheel for pipeline
COPY . .
RUN /root/.local/bin/poetry install --only main

CMD . /src/.venv/bin/activate && gunicorn -w 2 --log-level debug --bind=0.0.0.0:8080 'app:app'
```

Nginx Dockerfile:

```
FROM nginx:alpine
COPY ./nginx.conf /etc/nginx/nginx.conf
```

docker-compose.yml:

```
version: "3.3"

services:
  web:
    image: myDockerHub/myImage
    restart: always
    volumes:
      - static_volume:/src/my_project/static
    ports:
      - "8080:80"

  nginx:
    image: myDockerHub/nginx
    restart: always
    volumes:
      - static_volume:/src/my_project/static
    depends_on:
      - web
    ports:
      - "80:80"

volumes:
  static_volume:
```
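In case it helps to rule out the health check: a Lightsail deployment can also be described as JSON and pushed with `aws lightsail create-container-service-deployment --cli-input-json`, which makes the health-check path and timings explicit. A sketch assuming the Flask container listens on 8080 and serves `/` (the container names and path are placeholders to adapt):

```json
{
  "serviceName": "my-service",
  "containers": {
    "web": {
      "image": "myDockerHub/myImage",
      "ports": { "8080": "HTTP" }
    }
  },
  "publicEndpoint": {
    "containerName": "web",
    "containerPort": 8080,
    "healthCheck": {
      "path": "/",
      "timeoutSeconds": 60,
      "intervalSeconds": 300
    }
  }
}
```

One thing to double-check against the compose file above: in Lightsail the health check hits the container port directly, so if the public endpoint points at the Flask container it must probe 8080 (where Gunicorn binds), not the 80 that Nginx exposes locally.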