
Questions tagged with Containers



ECS Capacity Provider Auto-Scaler Instance Selection

Hello, I am working with AWS ECS capacity providers to scale out instances for jobs we run. Those jobs vary widely in the amount of memory needed per ECS task; the memory requirements are set at the task and container level. We have a capacity provider connected to an EC2 Auto Scaling group (ASG). The ASG uses attribute-based instance type selection, where we specify instance attributes; we gave it a large range for memory and CPU, and it shows hundreds of possible instance types. When we run a small job (1 GB of memory) it scales up an `m5.large` and `m6i.large` instance and the job runs. This is great because our task runs, but the instance it selected is much larger than our needs. We then let the ASG scale back down to 0. We then run a large job (16 GB) and it begins scaling up, but it starts the same instance types as before. Those instance types have 8 GB of memory, when our task needs double that on a single instance. For the small job I would have expected the capacity provider to scale up only one instance closer in size to the memory needs of the job (1 GB). For the larger job I would have expected it to scale up only one instance with more than 16 GB of memory to accommodate the job (16 GB).

Questions:

* Is there a way to get capacity providers and Auto Scaling groups to be more responsive to the resource needs of the pending tasks?
* Are there any configs I might have wrong?
* Am I understanding something incorrectly? Are there any resources you would point me towards?
* Is there a better approach to accomplish what I want with ECS?
* Is the behavior I outlined actually to be expected?

Thank you
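For anyone digging into a similar setup, here is a minimal boto3 sketch for inspecting what the capacity provider's managed scaling is configured to do (target capacity, step sizes) alongside the ASG's attribute-based instance selection. The cluster, capacity provider, and ASG names are hypothetical placeholders, and this is only a diagnostic sketch, not a fix.

```python
import boto3

ecs = boto3.client("ecs")
autoscaling = boto3.client("autoscaling")

# Hypothetical names - replace with your own resources.
CAPACITY_PROVIDER = "my-jobs-capacity-provider"
ASG_NAME = "my-jobs-asg"

# Managed scaling settings (targetCapacity, step sizes) control how many
# instances the capacity provider asks the ASG for; the instance *types*
# come from the ASG's instance requirements, not from ECS.
cp = ecs.describe_capacity_providers(capacityProviders=[CAPACITY_PROVIDER])
print(cp["capacityProviders"][0]["autoScalingGroupProvider"]["managedScaling"])

# Attribute-based instance selection lives on the ASG side.
asg = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[ASG_NAME])
print(asg["AutoScalingGroups"][0].get("MixedInstancesPolicy"))
```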
1 answer · 0 votes · 19 views · asked 19 days ago

ECS services not scaling in (scale in protection is disabled)

Hello. I have an ECS cluster (EC2-based) attached to a capacity provider. The service scales out fine, but it isn't scaling in, and I've already checked scale-in protection: it's disabled (Disable Scale In: false).

Description of the environment:

- 1 cluster (EC2-based), 2 services
- Services are attached to an ALB (registering and deregistering fine)
- Services have auto scaling enabled on memory (above 90%), no scale-in protection, 1 task minimum, 3 tasks maximum
- Services use a capacity provider, apparently working as intended: it creates new EC2 instances when new tasks are provisioned and drops them when they have 0 tasks running, registering and deregistering as expected
- The CloudWatch alarms are working fine, alarming when expected (on both low and high usage)

Description of the test and what's not working:

- Started with 1 task for each service and 1 instance for both services.
- I entered one of the containers and ran a memory test, increasing its usage to over 90%.
- The service detected it and asked for a new task to be provisioned.
- There were no instances that could place the new task, so ECS asked the capacity provider / Auto Scaling group for a new EC2 instance.
- The new instance was provisioned, registered in the cluster, and ran the new task.
- The service's average memory usage decreased from ~93% to ~73% (average across both tasks).
- All fine; the memory stress ran for 20 minutes.
- After the memory stress was over, memory usage dropped to ~62%.
- The low CloudWatch alarm was triggered (maybe even before, when it was at ~73% usage; I didn't check).
- The service is still running 2 tasks right now (after 3 hours or more) and is not decreasing the desired count from 2 to 1.

Is there anything I'm missing here? I've already done a couple of tests, trying to change the service auto scaling thresholds and other configurations, but nothing changes this behaviour. Any help would be appreciated. Thanks in advance.
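In case it helps with debugging, here is a small boto3 sketch that dumps the service's Application Auto Scaling policies, their attached alarms, and the most recent scaling activities; the activity descriptions usually say whether a scale-in was attempted and why it did not happen. The cluster and service names are placeholders.

```python
import boto3

aas = boto3.client("application-autoscaling")

# Hypothetical cluster/service names - replace with your own.
resource_id = "service/my-cluster/my-service"

policies = aas.describe_scaling_policies(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
)
for p in policies["ScalingPolicies"]:
    cfg = p.get("TargetTrackingScalingPolicyConfiguration", {})
    # DisableScaleIn must be False for the policy to ever reduce DesiredCount.
    print(p["PolicyName"], cfg.get("DisableScaleIn"), p.get("Alarms"))

# Recent activities show whether scale-in fired and what the outcome was.
activities = aas.describe_scaling_activities(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
)
for a in activities["ScalingActivities"][:5]:
    print(a["StatusCode"], a["Description"], a.get("StatusMessage", ""))
```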
1 answer · 0 votes · 25 views · asked 20 days ago

Accessing custom environment variables inside a component's Docker containers

I have a manifest file which looks like:

```
{
  "Platform": {
    "os": "all"
  },
  "Lifecycle": {
    "Setenv": {
      "ENDPOINT": "Test_endpoint"
    },
    "Run": "docker rm core -f && docker rm A -f && docker rm B -f && docker rm C -f && docker-compose -f {artifacts:path}/docker-compose.yml up -d"
  },
  "Artifacts": [
    { "URI": "docker:D" },
    { "URI": "s3://bucket/docker-compose.yml" },
    { "URI": "docker:C" },
    { "URI": "docker:B" },
    { "URI": "docker:A" }
  ]
}
```

and in the docker-compose file:

```
service:
  image: "XXXXX.dkr.ecr.us-east-1.amazonaws.com/service-1.0:latest"
  container_name: service
  network_mode: host
  environment:
    AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT: ${AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT}
    SVCUID: ${SVCUID}
    AWS_CONTAINER_CREDENTIALS_FULL_URI: ${AWS_CONTAINER_CREDENTIALS_FULL_URI}
    AWS_CONTAINER_AUTHORIZATION_TOKEN: ${AWS_CONTAINER_AUTHORIZATION_TOKEN}
    AWS_REGION: ${AWS_REGION}
    AWS_IOT_THING_NAME: ${AWS_IOT_THING_NAME}
    ENDPOINT: ${ENDPOINT}
  depends_on:
    - core
    - ledservice
    - scannerservice
  volumes:
    - ${AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT}:${AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT}
  command: --uri localhost:4400 --port 9100 --name stow
```

But I am unable to retrieve the value of ENDPOINT in my Docker container using System.getenv("ENDPOINT"), or by printing all the environment variables with printenv over SSH. The value being returned is 'ENDPOINT=', i.e. empty. What am I doing wrong here? I could not find many references showing where Setenv is used, or how to use it with Docker.
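Not an answer, but when debugging this kind of pass-through it can help to log, from inside the container process itself, exactly which variables survived the Greengrass lifecycle, shell, docker-compose, container chain. A minimal Python sketch (the variable list just mirrors what the compose file above references; nothing here is from Greengrass documentation):

```python
import os

# Variables we expect Greengrass / docker-compose to have passed through.
# ENDPOINT comes from the component's Lifecycle Setenv block; the rest are
# the token-exchange variables referenced in the compose file.
expected = [
    "ENDPOINT",
    "SVCUID",
    "AWS_REGION",
    "AWS_IOT_THING_NAME",
    "AWS_GG_NUCLEUS_DOMAIN_SOCKET_FILEPATH_FOR_COMPONENT",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
]

for name in expected:
    value = os.environ.get(name)
    # An empty string here means the variable reached docker-compose unset,
    # so ${ENDPOINT} interpolated to "" rather than to the Setenv value.
    print(f"{name}={value!r}")
```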
1 answer · 0 votes · 58 views · asked 24 days ago

Should we combine or split application and infrastructure code for microservice projects deployed in EKS?

We are designing the architecture of a microservice solution where:

1. We will have tens of container-based microservices deployed in the EKS cluster.
2. Our microservices are Java Spring Boot applications.
3. There will be base infrastructure components such as VPC, subnets, CloudFront, EKS, ALB, etc.
4. There will be infrastructure components shared across multiple microservices, such as RDS, EFS, API Gateway, etc.
5. Each microservice could have its own dedicated cloud infrastructure (DynamoDB, SQS, etc.).
6. We will be using AWS CDK for deploying all the infrastructure components (3, 4 and 5).
7. In addition to infrastructure components, we will have Kubernetes configuration such as Helm charts, ConfigMaps, policy-as-code components related to OPA policies, monitoring-as-code components, etc. associated with each microservice.

We are trying to figure out an appropriate Git repository structure. A few things we have already decided:

* Keep the CDK code for the base infra (3) and shared infra (4) in a single Git repo and manage its deployment through a separate infra pipeline.
* Each Spring Boot microservice will have its own application Git repository, tied to its own CodePipeline CI process that builds the container and registers it in ECR on commits to the main branch (trunk-based model).

The confusion we have is where to store the infrastructure code associated with each microservice (items 5 and 7 in the list above). We thought of two options, and we find merits and demerits in both.

**Option #1**: Keep the infra code (5 & 7) related to a microservice in the application repository associated with that microservice.

**Pros**

* Easy to keep application and infra changes in sync as they live in a single repo. Infra and app changes can be deployed together and rolled back together as well.

**Cons**

* Since app and infra code are stored in the same repo, any infra change could trigger the app build pipeline, and similarly an app code change could trigger the automated infra deployment pipeline (if we have an automated CDK deployment pipeline). Considering that the app build is part of the CI process and infra deployment falls into the CD part, it would be tough to manage the CI/CD pipeline efficiently.
* Typically CI jobs initiate code scans, code coverage checks, container scans, etc.; we wouldn't want to trigger those CI stages for simple changes in the infra part of the code, such as a change to a configuration item or a monitoring-as-code component.
* In the standard CI/CD pipeline, after the Docker container is built and registered in the ECR repo, the container tag is updated in the Helm configuration (which is now stored in the app repository); this could trigger the CI flow again.
* In the CD pipeline we typically execute automated E2E tests, API tests, etc. One wouldn't want them triggered for simple infra changes alone.

**Option #2**: Each microservice has two repositories: one for the application code, and another for all the infrastructure code (5 & 7) associated with it.

**Pros**

* Isolation between infra and application code lets us segregate the CI and CD processes without conflict.

**Cons**

* It would be challenging to keep application and infra changes in sync, as they are now in two different repositories.

**Our questions are**:

* Looking at various articles, there are multiple ways to do this, but we would like advice from the AWS community on the best repo design for our case as explained above.
* Do you have any specific advice on how the infra associated with a microservice (CDK) and the app container deployment in EKS can be integrated in a single deployment pipeline?

Reference materials used:

* https://devops.stackexchange.com/questions/12803/best-practices-for-app-and-infrastructure-code-repositories
* https://www.youtube.com/watch?v=MeU5_k9ssrs
* https://www.youtube.com/watch?v=f5EpcWp0THw
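For readers weighing the two options, here is a minimal CDK (Python) sketch of what a dedicated, per-microservice infrastructure stack (item 5 above) might look like if kept in its own repo or in a subdirectory of the app repo. The stack, construct, and resource names are hypothetical, and this is only an illustration of the split, not a recommended repo layout.

```python
import aws_cdk as cdk
from aws_cdk import aws_dynamodb as dynamodb, aws_sqs as sqs
from constructs import Construct


class OrderServiceInfraStack(cdk.Stack):
    """Dedicated infrastructure for a single microservice (hypothetical)."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Service-owned resources; base/shared infra (VPC, EKS, RDS, etc.)
        # lives in a separate stack and repo and is referenced if needed.
        dynamodb.Table(
            self, "OrdersTable",
            partition_key=dynamodb.Attribute(
                name="pk", type=dynamodb.AttributeType.STRING
            ),
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
        )

        sqs.Queue(self, "OrderEventsQueue",
                  visibility_timeout=cdk.Duration.seconds(60))


app = cdk.App()
OrderServiceInfraStack(app, "order-service-infra")
app.synth()
```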
1 answer · 1 vote · 45 views · asked a month ago

Unable to connect to an AWS service (API Gateway) from an IoT Core device (inside a Docker container)

I have created a component using GreengrassV2 and running multiple containers inside this component. Now, my requirement is to call API Gateway and fetch some data into one of the docker containers running on the local device inside the component. I am using "import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;" for fetching the credentials but getting ``` 05:45:06.476 INFO - Calling API Gateway with request params com.amazon.spiderIoT.stowWorkcell.entities.ApiGatewayRequest@e958e637 Exception in thread "main" com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: To use assume role profiles the aws-java-sdk-sts module must be on the class path., com.amazonaws.auth.profile.ProfileCredentialsProvider@7c36db44: profile file cannot be null, com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@6c008c24: Failed to connect to service endpoint: ] at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:136) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561) at com.amazon.spiderIoT.stowWorkcell.client.ApiGatewayClient.execute(ApiGatewayClient.java:134) at com.amazon.spiderIoT.stowWorkcell.client.ApiGatewayClient.execute(ApiGatewayClient.java:79) at com.amazon.spiderIoT.stowWorkcell.StowWorkcellService.startMQTTSubscriberForSortLocationEvents(StowWorkcellService.java:120) at com.amazon.spiderIoT.stowWorkcell.StowWorkcellService.onInitialize(StowWorkcellService.java:65) at com.amazon.spideriot.sdk.service.Service.run(Service.java:79) at com.amazon.spiderIoT.stowWorkcell.StowWorkcellServiceDriver.main(StowWorkcellServiceDriver.java:26) ``` My dependencyConfiguration looks like :- ``` { "aws.greengrass.TokenExchangeService": { "componentVersion": "2.0.3", "DependencyType": "HARD" }, "aws.greengrass.DockerApplicationManager": { "componentVersion": "2.0.4" }, "aws.greengrass.Cloudwatch": { "componentVersion": "3.0.0" }, "aws.greengrass.Nucleus": { "componentVersion": "2.4.0", "configurationUpdate": { "merge": "{\"logging\":{\"level\":\"INFO\"}, \"iotRoleAlias\": \"GreengrassV2TestCoreTokenExchangeRoleAlias\"}" } }, "aws.greengrass.LogManager": { "componentVersion": "2.2.3", "configurationUpdate": { "merge": "{\"logsUploaderConfiguration\":{\"systemLogsConfiguration\": {\"uploadToCloudWatch\": \"true\",\"minimumLogLevel\": \"INFO\",\"diskSpaceLimit\": \"10\",\"diskSpaceLimitUnit\": \"MB\",\"deleteLogFileAfterCloudUpload\": 
\"false\"},\"componentLogsConfigurationMap\": {\"LedService\": {\"minimumLogLevel\": \"INFO\",\"diskSpaceLimit\": \"20\",\"diskSpaceLimitUnit\": \"MB\",\"deleteLogFileAfterCloudUpload\": \"false\"}}},\"periodicUploadIntervalSec\": \"5\"}" } } } ``` Java code for using AWS credentials ``` public class APIGatewayModule extends AbstractModule { @Provides @Singleton public AWSCredentialsProvider getAWSCredentialProvider() { return new DefaultAWSCredentialsProviderChain(); } @Provides @Singleton public ApiGatewayClient getApiGatewayClient(final AWSCredentialsProvider awsCredentialsProvider) { System.out.println("Getting client configurations"); final com.amazonaws.ClientConfiguration clientConfiguration = new com.amazonaws.ClientConfiguration(); System.out.println("Got client configurations" + clientConfiguration); return new ApiGatewayClient(clientConfiguration, Region.getRegion(Regions.fromName("us-east-1")), awsCredentialsProvider, AmazonHttpClient.builder().clientConfiguration(clientConfiguration).build()); } } ``` I have been following this doc: https://docs.aws.amazon.com/greengrass/v2/developerguide/device-service-role.html My question is regarding everywhere in this document, it is mentioned that "AWS IoT Core credentials provider", what credentials provider should we use? Also, as mentioned in this doc we should use --provision true when "When you run the AWS IoT Greengrass Core software, you can choose to provision the AWS resources that the core device requires." But we started without this flag, how can this be tackled and is there any other document that provides reference to using credentials provider and calling API Gateway from AWS SDK Java. On SSH to docker, i could find that the variable is set ``` AWS_CONTAINER_CREDENTIALS_FULL_URI=http://localhost:38135/2016-11-01/credentialprovider/ ``` But unable to curl from docker to this URL, is this how this is suppose to work?
3 answers · 0 votes · 47 views · asked a month ago

How to create a dynamic dataframe from the AWS Glue catalog in a local environment?

I have performed some AWS Glue version 3.0 job testing using Docker containers as detailed [here](https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/). The following code outputs two lists, one per connection, with the names of the tables in a database:

```python
import boto3

db_name_s3 = "s3_connection_db"
db_name_mysql = "glue_catalog_mysql_connection_db"

def retrieve_tables(database_name):
    session = boto3.session.Session()
    glue_client = session.client("glue")
    response_get_tables = glue_client.get_tables(DatabaseName=database_name)
    return response_get_tables

s3_tables_list = [table_dict["Name"] for table_dict in retrieve_tables(db_name_s3)["TableList"]]
mysql_tables_list = [table_dict["Name"] for table_dict in retrieve_tables(db_name_mysql)["TableList"]]

print(f"These are the tables from {db_name_s3} db: {s3_tables_list}\n")
print(f"These are the tables from {db_name_mysql} db {mysql_tables_list}")
```

Now, I try to create a dynamic dataframe with the *from_catalog* method in this way:

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

source_activities = glueContext.create_dynamic_frame.from_catalog(
    database=db_name,
    table_name=table_name
)
```

When `database="s3_connection_db"`, everything works fine; however, when I set `database="glue_catalog_mysql_connection_db"`, I get the following error:

```python
Py4JJavaError: An error occurred while calling o45.getDynamicFrame.
: java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
```

I understand the issue is related to the fact that I am trying to fetch data from a MySQL table, but I am not sure how to solve it. By the way, the job runs fine on the Glue console. I would really appreciate some help, thanks!
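In case it is useful, a common way to make a JDBC driver visible to a local Spark session is to point `spark.jars` (and the driver classpath) at the connector JAR before the SparkContext is created. This is only a sketch under the assumption that you can download the MySQL Connector/J JAR into the container; the path, version, and table name below are placeholders, not something from the original post, and it only takes effect if no SparkContext exists yet.

```python
from pyspark import SparkConf
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Hypothetical location of a manually downloaded MySQL Connector/J jar.
MYSQL_JAR = "/home/glue_user/workspace/jars/mysql-connector-java-8.0.28.jar"

conf = SparkConf()
conf.set("spark.jars", MYSQL_JAR)                    # ship the jar to executors
conf.set("spark.driver.extraClassPath", MYSQL_JAR)   # make it visible to the driver JVM

sc = SparkContext.getOrCreate(conf=conf)             # must run before any other SparkContext exists
glueContext = GlueContext(sc)

dyf = glueContext.create_dynamic_frame.from_catalog(
    database="glue_catalog_mysql_connection_db",
    table_name="some_table",                         # placeholder table name
)
```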
0 answers · 0 votes · 23 views · asked a month ago

Lambda function as image, how to find your handler URI

Hello, I have followed all of the tutorials on how to build an AWS Lambda function as a container image. I am also using the AWS SAM SDK. What I don't understand is how to figure out the endpoint URL mapping from within my image to the Lambda function. For example, in my Docker image I am using the AWS Python 3.9 base image, where I install some other packages and my Python requirements, and my handler is defined as:

    summarizer_function_lambda.postHandler

The Python file being copied into the image has the same name as above, but without the .postHandler. My AWS SAM template has:

    AWSTemplateFormatVersion: "2010-09-09"
    Transform: AWS::Serverless-2016-10-31
    Description: AWS Lambda dist-bart-summarizer function
    # More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
    Globals:
      Function:
        Timeout: 3
    Resources:
      DistBartSum:
        Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
        Properties:
          FunctionName: DistBartSum
          ImageUri: <my-image-url>
          PackageType: Image
          Events:
            SummarizerFunction:
              Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
              Properties:
                Path: /postHandler
                Method: POST

So what is my actual URI path for the POST call, either locally or once deployed on Lambda? When I try a curl command I get `{"message": "Internal server error"}`:

    curl -XPOST "https://<my-aws-uri>/Prod/postHandler/" -d '{"content": "Test data.\r\n"}'

So I guess my question is: how do you "map" your handler definitions from within a container all the way to the endpoint URI?
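For what it's worth, here is a minimal sketch of how the pieces usually line up, assuming the image's CMD points at `summarizer_function_lambda.postHandler`: the module/function pair only tells the Lambda runtime which Python function to invoke, while the URL path comes entirely from the SAM `Api` event (`/postHandler` on the implicit `Prod` stage). The handler body below is illustrative, not the original code.

```python
# summarizer_function_lambda.py  (module name must match the handler string)
import json


def postHandler(event, context):
    """Invoked for POST https://<api-id>.execute-api.<region>.amazonaws.com/Prod/postHandler"""
    # API Gateway proxy integration wraps the payload in event["body"].
    body = json.loads(event.get("body") or "{}")
    content = body.get("content", "")

    # ... summarization logic would go here ...

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"summary_length": len(content)}),
    }
```

Locally, `sam local start-api` serves the same path at `http://127.0.0.1:3000/postHandler`, and an "Internal server error" from API Gateway usually means the handler raised an exception or returned a non-proxy-shaped response, which the function's CloudWatch logs should show.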
2 answers · 0 votes · 47 views · asked 2 months ago

Container cannot bind to port 80 running as non-root user on ECS Fargate

I have an image that binds to port 80 as a **non-root** user. I can run it locally (macOS Monterey, Docker Desktop 4.7.1) absolutely fine. When I try to run it as part of an ECS service on Fargate it fails like so:

**Failed to bind to 0.0.0.0/0.0.0.0:80** **caused by SocketException: Permission denied**

Fargate means I have to run the task in network mode `awsvpc` - not sure if that's related? Any views on what I'm doing wrong? The [best practices document](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/bestpracticesguide.pdf) suggests that I should be running as non-root (p. 83) and that under awsvpc it's reasonable to expose port 80 (diagram on p. 23). FWIW, here's a mildly cut-down version of the JSON from my task definition:

```
{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<ID>:task-definition/mything:2",
  "containerDefinitions": [
    {
      "name": "mything",
      "image": "mything:latest",
      "cpu": 0,
      "memory": 1024,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": []
    }
  ],
  "family": "mything",
  "executionRoleArn": "arn:aws:iam::<ID>:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "revision": 2,
  "volumes": [],
  "status": "ACTIVE",
  "requiresAttributes": [
    { "name": "com.amazonaws.ecs.capability.logging-driver.awslogs" },
    { "name": "ecs.capability.execution-role-awslogs" },
    { "name": "com.amazonaws.ecs.capability.ecr-auth" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19" },
    { "name": "ecs.capability.execution-role-ecr-pull" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18" },
    { "name": "ecs.capability.task-eni" }
  ],
  "placementConstraints": [],
  "compatibilities": [ "EC2", "FARGATE" ],
  "runtimePlatform": { "operatingSystemFamily": "LINUX" },
  "requiresCompatibilities": [ "FARGATE" ],
  "cpu": "256",
  "memory": "1024",
  "tags": []
}
```
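A sketch of one commonly used workaround, assuming the application can be told to listen on a non-privileged port: register a task definition revision that maps container port 8080 instead of 80 (under awsvpc, containerPort and hostPort must match) and point the target group at 8080, since binding ports below 1024 as a non-root user normally requires extra capabilities. This is boto3 for illustration; the port choice and the reuse of the "mything" names are assumptions, not something from the question.

```python
import boto3

ecs = boto3.client("ecs")

# New revision of the (hypothetical) "mything" family, listening on 8080 so a
# non-root process can bind without CAP_NET_BIND_SERVICE.
ecs.register_task_definition(
    family="mything",
    networkMode="awsvpc",
    requiresCompatibilities=["FARGATE"],
    cpu="256",
    memory="1024",
    executionRoleArn="arn:aws:iam::<ID>:role/ecsTaskExecutionRole",
    containerDefinitions=[
        {
            "name": "mything",
            "image": "mything:latest",
            "essential": True,
            "portMappings": [
                {"containerPort": 8080, "hostPort": 8080, "protocol": "tcp"}
            ],
        }
    ],
)
```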
2 answers · 1 vote · 355 views · asked 2 months ago

High-Traffic, Load-Balanced Wordpress Site - Optimal DevOps setup for deployment?

TLDR: I inherited a WordPress site that I now manage. It had a DevOps deployment pipeline that worked when the site was low to medium traffic, but now the site consistently gets high traffic and I'm trying to improve the deployment pipeline. The site uses Lightsail instances and a Lightsail load balancer in conjunction with one RDS database instance and an S3 bucket for hosted media. When I inherited the site, the deployment pipeline from the old developer was: *scale the site down to one instance, make changes to that one instance, and once changes are complete, clone that updated instance as many times as you need*. This worked fine when the site mostly ran on only one instance except during peak traffic times. However, now we have 3-5 instances at all times, as even our "off-peak" traffic is high enough to require multiple instances. I'd like to improve the deployment pipeline to allow deployments during peak-traffic times without issues. I'm worried about updating multiple instances behind the load balancer one by one sequentially, because we have session persistence disabled to allow for more evenly distributed load balancing, and I'm worried that a user hopping between instances that have different functions.php files will cause issues. Should I just enable session persistence when I want to make updates and sequentially update the instances behind the load balancer one by one? Or is there a better-suited solution? Should I move to a containers setup? I'm admittedly a novice with AWS, so any help is greatly appreciated. Really just looking for general advice; I'm confident I can figure out how to implement a suggested best-practice solution. Thanks!
1 answer · 0 votes · 18 views · asked 3 months ago

ECS Exec error TargetNotConnectedException

I'm getting this error when I try to connect to my Fargate ECS container using ECS Exec feature: ``` An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later. ``` The task's role was missing the SSM permissions, I added them, but the error persists. Below is the output of the [amazon-ecs-exec-checker](https://github.com/aws-containers/amazon-ecs-exec-checker) tool: ``` Prerequisites for check-ecs-exec.sh v0.7 ------------------------------------------------------------- jq | OK (/usr/bin/jq) AWS CLI | OK (/usr/local/bin/aws) ------------------------------------------------------------- Prerequisites for the AWS CLI to use ECS Exec ------------------------------------------------------------- AWS CLI Version | OK (aws-cli/2.2.1 Python/3.8.8 Linux/5.13.0-39-generic exe/x86_64.ubuntu.20 prompt/off) Session Manager Plugin | OK (1.2.295.0) ------------------------------------------------------------- Checks on ECS task and other resources ------------------------------------------------------------- Region : us-west-2 Cluster: removed Task : removed ------------------------------------------------------------- Cluster Configuration | Audit Logging Not Configured Can I ExecuteCommand? | arn:aws:iam::removed ecs:ExecuteCommand: allowed ssm:StartSession denied?: allowed Task Status | RUNNING Launch Type | Fargate Platform Version | 1.4.0 Exec Enabled for Task | OK Container-Level Checks | ---------- Managed Agent Status ---------- 1. RUNNING for "api-staging" ---------- Init Process Enabled (api-staging:14) ---------- 1. Enabled - "api-staging" ---------- Read-Only Root Filesystem (api-staging:14) ---------- 1. Disabled - "api-staging" Task Role Permissions | arn:aws:iam::removed ssmmessages:CreateControlChannel: allowed ssmmessages:CreateDataChannel: allowed ssmmessages:OpenControlChannel: allowed ssmmessages:OpenDataChannel: allowed VPC Endpoints | Found existing endpoints for vpc-removed: - com.amazonaws.us-west-2.s3 SSM PrivateLink "com.amazonaws.us-west-2.ssmmessages" not found. You must ensure your task has proper outbound internet connectivity. Environment Variables | (api-staging:14) 1. container "api-staging" - AWS_ACCESS_KEY: not defined - AWS_ACCESS_KEY_ID: defined - AWS_SECRET_ACCESS_KEY: defined ``` The output has only green and yellow markers, no red ones, which means Exec should work. But it doesn't. Any ideas why?
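In case it helps with debugging, here is a small boto3 sketch that checks whether the running task actually has ECS Exec enabled and what state its managed SSM agent is in, then attempts the same ExecuteCommand operation the CLI makes. Cluster and task identifiers are placeholders. One thing worth checking, and this is an assumption on my part rather than a confirmed diagnosis, is whether the task was started before the SSM permissions were added, since the exec agent registers when the task starts, so a fresh deployment may behave differently.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

CLUSTER = "my-cluster"   # placeholder
TASK = "<task-id>"       # placeholder

task = ecs.describe_tasks(cluster=CLUSTER, tasks=[TASK])["tasks"][0]
print("enableExecuteCommand:", task.get("enableExecuteCommand"))
for container in task.get("containers", []):
    for agent in container.get("managedAgents", []):
        print(container["name"], agent["name"], agent.get("lastStatus"))

# Same operation the CLI performs; TargetNotConnectedException here means the
# SSM agent inside the task never connected back to Systems Manager.
resp = ecs.execute_command(
    cluster=CLUSTER,
    task=TASK,
    container="api-staging",
    interactive=True,
    command="/bin/sh",
)
```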
1 answer · 0 votes · 103 views · asked 3 months ago

ECR image delete with the Terraform kreuzwerker/docker provider gets 405 Method Not Allowed; worked until yesterday with no changes

I have had multiple builds set up in AWS CodeBuild that run terraform code. I am using Terraform version 1.0.11 with kreuzwerker/docker provider 2.16 and aws provider version 4.5.0. Yesterday, builds stopped working because when docker_image_registry deletes the old image I receive `Error: Got error getting registry image digest: Got bad response from registry: 405 Method Not Allowed`. I have not changed any code, I'm using the same `aws/codebuild/standard:4.0` build image. Note that I have another API in a different region (`us-west-1`) with the exact same code, and it still works. Here should be enough code to figure out what's going on: ``` locals { ecr_address = format("%v.dkr.ecr.%v.amazonaws.com", data.aws_caller_identity.current.account_id, var.region) environment = terraform.workspace name = "${local.environment}-${var.service}" os_check = data.external.os.result.os == "Windows" ? "Windows" : "Unix" } variable "region" { default = "us-east-2" } provider "aws" { region = var.region } provider "docker" { host = local.os_check == "Windows" ? "npipe:////.//pipe//docker_engine" : null registry_auth { address = local.ecr_address username = data.aws_ecr_authorization_token.token.user_name password = data.aws_ecr_authorization_token.token.password } } data "external" "git_hash" { program = local.os_check == "Windows" ? ["Powershell.exe", "./Scripts/get_sha.ps1"] : ["bash", "./Scripts/get_sha.sh"] } data "aws_caller_identity" "current" {} data "aws_ecr_authorization_token" "token" { registry_id = data.aws_caller_identity.current.id } resource "aws_ecr_repository" "repo" { name = lower(local.name) image_tag_mutability = "MUTABLE" image_scanning_configuration { scan_on_push = true } tags = merge(local.common_tags, tomap({ "Name" = local.name })) } resource "aws_ecr_lifecycle_policy" "policy" { repository = aws_ecr_repository.repo.name policy = <<EOF { "rules": [ { "rulePriority": 1, "description": "Keep only last 10 images, expire all others", "selection": { "tagStatus": "any", "countType": "imageCountMoreThan", "countNumber": 10 }, "action": { "type": "expire" } } ] } EOF } resource "docker_registry_image" "image" { name = format("%v:%v", aws_ecr_repository.repo.repository_url, data.external.git_hash.result.sha) build { context = replace(trimsuffix("${path.cwd}", "/Terraform"), "/${var.company}.${var.service}", "") dockerfile = "./${var.company}.${var.service}/Dockerfile" } lifecycle { create_before_destroy = true } } ```
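Not a fix for the provider itself, but if it helps isolate the problem, the sketch below deletes an image tag directly through the ECR API; if this succeeds while `docker_registry_image` still fails with a 405, that points at the registry HTTP calls the provider makes rather than at ECR permissions. Repository and tag names are placeholders.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-2")

resp = ecr.batch_delete_image(
    repositoryName="dev-myservice",          # placeholder repository name
    imageIds=[{"imageTag": "old-git-sha"}],  # placeholder tag
)
print("deleted:", resp.get("imageIds"))
print("failures:", resp.get("failures"))
```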
0 answers · 0 votes · 0 views · asked 4 months ago

Running TagUI RPA as a Lambda Function

I am trying to run a simple TagUI flow as a Lambda function using container images. I have made a Dockerfile using the bootstrap and function.sh from [this tutorial](https://aripalo.com/blog/2020/aws-lambda-container-image-support/): ``` FROM amazon/aws-lambda-provided:al2 RUN yum install -y wget nano php java-1.8.0-openjdk unzip procps RUN curl https://intoli.com/install-google-chrome.sh | bash RUN wget https://github.com/kelaberetiv/TagUI/releases/download/v6.46.0/TagUI_Linux.zip \ && unzip TagUI_Linux.zip \ && rm TagUI_Linux.zip \ && ln -sf /var/task/tagui/src/tagui /usr/local/bin/tagui \ && tagui update RUN sed -i 's/no_sandbox_switch=""/no_sandbox_switch="--no-sandbox"/' /var/task/tagui/src/tagui ADD tr.tag /var/task/tagui/src/tr.tag WORKDIR /var/runtime/ COPY bootstrap bootstrap RUN chmod 755 bootstrap WORKDIR /var/task/ COPY function.sh function.sh RUN chmod 755 function.sh CMD [ "function.sh.handler" ] ``` My function.sh: ``` function handler () { cp -r /var/task/tagui/src/* /tmp; chmod 755 /tmp/tagui; OUTPUT=$(/tmp/tagui /tmp/tr.tag -h); echo "${OUTPUT}"; } ``` Notes: - the sed line is required to get TagUI running in docker images. - tr.tag is just a simple flow to do a password reset on a webapp so I can confirm the container has run. - everything has to be run in /tmp as that is the only folder Lambda can write to in the container and TagUI creates a load of temporary files during execution. When I run as a Lambda I get the error: ```./tmp/tagui/src/tagui: line 398: 56 Trace/breakpoint trap (core dumped) $chrome_command --user-data-dir="$TAGUI_DIR/chrome/tagui_user_profile" $chrome_switches $window_size $headless_switch $no_sandbox_switch > /dev/null 2>&1``` When I run the container from Docker it runs perfectly. I have tried increasing both the memory and timeout of the function. The end goal I am trying to achieve is to have a Lambda function triggered by an API gateway that can receive a TagUI RPA flow and run it.
1 answer · 0 votes · 17 views · asked 4 months ago

Overriding Hostname on ECS Fargate

Hello, I am setting up a Yellowfin [deployment](https://wiki.yellowfinbi.com/display/yfcurrent/Install+in+a+Container) using their stock app-only [image](https://hub.docker.com/r/yellowfinbi/yellowfin-app-only) on ECS Fargate. I was able to set up a test cluster for my team to experiment with. Yellowfin requires a license to use their software. To issue a license, Yellowfin needs to know the hostname of the underlying platform it runs on. Yellowfin can provide wildcard licenses that match on a standard prefix or suffix. Currently, we are using a development license that matches on the default hostname that our test environment's Fargate task is assigned. The default hostname seems to be of the form <32-character alphanumeric string>-<10-digit number> where the former is the running task's ID and the latter is an ID of some other related AWS resource (the task definition? the cluster?) that I could not identify. Although this 10-digit number stays constant if new tasks are run, it does not seem like a good strategy to base the real Yellowfin license off of it. I would like to override the hostname of a Fargate task when launching the container to include a common prefix (e.g., "myorg-yfbi-") to make it simple to request a wildcard license for actual use. If possible, I would like to avoid building my own image or migrating to use another AWS service. Is there a standard way to override the hostname for a Fargate service by solely updating the task definition? Would overriding the entrypoint be a viable option? Is there another way to set hostname in Fargate that I am not aware of? Thank you for any guidance you can provide. Happy to provide more information if it helps.
1 answer · 0 votes · 138 views · asked 5 months ago

Modify the Auto Scaling health check setting for an Elastic Beanstalk environment

Hi, I have been going through the documentation for updating the `aws:autoscaling:autoscalinggroup` namespace in my existing Elastic Beanstalk environment. My current environment is set up using a `docker-compose.yml` file that refers to the default .env file. I decided to update the environment settings through the AWS CLI, but I keep getting the error below:

```
aws elasticbeanstalk update-environment --environment-name Testenv-env --template-name v1

An error occurred (ConfigurationValidationException) when calling the UpdateEnvironment operation: Configuration validation exception: Invalid option specification (Namespace: 'aws:autoscaling:autoscalinggroup', OptionName: 'HealthCheckType'): Unknown configuration setting.
```

In my v1.json file I have the setting below:

```
aws:autoscaling:autoscalinggroup:
  HealthCheckType: ELB
```

Then I researched online a bit and found people discussing that this no longer works, and I checked the latest API documentation, which conflicts with the instructions in the link below: apparently "HealthCheckType" is no longer a supported option for Elastic Beanstalk. So I turned to another solution mentioned here: [https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html) It suggests using an autoscaling.config in the .ebextensions folder. However, I could not find clear instructions on how to use the .ebextensions folder with my current environment setup, where I use docker-compose referring to the default .env file when deploying. Can someone point me to clear instructions? Another thought: since the environment already exists, if I go directly to the ASG in my environment and update it there, would a new deployment (with an upgraded version) into my EB environment override this ASG change each time I upgrade? Or will the change be honored and kept in the ASG as long as it is not deleted?
1 answer · 0 votes · 32 views · asked 5 months ago

Multi-model, Multi-container and Variants - what are the possible combinations?

This question is mostly for educational purposes, but the current SageMaker documentation does not describe whether these things are allowed or not. Let's suppose I have:

* an `XGBoost_model_1` (that needs an `XGBoost container`)
* a `KMeans_model_1` and a `KMeans_model_2` (both require a `KMeans container`)

**1.** Here's the first question - can I do the following:

* create a `Model` with `InferenceExecutionConfig.Mode=Direct` and specify two containers (`XGBoost`, and `KMeans` with `Mode: MultiModel`)

That would enable the client:

* to call `invoke_endpoint(TargetContainer="XGBoost")` to access the `XGBoost_model_1`
* to call `invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_1")` to access the `KMeans_model_1`
* to call `invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_2")` to access the `KMeans_model_2`

I don't see a straight answer in the documentation on whether combining multi-model containers with a multi-container endpoint is possible.

**2.** The second question - how does the above idea work with `ProductionVariants`? Can I create something like this:

* `Variant1` with `XGBoost` serving `XGBoost_model_1`, having a weight of `0.5`
* `Variant2` with a multi-container setup having both `XGBoost` and `KMeans` (with a `MultiModel` setup), having a weight of `0.5`

So that the client could:

* call `invoke_endpoint(TargetVariant="Variant2", TargetContainer="KMeans", TargetModel="KMeans_model_1")` to access the `KMeans_model_1`
* call `invoke_endpoint(TargetVariant="Variant2", TargetContainer="KMeans", TargetModel="KMeans_model_2")` to access the `KMeans_model_2`
* call `invoke_endpoint(TargetVariant="Variant1")` to access the `XGBoost_model_1`
* call `invoke_endpoint(TargetVariant="Variant2", TargetContainer="XGBoost")` to access the `XGBoost_model_1`

Is that combination even possible? If so, what happens when the client calls `invoke_endpoint` without specifying the variant? For example:

* would `invoke_endpoint(TargetContainer="KMeans", TargetModel="KMeans_model_2")` fail 50% of the time (if it hits the right variant it works just fine; if it hits the wrong one it would most likely result in a 400/500 error ("incorrect payload"))?
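For reference, the runtime API does expose all three routing parameters on the same call, and the actual parameter name for container routing is `TargetContainerHostname` rather than `TargetContainer`. Below is a boto3 sketch with hypothetical endpoint, container, and model names; it shows the call shape only and does not by itself answer whether every combination above is supported.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-multi-container-endpoint",  # hypothetical endpoint name
    TargetVariant="Variant2",                    # pick the production variant
    TargetContainerHostname="KMeans",            # container hostname from the Model definition
    TargetModel="KMeans_model_2.tar.gz",         # multi-model: artifact name under the container's S3 prefix
    ContentType="text/csv",
    Body=b"1.0,2.0,3.0\n",
)
print(response["Body"].read())
```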
1 answer · 0 votes · 33 views · asked 5 months ago

CDK v1: Deploy RDS SQL Server and use credentials in a connection string for a Fargate Docker instance

I have been trying to find an answer in the documentation, on GitHub, here, and in the old forums, but I have failed to find the answer to my question. In my CDK v1 (Python) code I create an RDS instance of SQL Server and set credentials with `aws_rds.Credentials.from_generated_secret()`, but when I later want to provide environment/secret values to the Docker container I want to run in Fargate, I have the following environment variable that needs to be set: DB_CONNECTION_STRING, which has the following syntax: `Server=<somehost.aws.com>,144;Database=<databasename>;User Id=<databaseuser>;Password=<databasepassword>`. All the examples I have seen use multiple variables like DB_USER/DB_PASS/DB_HOST, and then you can easily set those with the help of secret_value, but there is no example of generating a connection string. How do you solve this? I took a look at AWS Glue, but it didn't feel like the right option, and I'm not too keen on making a Dockerfile that pulls the official Docker image and then creates a kind of wrapper to read environment/secret variables and a script that builds up the connection string and sets it before calling the start script for the application (this has other downsides). The reason I'm not using CDK v2 is that the current version seems to be broken when you create a new project in WSL (it seems to think it's pure MS Windows and fails to create a number of the files needed). Thanks in advance for any reply.
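A minimal CDK v1 (Python) sketch of one way this is sometimes done: build the string from the instance endpoint plus `secret_value_from_json(...)` lookups on the generated secret. Be aware that resolving the password token into a plain `environment` value bakes it into the task definition (visible in CloudFormation and the console), so passing only the password as an ECS secret and assembling the string in the app is the safer pattern. The construct names, database name, username, and port are assumptions, not taken from the question.

```python
from aws_cdk import core, aws_ec2 as ec2, aws_ecs as ecs, aws_rds as rds


class DbAndServiceStack(core.Stack):
    def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "Vpc", max_azs=2)

        db = rds.DatabaseInstance(
            self, "SqlServer",
            engine=rds.DatabaseInstanceEngine.SQL_SERVER_EX,
            vpc=vpc,
            credentials=rds.Credentials.from_generated_secret("appuser"),
        )

        # CDK tokens resolve at deploy time into the final connection string.
        connection_string = (
            f"Server={db.instance_endpoint.hostname},1433;"
            f"Database=mydb;User Id=appuser;"
            f"Password={db.secret.secret_value_from_json('password').to_string()}"
        )

        task = ecs.FargateTaskDefinition(self, "TaskDef")
        task.add_container(
            "app",
            image=ecs.ContainerImage.from_registry("my-app:latest"),
            environment={"DB_CONNECTION_STRING": connection_string},
            memory_limit_mib=512,
        )
```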
1 answer · 0 votes · 65 views · asked 5 months ago

Design suggestions

Hello all, I am expanding the scope of the project and wanted some suggestions/comments on whether the right tech stack is being used (job: pulling Leads and Activities from Marketo). Each job has a different query.

**Scope**: We need 2 jobs to be run daily; however, here is the catch. Each job should be created and queued first. Once done, we can poll to see whether the job on the Marketo side is completed, and then the file is downloaded. The file download can take more than 15 minutes. The file can be downloaded using the Job Id of the 2 jobs that were created earlier.

**Current/implemented solution**: I started with solving for Leads only, and here is the stack that was worked out. The event is triggered on a daily basis using EventBridge. The task that is triggered is a Step Functions state machine. The state machine first calls a Lambda to create the job, waits for 1 minute, calls another Lambda to queue the job, then waits for 10 minutes and calls a third Lambda to check the status of the file. If the file is not ready, it waits for 10 more minutes and polls again for the file status (this is a loop with a Choice state that keeps checking the file status). Once the file is ready, it calls a container (Fargate/ECS) and passes the Job Id as a containerOverride to the container. The job runs on the container to download the file and upload it to S3.

**Incorporating pulling Activities into the above flow**: Since the queuing and polling (for file status) Lambdas are similar, and the first Lambda (creating the job) is different, I thought of creating a Parallel state where each branch does create, queue, poll and download (using the implemented solution, so 2 Lambdas for job creation, reusing the 2nd and 3rd Lambdas). Once the complete chain (one for Leads and one for Activities) is done, have a consolidation stage where the output from each container is collected and an SNS message of job completion is sent.

I am looking forward to your suggestions on whether the workflow suggested above is how it should be done, or whether there is another technology I should use.

**Design constraints**: I need to create and queue all the jobs first before starting the downloads, since Marketo has a limit of 500 MB per file download. Hence the need to create and queue all the jobs first, and only then start the job that downloads the files.

Thanks for your suggestions. Regards, Tanmay
1 answer · 0 votes · 45 views · asked 6 months ago

Lightsail containers keep failing with no output

Hi, this is the first time I have tried to use AWS Lightsail. I use it in `us-east-1`. I set up a single nano instance and try to run two containers - one with a Go program which talks to a database, and one with an Nginx front end serving a React application. For the sake of troubleshooting, I narrowed it down to just the Go program. The Docker image I created on my laptop runs fine on my laptop; it can even open the connection to the database (I made the database public while debugging). But when I push the image to Lightsail and try to deploy it, all I get is:

```
[1/Jan/2022:01:03:21] [deployment:8] Creating your deployment
[1/Jan/2022:01:04:31] [deployment:8] Started 1 new node
[1/Jan/2022:01:05:25] [deployment:8] Started 1 new node
[1/Jan/2022:01:06:34] [deployment:8] Started 1 new node
[1/Jan/2022:01:07:12] [deployment:8] Canceled
```

No matter what I try. The container's ENTRYPOINT script prints some output (for instance, it executes `env` to show the environment), but I don't see it in the logs, and the Go code is verbose when it starts up but I don't see the expected output in the logs either. It also starts in about 5 seconds, including opening the connection to the database and running a few quick transactions on it, so I don't think it's a matter of slow startup. I couldn't find any other output or logs from Lightsail. The container takes about 3 MB of RAM when it runs on my laptop, way less than the 512 MB available on the nano instance. Similar issues happen with the container which runs Nginx. It's a custom container based on the official `nginx:alpine` image with my static files and extra config added to it. It runs fine on my laptop. What am I doing wrong? Where else can I look for hints on why my deployments fail? I've been banging my head over this for three days now.
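In case it is useful to anyone hitting the same wall, container-level logs (including anything the ENTRYPOINT prints before the deployment is declared failed) can also be pulled through the API rather than the console, and output sometimes shows up there with a delay. The service and container names below are placeholders.

```python
import boto3

lightsail = boto3.client("lightsail", region_name="us-east-1")

logs = lightsail.get_container_log(
    serviceName="my-go-service",  # placeholder Lightsail container service name
    containerName="api",          # placeholder container name from the deployment
)
for event in logs["logEvents"]:
    print(event["createdAt"], event["message"])
```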
1 answer · 0 votes · 62 views · asked 6 months ago

Multiple starts + bad timing w/ db container?

I have a deployment with a private php:7.4-apache derivative image container and a database container. When I launch my deployment, it is failing, but I don't know why. For some reason, it appears to start multiple times: ``` [28/Dec/2021:16:16:15] [deployment:33] Creating your deployment [28/Dec/2021:16:17:05] > Wrote the baseURL ([REDACTED].us-east-1.cs.amazonlightsail.com) to .env [28/Dec/2021:16:17:05] [File\Write] Writing to /var/www/fairs/.env. [28/Dec/2021:16:18:11] [deployment:33] Started 1 new node [28/Dec/2021:16:19:18] [File\Write] Writing to /var/www/fairs/.env. [28/Dec/2021:16:19:18] > Wrote the baseURL ([REDACTED].us-east-1.cs.amazonlightsail.com) to .env [28/Dec/2021:16:20:21] [deployment:33] Started 1 new node [28/Dec/2021:16:20:49] [deployment:33] Took too long ``` I also have a mysql import in my launch command that fails whenever I try to import a database snapshot: ``` [28/Dec/2021:19:23:00] ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) ``` But when I look at the logs of my db container, it doesn't appear to be ready at that exact time.: ``` [28/Dec/2021:19:22:07] [deployment:35] Creating your deployment [28/Dec/2021:19:22:59] 2021-12-28 19:22:59+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.33-1debian10 started. [28/Dec/2021:19:22:59] 2021-12-28 19:22:59+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql' [28/Dec/2021:19:23:00] 2021-12-28 19:22:59+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.33-1debian10 started. [28/Dec/2021:19:23:00] 2021-12-28 19:23:00+00:00 [Note] [Entrypoint]: Initializing database files [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.385802Z 0 [Warning] Changed limits: max_open_files: 1024 (requested 5000) [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.385858Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 2000) [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.385992Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.791276Z 0 [Warning] InnoDB: New log files created, LSN=45790 [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.843479Z 0 [Warning] InnoDB: Creating foreign key constraint system tables. [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.904067Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 9240f4d4-6813-11ec-af07-0a58a9feac02. [28/Dec/2021:19:23:00] 2021-12-28T19:23:00.906067Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened. [28/Dec/2021:19:23:02] 2021-12-28T19:23:02.285861Z 0 [Warning] CA certificate ca.pem is self signed. ... [28/Dec/2021:19:23:06] 2021-12-28T19:23:06.403308Z 0 [Note] mysqld: ready for connections. [28/Dec/2021:19:23:06] Version: '5.7.33' socket: '/var/run/mysqld/mysqld.sock' port: 0 MySQL Community Server (GPL) ``` When I take out the launch command in my container, it launches fine. Is the command supposed to return something to indicate that it's done and ok? I need the launch command to configure my app and import the database.
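Regarding the timing question: the launch command in the app container has no ordering guarantee relative to the database container, so one hedged approach is to make the import step wait until MySQL is actually accepting TCP connections before running the import. A small Python sketch of such a wait loop follows; the host, port, credentials, and snapshot path are assumptions about this deployment, and 127.0.0.1 is used deliberately because the mysql client treats "localhost" as a request for the Unix socket (the ERROR 2002 shown above).

```python
import socket
import subprocess
import sys
import time

HOST, PORT = "127.0.0.1", 3306   # assumed address of the db container
DEADLINE = time.time() + 300     # give MySQL up to 5 minutes to initialize

while time.time() < DEADLINE:
    try:
        with socket.create_connection((HOST, PORT), timeout=3):
            break                # port is open, MySQL is accepting connections
    except OSError:
        time.sleep(5)
else:
    sys.exit("database never became reachable")

# Only now run the snapshot import that previously failed with ERROR 2002.
with open("/var/www/fairs/snapshot.sql") as snapshot:   # placeholder path
    subprocess.run(
        ["mysql", "-h", HOST, "-u", "root", "-pEXAMPLE", "dbname"],  # placeholder credentials
        stdin=snapshot,
        check=True,
    )
```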
0 answers · 0 votes · 11 views · asked 6 months ago

Communication Between Lightsail Containers

I want to deploy two containers, "nginx" and "php-fpm (Laravel)". I already created this and made it work locally using docker-compose, but on Lightsail it fails. The Lightsail logs say the php-fpm container is not found, even though I set its name in container.yml:

`nginx: [emerg] host not found in upstream "php" in /etc/nginx/conf.d/default.conf:24`

Line 24 is `fastcgi_pass php:9000;`, indicating that the name of the php container is not found. As with docker-compose, I thought that specifying the container name would let the containers communicate with each other. So my question is: what is the right way to configure communication between the two containers? Here are the settings.

nginx default.conf:

```
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    root /app/public;
    add_header X-Frame-Options "SAMEORIGIN";
    add_header X-XSS-Protection "1; mode=block";
    add_header X-Content-Type-Options "nosniff";
    index index.html index.htm index.php;
    charset utf-8;
    location / {
        try_files $uri $uri/ /index.php?$args;
    }
    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass php:9000;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_param PATH_INFO $fastcgi_path_info;
        include fastcgi_params;
        fastcgi_buffer_size 32k;
        fastcgi_buffers 4 32k;
        fastcgi_read_timeout 1200s;
        fastcgi_send_timeout 1200s;
    }
}
```

Lightsail container.yml:

```
serviceName: ${SERVICE_NAME}
containers:
  nginx:
    command: []
    image: ${LATEST_NGINX_LIGHTSAIL_DOCKER_IMAGE}
    ports:
      '80': HTTP
  php:
    command: []
    image: ${LATEST_PHP_LIGHTSAIL_DOCKER_IMAGE}
publicEndpoint:
  containerName: nginx
  containerPort: 80
  healthCheck:
    healthyThreshold: 2
    intervalSeconds: 20
    path: /api/healthcheck
    successCodes: 200-499
    timeoutSeconds: 4
    unhealthyThreshold: 2
```
2 answers · 0 votes · 118 views · asked 6 months ago

Should an ECS/EC2 ASG capacity provider be able to scale up from zero (0->1)?

Following on from the earlier thread https://repost.aws/questions/QU6QlY_u2VQGW658S8wVb0Cw/should-ecs-service-task-start-be-triggered-by-asg-capacity-0-1 , I've now attached a proper capacity provider, an Auto Scaling group provider, to my ECS cluster.

Question TL;DR: should scaling an ECS service from 0 to 1 desired tasks be able to wake up a previously scaled-to-zero ASG and have it scale from 0 to 1 desired/running?

I started with an ECS service with a single task definition and Desired=1, backed by the ASG with capacity provider scaling - also starting with 1 desired/InService ASG instance. I can then set the ECS service's desired tasks to 0, and it stops the single running task; `CapacityProviderReservation` then goes from 100 to 0, and 15 minutes/samples later the alarm is triggered and the ASG shuts down its only instance, 1->0 desired/running. If I later change the ECS service's desired count back to 1, nothing happens, other than ECS noting that it has no capacity to place the task. Should this work? I have previously seen something similar working - `CapacityProviderReservation` jumps to 200 and an instance gets created - but this is not working for me now: the metric is stuck at 100, no scale-up from zero (to one) occurs in the ASG, and the task cannot be started. Should this be expected to work? The reference blog https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/ suggests that `CapacityProviderReservation` should move to 200 if `M > 0 and N = 0`, but this seems to rely on a task in the "Provisioning" state - will that even happen here, or is the ECS service/cluster giving up and not getting that far, due to zero capacity?
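One thing that may be worth checking, and this is an assumption on my part rather than a confirmed diagnosis: tasks only wait in the PROVISIONING state, which is what drives `CapacityProviderReservation` to 200, when they are launched through a capacity provider strategy rather than a plain EC2 launch type. A quick boto3 sketch to confirm how the service is actually configured; the cluster and service names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

svc = ecs.describe_services(cluster="my-cluster", services=["my-service"])["services"][0]

# If capacityProviderStrategy is empty and launchType is "EC2", task placement
# will not wait in PROVISIONING for the ASG to scale up from zero.
print("launchType:", svc.get("launchType"))
print("capacityProviderStrategy:", svc.get("capacityProviderStrategy"))
print("desiredCount:", svc["desiredCount"], "runningCount:", svc["runningCount"])
```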
2
answers
0
votes
104
views
asked 6 months ago