
Questions tagged with Amazon Elastic Container Service


Unable to override taskRoleArn when running ECS task from Lambda

I have a Lambda function that is supposed to pass its own permissions to the code running in an ECS task. It looks like this:

```
ecs_parameters = {
    "cluster": ...,
    "launchType": "FARGATE",
    "networkConfiguration": ...,
    "overrides": {
        "taskRoleArn": boto3.client("sts").get_caller_identity().get("Arn"),
        ...
    },
    "platformVersion": "LATEST",
    "taskDefinition": f"my-task-definition-{STAGE}",
}

response = ecs.run_task(**ecs_parameters)
```

When I run this in Lambda, I get this error:

```
"errorMessage": "An error occurred (ClientException) when calling the RunTask operation: ECS was unable to assume the role 'arn:aws:sts::787364832896:assumed-role/my-lambda-role...' that was provided for this task. Please verify that the role being passed has the proper trust relationship and permissions and that your IAM user has permissions to pass this role."
```

If I change the task definition in ECS to use `my-lambda-role` as the task role, it works. It's specifically when I try to override the task role from Lambda that it breaks.

The Lambda role has the `AWSLambdaBasicExecutionRole` policy and also an inline policy that grants it `ecs:runTask` and `iam:PassRole`. It has a trust relationship that looks like:

```
"Effect": "Allow",
"Principal": {
    "Service": [
        "ecs.amazonaws.com",
        "lambda.amazonaws.com",
        "ecs-tasks.amazonaws.com"
    ]
},
"Action": "sts:AssumeRole"
```

The task definition has a policy that grants it `sts:AssumeRole` and `iam:PassRole`, and a trust relationship that looks like:

```
"Effect": "Allow",
"Principal": {
    "Service": "ecs-tasks.amazonaws.com",
    "AWS": "arn:aws:iam::account-ID:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS"
},
"Action": "sts:AssumeRole"
```

How do I allow the Lambda function to pass the role to ECS, and ECS to assume the role it's been given?

P.S. - I know a lot of these permissions are overkill, so let me know if there are any I can get rid of :) Thanks!
2
answers
1
votes
8
views
AWS-User-4882383
asked 5 days ago
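
A point worth illustrating here: `get_caller_identity()` inside Lambda returns an STS *assumed-role* ARN (`arn:aws:sts::...:assumed-role/...`), and ECS can only assume a plain IAM role whose trust policy names `ecs-tasks.amazonaws.com`. A minimal sketch of the override shape, with hypothetical names (`my-task-role`, `my-cluster`) and assuming the Lambda role has `iam:PassRole` on that role:

```python
# Sketch: pass a dedicated IAM role ARN (not the STS assumed-role ARN) as the
# task role override. "my-task-role" must trust ecs-tasks.amazonaws.com.
import boto3

ecs = boto3.client("ecs")

TASK_ROLE_ARN = "arn:aws:iam::123456789012:role/my-task-role"  # hypothetical

response = ecs.run_task(
    cluster="my-cluster",                         # hypothetical
    launchType="FARGATE",
    taskDefinition="my-task-definition",          # hypothetical
    networkConfiguration={
        "awsvpcConfiguration": {"subnets": ["subnet-0123456789abcdef0"]}  # hypothetical
    },
    overrides={"taskRoleArn": TASK_ROLE_ARN},
    platformVersion="LATEST",
)
```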

Architecture for multi-region ECS application

Hi everyone, I just wanted to get feedback on my proposed solution for a multi-region ECS dockerized app. Currently we have the following resources in Region A:

```
Postgres DB (used for user accounts only)
Backend+Frontend NextJS App (Dockerized) - ECS
Backend Microservice App for conversion of files (Dockerized) - ECS
Backend 3rd party API + Datastore (also deployed in other regions) - unknown architecture
```

I now need to deploy to Regions B and C. The Backend 3rd party API is already deployed in these regions. I am thinking of deploying the following resources to those regions:

```
Backend+Frontend NextJS App (Dockerized)
Backend Microservice App for conversion of files (Dockerized)
```

Our app logs the user in (authentication + authorization) using the 3rd party API, and after login we can see which region their data is in. So after login I can bounce them + their token to the appropriate region. I cannot use Route 53 routing reliably because the source of truth about their region is only available after login and, for example, they may (rarely) be accessing from Region B (if they are travelling) while their datastore is in Region C, in which case I need to bounce them to Region C.

I also don't need to replicate our database to other regions because it only stores their account information for billing purposes, so the performance impact is minimal and only checked on login/logout. Currently we have low 10s of users, so I can easily restructure and deploy a different architecture if/when we start scaling. Critique is welcome!
1
answers
0
votes
8
views
ManavDa
asked 6 days ago

XGBoost error: Allreduce failed - 100 GB Dask DataFrame on AWS Fargate ECS cluster dies with 1 TB of memory

Overview: I'm trying to run an XGBoost model on a bunch of parquet files sitting in S3 using Dask, by setting up a Fargate cluster and connecting it to a Dask cluster. The total dataframe size comes to about 140 GB of data. I scaled up a Fargate cluster with these properties:

```
Workers: 40
Total threads: 160
Total memory: 1 TB
```

So there should be enough memory to hold the data. Each worker has 9+ GB with 4 threads. I do some very basic preprocessing and then I create a DaskDMatrix, which does cause the task bytes per worker to get a little high, but never above the threshold where it would fail. Next I run `xgb.dask.train`, which utilizes the xgboost package, not the dask_ml.xgboost package. Very quickly, the workers die and I get the error `XGBoostError: rabit/internal/utils.h:90: Allreduce failed`. When I attempted this with a single file with only 17 MB of data, I would still get this error, but only a couple of workers die. Does anyone know why this happens, since I have double the memory of the dataframe?

```
X_train = X_train.to_dask_array()
X_test = X_test.to_dask_array()
y_train = y_train
y_test = y_test

dtrain = xgb.dask.DaskDMatrix(client, X_train, y_train)

output = xgb.dask.train(
    client,
    {"verbosity": 1, "tree_method": "hist", "objective": "reg:squarederror"},
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, "train")],
)
```
1
answers
0
votes
5
views
AWS-User-7732475
asked 7 days ago

How can I build a CloudFormation secret out of another secret?

I have an image I deploy to ECS that expects an environment variable called `DATABASE_URL` which contains the username and password as the userinfo part of the URL (e.g. `postgres://myusername:mypassword@mydb.foo.us-east-1.rds.amazonaws.com:5432/mydbname`). I cannot change the image.

Using `DatabaseInstance.Builder.credentials(fromGeneratedSecret("myusername"))`, CDK creates a secret in Secrets Manager for me that has all of this information, but not as a single value:

```json
{
  "username": "myusername",
  "password": "mypassword",
  "engine": "postgres",
  "host": "mydb.foo.us-east-1.rds.amazonaws.com",
  "port": 5432,
  "dbInstanceIdentifier": "live-myproduct-db"
}
```

Somehow I need to synthesise that `DATABASE_URL` environment variable. I don't think I can do it in the ECS task definition - as far as I can tell the secret can only reference a single key in a secret. I thought I might be able to add an extra `url` key to the existing secret using references in CloudFormation - but I can't see how. Something like:

```java
secret.newBuilder()
    .addTemplatedKey(
        "url",
        "postgres://#{username}:#{password}@#{host}:#{port}/#{db}"
    )
    .build()
```

except that I just made that up...

Alternatively I could use CDK to generate a new secret in either Secrets Manager or Systems Manager - but again I want to specify it as a template so that the real secret values don't get materialised in the CloudFormation template.

Any thoughts? I'm hoping I'm just missing some way to use the API to build compound secrets...
3
answers
0
votes
6
views
Rob
asked 7 days ago
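
One avenue worth sketching: CDK's `secretValueFromJson` tokens render as `{{resolve:secretsmanager:...}}` dynamic references in the synthesized template, so concatenating them does not materialise the plaintext in CloudFormation. A hedged CDK (Python shown; the Java API has the same shape) sketch, with the caveat that the resolved URL is still visible in the ECS console's rendered task definition:

```python
# Hedged sketch: build DATABASE_URL from the generated secret's JSON fields.
# Each secret_value_from_json() call becomes a Secrets Manager dynamic
# reference at deploy time, not a literal value in the template.
from aws_cdk import aws_secretsmanager as sm

def database_url(db_secret: sm.ISecret, db_name: str) -> str:
    def part(key: str) -> str:
        return db_secret.secret_value_from_json(key).to_string()

    return (
        "postgres://" + part("username") + ":" + part("password")
        + "@" + part("host") + ":" + part("port") + "/" + db_name
    )

# container.add_environment("DATABASE_URL", database_url(db_secret, "mydbname"))
```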

boto3 ecs.describe_tasks call returns task MISSING

I'm trying to use a boto3 ECS waiter to wait on a Fargate ECS task to complete. The vast majority of the time the waiter works as expected (waits for the task to reach the STOPPED status). However, sporadically the waiter will return a failure because a task is marked as missing, even though I can find the task itself in the cluster along with its CloudWatch logs.

When I first encountered this, I switched to the boto3 [`ecs.describe_tasks`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.describe_tasks) method to see if I could get more information about what was happening. When the above situation occurs, describe_tasks returns something like:

```
{'tasks': [],
 'failures': [{'arn': 'arn:aws:ecs:us-west-2:21234567891011:task/something-something/dsfsadfasdhfasjklhdfkdsajhf',
               'reason': 'MISSING'}],
 'ResponseMetadata': {'RequestId': 'sdkfjaskdjfhaskdjfhasd',
                      'HTTPStatusCode': 200,
                      'HTTPHeaders': {'x-amzn-requestid': 'sdkjfhsdkajhfksadhkjsadf',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'content-length': '145',
                                      'date': 'Fri, 06 May 2022 08:36:11 GMT'},
                      'RetryAttempts': 0}}
```

I've looked at the [AWS docs](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/api_failures_messages.html) and none of the scenarios outlined for `reason: MISSING` apply in my circumstance. I'm passing my cluster name as an argument to the call as well. Since this happens intermittently it's difficult to troubleshoot. What does the MISSING status mean? What are the reasons why an API call to check on task status would return MISSING when the task exists?
1
answers
0
votes
3
views
AWS-User-2535007
asked 10 days ago
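
For situations like this, one pragmatic mitigation (an assumption about the failure mode, not a confirmed root cause) is to treat an intermittent `MISSING` failure as retryable, since stopped tasks are only retained briefly and a just-started task may not yet be visible to every DescribeTasks call. A sketch with placeholder names:

```python
# Sketch: poll describe_tasks ourselves, retrying through transient MISSING
# failures, with a bounded number of attempts.
import time
import boto3

ecs = boto3.client("ecs")

def wait_for_stopped(cluster: str, task_arn: str, attempts: int = 60) -> dict:
    for _ in range(attempts):
        resp = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
        if resp["tasks"]:
            task = resp["tasks"][0]
            if task["lastStatus"] == "STOPPED":
                return task
        elif any(f["reason"] == "MISSING" for f in resp["failures"]):
            pass  # possibly not propagated yet (or already expired) - retry
        time.sleep(6)
    raise TimeoutError(f"task {task_arn} did not reach STOPPED")
```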

My ECS tasks (VPC A) can't connect to my RDS (VPC B) even though the VPCs are peered and networking is configured correctly

Hi,

As mentioned in the question, my ECS tasks cannot connect to my RDS. The ECS tasks try to resolve the RDS by name, and it resolves to the RDS public IP (the RDS instance has public and private IPs). However, the security group on the RDS doesn't allow open access from all IPs, so the connection fails. I temporarily allowed all connections and could see that the ECS tasks are routing through the open internet to access the RDS. Reachability Analyzer, checking from a specific task's elastic network interface to the RDS ENI, is successful, using internal routing through the peering connection. At the same time I have another server in VPC C that can connect to the RDS. All the config is similar between these two apps, including the peering connection, security group policies and routing tables. Any help is appreciated.

Here are some details about the VPCs:

```
VPC A - 15.2.0.0/16 [three subnets]
VPC B - 111.30.0.0/16 [three subnets]
VPC C - 15.0.0.0/16 [three subnets]

Peering Connection 1 between A and B
Peering Connection 2 between C and B

Route table for VPC A:
111.30.0.0/16 : Peering Connection 1
15.2.0.0/16: Local
0.0.0.0/0: Internet Gateway

Route table for VPC C:
111.30.0.0/16: Peering Connection 2
15.2.0.0/16: Local
0.0.0.0/0: Internet Gateway

Security groups allow traffic to RDS:
Ingress:
15.0.0.0/16: Allow DB Port
15.2.0.0/16: Allow DB Port
Egress:
0.0.0.0/0: Allow all ports
```

When I add the rule `0.0.0.0/0 Allow DB Port` to the RDS, then ECS can connect to my RDS through its public IP.
1
answers
2
votes
5
views
ManavDa
asked 11 days ago
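
The symptom described (the name resolving to the public IP from VPC A) matches the default DNS behavior across VPC peering: a publicly resolvable endpoint resolves to its public IP from the remote VPC unless cross-VPC DNS resolution is enabled on the peering connection. A sketch of enabling it, with a placeholder peering connection ID:

```python
# Sketch: make the RDS endpoint name resolve to its *private* IP across the
# peering connection by enabling DNS resolution support on both sides.
import boto3

ec2 = boto3.client("ec2")

ec2.modify_vpc_peering_connection_options(
    VpcPeeringConnectionId="pcx-0123456789abcdef0",  # placeholder
    RequesterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
    AccepterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
)
```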

ApplicationLoadBalancedFargateService with load balancer, target groups, targets on non-standard port

I have an ECS service that exposes port 8080. I want the load balancer, target groups and targets to use that port as opposed to port 80. Here is a snippet of my code:

```
const servicePort = 8888;
const metricsPort = 8888;

const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef');
const repository = ecr.Repository.fromRepositoryName(this, 'cloud-config-server', 'cloud-config-server');
taskDefinition.addContainer('Config', {
  image: ecs.ContainerImage.fromEcrRepository(repository),
  portMappings: [{containerPort : servicePort, hostPort: servicePort}],
});

const albFargateService = new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'AlbConfigService', {
  cluster,
  publicLoadBalancer : false,
  taskDefinition: taskDefinition,
  desiredCount: 1,
});

const applicationTargetGroup = new elbv2.ApplicationTargetGroup(this, 'AlbConfigServiceTargetGroup', {
  targetType: elbv2.TargetType.IP,
  protocol: elbv2.ApplicationProtocol.HTTP,
  port: servicePort,
  vpc,
  healthCheck: {path: "/CloudConfigServer/actuator/env/profile", port: String(servicePort)}
});

const addApplicationTargetGroupsProps: elbv2.AddApplicationTargetGroupsProps = {
  targetGroups: [applicationTargetGroup],
};

albFargateService.loadBalancer.addListener('alb-listener', {
  protocol: elbv2.ApplicationProtocol.HTTP,
  port: servicePort,
  defaultTargetGroups: [applicationTargetGroup]
});
```

This does not work. The health check takes place on port 80 with the default URL of "/", which fails, and the tasks are constantly recycled. A target group on port 8080, with the appropriate health check, is added, but it has no targets. What is the recommended way to achieve load balancing on a port other than 80? Thanks.
1
answers
0
votes
5
views
AWS-User-9720008
asked 12 days ago
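
A sketch of one approach (Python CDK shown; the TypeScript API has the same shape, and this assumes `cluster` and `task_definition` are defined as in the question): instead of adding a second target group by hand, let the pattern create its listener and target group on the service port, then tune the generated health check.

```python
from aws_cdk import aws_ecs_patterns as ecs_patterns

service_port = 8888

svc = ecs_patterns.ApplicationLoadBalancedFargateService(
    self, "AlbConfigService",
    cluster=cluster,
    task_definition=task_definition,   # container's first port mapping is 8888
    public_load_balancer=False,
    desired_count=1,
    listener_port=service_port,        # listener + generated target group on 8888
)

# Adjust the generated target group's health check rather than creating a
# second, empty target group:
svc.target_group.configure_health_check(
    path="/CloudConfigServer/actuator/env/profile",
    port=str(service_port),
)
```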

ECS EC2 instance is not registered to target group

I created an ECS service using EC2 instances, then I created an Application Load Balancer and a target group. The task definition for my Docker image uses the following configuration:

```json
{
  "ipcMode": null,
  "executionRoleArn": null,
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "environmentFiles": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "/ecs/onestapp-task-prod",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      "cpu": 0,
      "resourceRequirements": null,
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [],
      "workingDirectory": null,
      "secrets": null,
      "dockerSecurityOptions": null,
      "memory": null,
      "memoryReservation": 512,
      "volumesFrom": [],
      "stopTimeout": null,
      "image": "637960118793.dkr.ecr.us-east-2.amazonaws.com/onestapp-repository-prod:5ea9baa2a6165a91c97aee3c037b593f708b33e7",
      "startTimeout": null,
      "firelensConfiguration": null,
      "dependsOn": null,
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": false,
      "dockerLabels": null,
      "systemControls": null,
      "privileged": null,
      "name": "onestapp-container-prod"
    }
  ],
  "placementConstraints": [],
  "memory": "1024",
  "taskRoleArn": null,
  "compatibilities": [
    "EXTERNAL",
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-2:637960118793:task-definition/onestapp-task-prod:25",
  "networkMode": null,
  "runtimePlatform": null,
  "cpu": "1024",
  "revision": 25,
  "status": "ACTIVE",
  "inferenceAccelerators": null,
  "proxyConfiguration": null,
  "volumes": []
}
```

The service uses the ALB and the same target group as the ALB, my task is running, and I can access it using the instance's public IP, but the target group does not have my tasks registered.
0
answers
0
votes
2
views
AWS-User-9232552
asked 14 days ago

Fail to start an EC2 task on ECS

Hi there, I am trying to start a task which uses a GPU on my instance. The EC2 instance is already added to a cluster, but the task fails to start. Here is the error:

```
status: STOPPED (CannotStartContainerError: Error response from dae)

Status reason:
CannotStartContainerError: Error response from daemon: OCI runtime create failed:
container_linux.go:380: starting container process caused: process_linux.go:545:
container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr

Network bindings - not configured
```

EC2 setup:

```
Type: AWS::EC2::Instance
Properties:
  IamInstanceProfile: !Ref InstanceProfile
  ImageId: ami-0d5564ca7e0b414a9
  InstanceType: g4dn.xlarge
  KeyName: tmp-key
  SubnetId: !Ref PrivateSubnetOne
  SecurityGroupIds:
    - !Ref ContainerSecurityGroup
  UserData:
    Fn::Base64: !Sub |
      #!/bin/bash
      echo ECS_CLUSTER=traffic-data-cluster >> /etc/ecs/ecs.config
      echo ECS_ENABLED_GPU_SUPPORT=true >> /etc/ecs/ecs.config
```

Dockerfile:

```
FROM nvidia/cuda:11.6.0-base-ubuntu20.04

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

# RUN nvidia-smi

RUN echo 'install pip packages'
RUN apt-get update
RUN apt-get install python3.8 -y
RUN apt-get install python3-pip -y
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN pip3 --version
RUN python --version

WORKDIR /

COPY deployment/video-blurring/requirements.txt /requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install --user -r /requirements.txt

## Set up the requisite environment variables that will be passed during the build stage
ARG SERVER_ID
ARG SERVERLESS_STAGE
ARG SERVERLESS_REGION
ENV SERVER_ID=$SERVER_ID
ENV SERVERLESS_STAGE=$SERVERLESS_STAGE
ENV SERVERLESS_REGION=$SERVERLESS_REGION

COPY config/env-vars .

## Sets up the entry point for running the bashrc which contains environment variable and
## trigger the python task handler
COPY script/*.sh /
RUN ["chmod", "+x", "./initialise_task.sh"]

## Copy the code to /var/runtime - following the AWS lambda convention
## Use ADD to preserve the underlying directory structure
ADD src /var/runtime/

ENTRYPOINT ./initialise_task.sh
```
0
answers
0
votes
2
views
AWS-User-6797102
asked 20 days ago
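
A common cause of the nvidia hook failing at container start (an assumption here, not confirmed from the error alone) is a task definition that never reserves the GPU: on ECS the reservation is made per container via `resourceRequirements`. A sketch with placeholder names and image:

```python
# Sketch: reserve the GPU in the container definition; without this the ECS
# GPU AMI's nvidia runtime hook can fail when the container starts.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="video-blurring",  # hypothetical
    requiresCompatibilities=["EC2"],
    containerDefinitions=[{
        "name": "blurring",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/blurring:latest",  # hypothetical
        "memory": 8192,
        "essential": True,
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    }],
)
```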

Container cannot bind to port 80 running as non-root user on ECS Fargate

I have an image that binds to port 80 as a **non-root** user. I can run it locally (macOS Monterey, Docker Desktop 4.7.1) absolutely fine. When I try to run it as part of an ECS service on Fargate it fails as so:

**Failed to bind to 0.0.0.0/0.0.0.0:80**
**caused by SocketException: Permission denied**

Fargate means I have to run the task in network mode `awsvpc` - not sure if that's related? Any views on what I'm doing wrong? The [best practices document](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/bestpracticesguide.pdf) suggests that I should be running as non-root (p. 83) and that under awsvpc it's reasonable to expose port 80 (diagram on p. 23).

FWIW here's a mildly cut down version of the JSON from my task definition:

```
{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:<ID>:task-definition/mything:2",
  "containerDefinitions": [
    {
      "name": "mything",
      "image": "mything:latest",
      "cpu": 0,
      "memory": 1024,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": []
    }
  ],
  "family": "mything",
  "executionRoleArn": "arn:aws:iam::<ID>:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "revision": 2,
  "volumes": [],
  "status": "ACTIVE",
  "requiresAttributes": [
    { "name": "com.amazonaws.ecs.capability.logging-driver.awslogs" },
    { "name": "ecs.capability.execution-role-awslogs" },
    { "name": "com.amazonaws.ecs.capability.ecr-auth" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19" },
    { "name": "ecs.capability.execution-role-ecr-pull" },
    { "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18" },
    { "name": "ecs.capability.task-eni" }
  ],
  "placementConstraints": [],
  "compatibilities": [ "EC2", "FARGATE" ],
  "runtimePlatform": { "operatingSystemFamily": "LINUX" },
  "requiresCompatibilities": [ "FARGATE" ],
  "cpu": "256",
  "memory": "1024",
  "tags": []
}
```
2
answers
0
votes
7
views
Rob
asked 25 days ago
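
Context that may help here: on Linux, binding ports below 1024 requires root or the `CAP_NET_BIND_SERVICE` capability, and Fargate does not let you add that capability. The usual workaround is to have the process listen on an unprivileged port and point the load balancer at it; the ALB listener can stay on 80. A sketch of the changed port mapping (names taken from the question; the port change is the illustration):

```python
# Sketch: remap the container to an unprivileged port such as 8080.
container_definition = {
    "name": "mything",
    "image": "mything:latest",
    "portMappings": [{"containerPort": 8080, "hostPort": 8080, "protocol": "tcp"}],
    "essential": True,
}
```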

Scheduled Action triggering at time specified in another action

I have a CloudFormation setup with Scheduled Actions to autoscale services based on times. There is one action that scales up to start the service, and another to scale down to turn it off. I also occasionally add an additional action to scale up if a service is needed at a different time on a particular day.

I'm having an issue where my service is being scaled down instead of up when I specify this additional action. Looking at the console logs I get an event that looks like:

```
16:00:00 -0400
Message: Successfully set min capacity to 0 and max capacity to 0
Cause: scheduled action name ScheduleScaling_action_1 was triggered
```

However, the relevant part of the CloudFormation template for the scheduled action with the name in the log has a different time, e.g.:

```
{
  "ScalableTargetAction": {
    "MaxCapacity": 0,
    "MinCapacity": 0
  },
  "Schedule": "cron(0 5 ? * 2-5 *)",
  "ScheduledActionName": "ScheduleScaling_action_1"
}
```

What is odd is that the time this action triggers matches exactly the schedule time of another action, e.g.:

```
{
  "ScalableTargetAction": {
    "MaxCapacity": 1,
    "MinCapacity": 1
  },
  "Schedule": "cron(00 20 ? * 2-5 *)",
  "ScheduledActionName": "ScheduleScaling_action_2"
}
```

I am using CDK to generate the CloudFormation template, which doesn't appear to allow me to specify a timezone. So my understanding is that the times here should be UTC. What could cause the scheduled action to trigger at the incorrect time like this?
1
answers
0
votes
4
views
Jacques
asked a month ago
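
One arithmetic note: the logged event at 16:00 -0400 is 20:00 UTC, which lines up exactly with `cron(00 20 ? * 2-5 *)`, consistent with Application Auto Scaling evaluating schedules in UTC by default. Pinning the timezone explicitly removes the ambiguity; a sketch via the API, with placeholder resource names:

```python
# Sketch: set an explicit Timezone on the scheduled action so the cron
# expression no longer depends on the UTC default.
import boto3

aas = boto3.client("application-autoscaling")

aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",   # placeholder
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="ScheduleScaling_action_1",
    Schedule="cron(0 5 ? * 2-5 *)",
    Timezone="America/New_York",                  # explicit instead of UTC
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)
```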

App Runner actions work very slowly (2-10 minutes) and the deployer provides an incorrect error message

App Runner actions are very slow for me. create/pause/resume may take 2-5 minutes for the simple demo image (`public.ecr.aws/aws-containers/hello-app-runner:latest`), and create-service when the image is not found takes ~10 minutes.

Example #1 - 5 min to deploy the hello-app image:

```
04-17-2022 05:59:55 PM [AppRunner] Service status is set to RUNNING.
04-17-2022 05:59:55 PM [AppRunner] Deployment completed successfully.
04-17-2022 05:59:44 PM [AppRunner] Successfully routed incoming traffic to application.
04-17-2022 05:58:33 PM [AppRunner] Health check is successful. Routing traffic to application.
04-17-2022 05:57:01 PM [AppRunner] Performing health check on port '8000'.
04-17-2022 05:56:51 PM [AppRunner] Provisioning instances and deploying image.
04-17-2022 05:56:42 PM [AppRunner] Successfully pulled image from ECR.
04-17-2022 05:54:56 PM [AppRunner] Service status is set to OPERATION_IN_PROGRESS.
04-17-2022 05:54:55 PM [AppRunner] Deployment started.
```

Example #2 - 10 min when the image is not found:

```
04-17-2022 05:35:41 PM [AppRunner] Failed to pull your application image. Be sure you configure your service with a valid access role to your ECR repository.
04-17-2022 05:25:47 PM [AppRunner] Starting to pull your application image.
```

Example #3 - 10 min when the image is not found:

```
04-17-2022 06:46:24 PM [AppRunner] Failed to pull your application image. Be sure you configure your service with a valid access role to your ECR repository.
04-17-2022 06:36:31 PM [AppRunner] Starting to pull your application image.
```

But a 404 error should be detected immediately and fail much faster - there is no need to retry a 404 for 10 minutes, right? Additionally, the error message `Failed to pull your application image. Be sure you configure your service with a valid access role to your ECR repository` is very confusing: it doesn't show the image name and doesn't state the actual cause. A 404 is not related to access errors like 401 or 403, correct? Can App Runner action performance and error messages be improved?
0
answers
0
votes
2
views
AWS-User-Mike
asked a month ago

ECS Exec error TargetNotConnectedException

I'm getting this error when I try to connect to my Fargate ECS container using the ECS Exec feature:

```
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
```

The task's role was missing the SSM permissions; I added them, but the error persists. Below is the output of the [amazon-ecs-exec-checker](https://github.com/aws-containers/amazon-ecs-exec-checker) tool:

```
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/usr/bin/jq)
  AWS CLI | OK (/usr/local/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/2.2.1 Python/3.8.8 Linux/5.13.0-39-generic exe/x86_64.ubuntu.20 prompt/off)
  Session Manager Plugin | OK (1.2.295.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-west-2
Cluster: removed
Task   : removed
-------------------------------------------------------------
  Cluster Configuration  | Audit Logging Not Configured
  Can I ExecuteCommand?  | arn:aws:iam::removed
     ecs:ExecuteCommand: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks |
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "api-staging"
    ----------
      Init Process Enabled (api-staging:14)
    ----------
         1. Enabled - "api-staging"
    ----------
      Read-Only Root Filesystem (api-staging:14)
    ----------
         1. Disabled - "api-staging"
  Task Role Permissions  | arn:aws:iam::removed
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
  VPC Endpoints          | Found existing endpoints for vpc-removed:
                         |   - com.amazonaws.us-west-2.s3
                         | SSM PrivateLink "com.amazonaws.us-west-2.ssmmessages" not found.
                         | You must ensure your task has proper outbound internet connectivity.
  Environment Variables  | (api-staging:14)
       1. container "api-staging"
       - AWS_ACCESS_KEY: not defined
       - AWS_ACCESS_KEY_ID: defined
       - AWS_SECRET_ACCESS_KEY: defined
```

The output has only green and yellow markers, no red ones, which means Exec should work. But it doesn't. Any ideas why?
1
answers
0
votes
4
views
AWS-User-3395749
asked a month ago
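
Two leads worth checking, both grounded in the checker output above: the container needs an outbound path to `ssmmessages.*` (a NAT/IGW route or a `com.amazonaws.us-west-2.ssmmessages` VPC endpoint, which the checker flagged as missing), and tasks only pick up Exec settings and refreshed role permissions at launch, so after a policy change the task should be replaced. A sketch of rolling the service, with placeholder names:

```python
# Sketch: enable ECS Exec on the service and force replacement tasks so the
# updated task role permissions and Exec settings take effect.
import boto3

ecs = boto3.client("ecs")

ecs.update_service(
    cluster="my-cluster",        # placeholder
    service="api-staging",
    enableExecuteCommand=True,
    forceNewDeployment=True,     # existing tasks keep their launch-time settings
)
```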

How do I set up an AWS Amplify project to query an existing AWS AppSync API?

Hi, I am new to AWS Amplify and would like guidance on how to send a query to an ***existing*** GraphQL API on AWS AppSync. I am unsure how to start, as a lot of Amplify coverage creates a *new* AppSync API using the Amplify CLI.

## Objectives

* Set up a Node.js project to work with an existing AWS AppSync API, using AWS Amplify as the GraphQL client.
* Send a single query to an existing AWS AppSync API. The query lists game results from a DynamoDB table and is called `listGames` in my GraphQL schema.
* I need to repeat the query in order to fetch all available database records that satisfy the query. This would mean adding results to an array/object until the `nextToken` is `null` (i.e. no more records can be found for the query).

## Context

* This application is deployed in an Amazon ECS container using AWS Fargate.
* The ECS service is fronted by an Application Load Balancer (ALB).
* A leader board web page fetches game results through a `POST` request to the ALB's DNS name / URL and adds them to an HTML table.

## Notes

* For now, API key is my authentication method. I would soon like to switch to a task IAM role in ECS.
* The ECS deployment described in 'Context' is working but it sends `POST` requests without AWS libraries. It is my understanding that I would need to use an AWS library in order to use an IAM role for AppSync authentication (used as a [task IAM role in ECS](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html)). Please correct me if I am mistaken.

I would greatly appreciate any help you can give me. Thank you for your time!
1
answers
1
votes
4
views
Toby
asked a month ago
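
The `nextToken` loop itself is simple; a sketch of the shape (shown in Python for consistency with the other examples here - the structure is identical in a Node.js client; the endpoint, API key, and result fields are placeholders):

```python
# Sketch: page through listGames with an API key until nextToken is null.
import requests

APPSYNC_URL = "https://example123.appsync-api.us-east-1.amazonaws.com/graphql"  # placeholder
API_KEY = "da2-exampleapikey"                                                   # placeholder

QUERY = """
query ListGames($nextToken: String) {
  listGames(nextToken: $nextToken) {
    items { id score }   # placeholder fields
    nextToken
  }
}
"""

def fetch_all_games() -> list:
    items, next_token = [], None
    while True:
        resp = requests.post(
            APPSYNC_URL,
            headers={"x-api-key": API_KEY},
            json={"query": QUERY, "variables": {"nextToken": next_token}},
        )
        page = resp.json()["data"]["listGames"]
        items.extend(page["items"])
        next_token = page.get("nextToken")
        if not next_token:
            return items
```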

Step Functions: passing input into a Fargate instance

Looking at this AWS document: https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-container-task-notification.html, Step Functions can instantiate/manage a Fargate or ECS task. Speaking about Fargate:

* The example puts Fargate as the first step. Suppose Fargate is the 2nd step - can we get the output message from the first step (larger than 8K) into the Fargate task through either the .sync or .waitForTaskToken model, instead of passing input parameters through the Fargate containerOverride's command or commands?
* If the above is possible, is the code inside Fargate supposed to do something like:

```
GetActivityTaskResult getActivityTaskResult =
    client.getActivityTask(new GetActivityTaskRequest().withActivityArn(stringActivityArn));
String taskToken = getActivityTaskResult.getTaskToken();
String taskInput = getActivityTaskResult.getInput();
```

* In this model where the step function instantiates Fargate (.sync or .waitForTaskToken), does a Fargate instance get created each time Step Functions "calls" it? That is, we would actually like to set desired_count and min count both to zero and forget about scaling alarms or metrics. If the above is true, does it respect the max count? I guess cost-wise it is fine with us, because how many Fargate tasks run concurrently no longer matters - cost is based on usage.
* Is ECS or Fargate able to return output values to the calling step function? That is, using sendSuccess or some other API to send output back to the step function for the next step to use (to become the input of the next step)?
1
answers
0
votes
8
views
DavidYen
asked 2 months ago
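
On the second bullet: `GetActivityTask` is for Step Functions *activities*, not the ECS task integration. With `.waitForTaskToken`, the state machine injects the token (e.g. a `containerOverrides` environment entry set to `$$.Task.Token`), and the container reports its result with `SendTaskSuccess`/`SendTaskFailure`; that result becomes the next state's input. A sketch of the in-container side, with the environment variable name `TASK_TOKEN` as an assumption chosen in the state definition:

```python
# Sketch: the container reads the injected token and reports back directly;
# no polling is needed. Large payloads are typically passed as an S3 pointer
# rather than through the override itself.
import json
import os
import boto3

sfn = boto3.client("stepfunctions")
token = os.environ["TASK_TOKEN"]   # hypothetical name set via containerOverrides

try:
    result = {"status": "done"}    # placeholder for real work
    sfn.send_task_success(taskToken=token, output=json.dumps(result))
except Exception as exc:
    sfn.send_task_failure(taskToken=token, error="TaskFailed", cause=str(exc))
```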

CannotPullContainerError in public VPC/Subnet. What am I missing/doing wrong?

I have created a brand new AWS account (just to troubleshoot this issue) and the default VPC and subnets of every region in this account are left pristine and unmodified. Here's the default VPC in `us-east-1`:

```
$ aws ec2 describe-vpcs
{
    "Vpcs": [
        {
            "CidrBlock": "172.31.0.0/16",
            "DhcpOptionsId": "dopt-095a7873b289557a1",
            "State": "available",
            "VpcId": "vpc-08ba51697a37c5ad9",
            "OwnerId": "...",
            "InstanceTenancy": "default",
            "CidrBlockAssociationSet": [
                {
                    "AssociationId": "vpc-cidr-assoc-0dba5df7b176877b7",
                    "CidrBlock": "172.31.0.0/16",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ],
            "IsDefault": true
        }
    ]
}
```

Here's the route table for this VPC:

```
$ aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-08ba51697a37c5ad9
{
    "RouteTables": [
        {
            "Associations": [
                {
                    "Main": true,
                    "RouteTableAssociationId": "rtbassoc-08e6f9833f341f6c4",
                    "RouteTableId": "rtb-000d61d5d0236d276",
                    "AssociationState": {
                        "State": "associated"
                    }
                }
            ],
            "PropagatingVgws": [],
            "RouteTableId": "rtb-000d61d5d0236d276",
            "Routes": [
                {
                    "DestinationCidrBlock": "172.31.0.0/16",
                    "GatewayId": "local",
                    "Origin": "CreateRouteTable",
                    "State": "active"
                },
                {
                    "DestinationCidrBlock": "0.0.0.0/0",
                    "GatewayId": "igw-0b7ed209f5cd38fa6",
                    "Origin": "CreateRoute",
                    "State": "active"
                }
            ],
            "Tags": [],
            "VpcId": "vpc-08ba51697a37c5ad9",
            "OwnerId": "..."
        }
    ]
}
```

As you can see, the second route permits egress to the internet:

```
{
    "DestinationCidrBlock": "0.0.0.0/0",
    "GatewayId": "igw-0b7ed209f5cd38fa6",
    "Origin": "CreateRoute",
    "State": "active"
}
```

So I assume if I deploy an ECS Fargate task in this VPC, it should be able to pull `amazoncorretto:17-alpine3.15` from `docker.io`. Despite that, whenever I deploy my CloudFormation stack, ECS fails to run the scheduled task as it cannot fetch the images from DockerHub and prints this error:

> CannotPullContainerError:
> inspect image has been retried 5 time(s):
> failed to resolve ref "docker.io/library/amazoncorretto:17-alpine3.15":
> failed to do request: Head https://registry-1.docker.io/v2/library/amazoncorretto/manifests/17-alpine3.15: dial ...

Here's my CloudFormation template (I have intentionally given wide open permissions to all the roles involved to make sure this issue is not due to insufficient IAM permissions):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: ECS Cron Task

Parameters:
  AppName:
    Type: String
    Default: CronTask
  AppImage:
    Type: String
    Default: amazoncorretto:17-alpine3.15
  AppLogGroup:
    Type: String
    Default: ECS
  AppLogPrefix:
    Type: String
    Default: CronTask
  ScheduledTaskSubnets:
    Type: List<AWS::EC2::Subnet::Id>
    Default: "subnet-0031a6eaf7e52173c, subnet-01950a0d2d1e04dc1, subnet-0a1aa70f0421e2025, subnet-036abb95995a86c73, subnet-0f8b5043babfb9a7e, subnet-07cb2210ce2d5bb8f"

Resources:
  Cluster:
    Type: AWS::ECS::Cluster

  TaskRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
      Policies:
        - PolicyName: AdminAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"

  TaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
      Policies:
        - PolicyName: AdminAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"

  TaskScheduleRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
      Path: /
      Policies:
        - PolicyName: AdminAccess
          PolicyDocument:
            Statement:
              - Action: "*"
                Effect: Allow
                Resource: "*"

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Cpu: 256
      Memory: 512
      NetworkMode: awsvpc
      TaskRoleArn: !Ref TaskRole
      ExecutionRoleArn: !Ref TaskExecutionRole
      Family: !Ref AppName
      RequiresCompatibilities:
        - FARGATE
      ContainerDefinitions:
        - Name: !Ref AppName
          Image: !Ref AppImage
          Command: ["java", "--version"]
          Essential: true
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-create-group: true
              awslogs-group: !Ref AppLogGroup
              awslogs-region: !Ref "AWS::Region"
              awslogs-stream-prefix: !Ref AppLogPrefix

  TaskSchedule:
    Type: AWS::Events::Rule
    DependsOn:
      - TaskScheduleRole
      - DeadLetterQueue
    Properties:
      Description: Trigger the task once every minute
      ScheduleExpression: cron(0/1 * * * ? *)
      State: ENABLED
      Targets:
        - Arn: !GetAtt Cluster.Arn
          Id: ClusterTarget
          RoleArn: !GetAtt TaskScheduleRole.Arn
          DeadLetterConfig:
            Arn: !GetAtt DeadLetterQueue.Arn
          EcsParameters:
            LaunchType: FARGATE
            TaskCount: 1
            TaskDefinitionArn: !Ref TaskDefinition
            NetworkConfiguration:
              AwsVpcConfiguration:
                Subnets: !Ref ScheduledTaskSubnets

  DeadLetterQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "CronTaskDeadLetterQueue"

  DeadLetterQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref DeadLetterQueue
      PolicyDocument:
        Statement:
          - Action: "*"
            Effect: Allow
            Resource: "*"
```

What am I missing here? Why, despite running the task in a public subnet/VPC, can AWS not pull the image from `docker.io`? Is something missing in my `TaskSchedule` resource?

```yaml
TaskSchedule:
  Type: AWS::Events::Rule
  ...
  Properties:
    ...
    Targets:
      - ...
        EcsParameters:
          LaunchType: FARGATE
          TaskCount: 1
          TaskDefinitionArn: !Ref TaskDefinition
          NetworkConfiguration:
            AwsVpcConfiguration:
              Subnets: !Ref ScheduledTaskSubnets
```

Thanks in advance.
1
answers
0
votes
10
views
MobyDick
asked 3 months ago
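
One detail that fits this symptom: a Fargate task's ENI gets no public IP unless `AssignPublicIp` is `ENABLED`, and for EventBridge-scheduled tasks it defaults to `DISABLED`, leaving the task with no route to Docker Hub even in a public subnet (the CloudFormation equivalent is `AssignPublicIp: ENABLED` under `AwsVpcConfiguration`). A sketch via the API, with placeholder ARNs:

```python
# Sketch: the EventBridge target's network configuration with AssignPublicIp
# enabled, the piece missing from the TaskSchedule resource above.
import boto3

events = boto3.client("events")

events.put_targets(
    Rule="cron-task-rule",                                                   # placeholder
    Targets=[{
        "Id": "ClusterTarget",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/CronCluster",     # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/TaskScheduleRole",        # placeholder
        "EcsParameters": {
            "LaunchType": "FARGATE",
            "TaskCount": 1,
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/CronTask",  # placeholder
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0031a6eaf7e52173c"],
                    "AssignPublicIp": "ENABLED",   # defaults to DISABLED
                }
            },
        },
    }],
)
```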

Overriding Hostname on ECS Fargate

Hello, I am setting up a Yellowfin [deployment](https://wiki.yellowfinbi.com/display/yfcurrent/Install+in+a+Container) using their stock app-only [image](https://hub.docker.com/r/yellowfinbi/yellowfin-app-only) on ECS Fargate. I was able to set up a test cluster for my team to experiment with.

Yellowfin requires a license to use their software. To issue a license, Yellowfin needs to know the hostname of the underlying platform it runs on. Yellowfin can provide wildcard licenses that match on a standard prefix or suffix. Currently, we are using a development license that matches on the default hostname that our test environment's Fargate task is assigned. The default hostname seems to be of the form <32-character alphanumeric string>-<10-digit number>, where the former is the running task's ID and the latter is an ID of some other related AWS resource (the task definition? the cluster?) that I could not identify. Although this 10-digit number stays constant when new tasks are run, it does not seem like a good strategy to base the real Yellowfin license on it.

I would like to override the hostname of a Fargate task when launching the container to include a common prefix (e.g., "myorg-yfbi-") to make it simple to request a wildcard license for actual use. If possible, I would like to avoid building my own image or migrating to another AWS service.

Is there a standard way to override the hostname for a Fargate service by solely updating the task definition? Would overriding the entrypoint be a viable option? Is there another way to set the hostname in Fargate that I am not aware of? Thank you for any guidance you can provide. Happy to provide more information if it helps.
1
answers
0
votes
6
views
AWS-User-4360608
asked 3 months ago

Create ECS service using existing load balancer with existing target group

I'm using the AWS console to create an ECS service (using Fargate) in an existing cluster. In the second step of the wizard (configure network) I choose an existing application load balancer. The "container to load-balance" section shows my container to add. When I click "add to load balancer" it initially shows the "production listener port" and "target group name" dropdowns set to "create new". When I select an existing target group in the dropdown, this grays out (disables) the "production listener port" dropdown. When clicking the "next step" button, validation complains that the "production listener port" is not filled in (validation message: "please select a listener"), which isn't possible because the control is disabled.

First selecting a listener port in the wizard and switching to an existing target group after that doesn't remedy the situation, as choosing an existing target group blanks out and disables the "production listener port" dropdown, causing the same problem when clicking the "next step" button. How do I get the container to register in an existing target group?

**Update**

The target group is an empty IP address group. Interestingly, it does work if the target group is not empty (the "production listener port" is then filled with 443:HTTPS), but an empty target group (even of the correct type) clears the "production listener port".

**Reproduction**

1. Create a new target group of type "IP address", leaving other default settings. Do not register any targets in this group.
2. Next, add this target group to a (new or existing) load balancer (for testing I added the group to an existing load balancer with a single source IP address filter, e.g. 192.168.1.1/32, so it doesn't disrupt normal operations).
3. When creating a new ECS service, select a task definition that has a container with an exposed port 80.
4. On the second screen, "Configure Network", choose the VPC and subnets and under "Load balancing" pick "Application load balancer".
5. Select the load balancer to which you added the empty target group.
6. Now click "Add to load balancer" next to the container. This shows the "Production listener port" and "Target group name" dropdowns, both initially set to "Create new".
7. Choosing the empty target group disables the "Production listener port" dropdown and clears it.
8. If the target group is not empty, the "Production listener port" is automatically filled with the correct port.
1
answers
1
votes
3
views
Bjorn van Dommelen
asked 3 months ago
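
The console wizard aside, the attachment can be made directly through the API: a service created with a `loadBalancers` entry pointing at the existing target group will have ECS register each task's IP automatically. A sketch with placeholder names and ARNs:

```python
# Sketch: bypass the console wizard and attach the existing (empty) IP target
# group at service-creation time.
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="my-cluster",            # placeholder
    serviceName="my-service",        # placeholder
    taskDefinition="my-taskdef",     # placeholder
    desiredCount=1,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],        # placeholder
            "securityGroups": ["sg-0123456789abcdef0"],     # placeholder
        }
    },
    loadBalancers=[{
        "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/0123456789abcdef",  # placeholder
        "containerName": "my-container",
        "containerPort": 80,
    }],
)
```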

AWS credentials not passed to ECS task with DynamoDB (Node.js)

Hi, there is a weird problem statement. I have an ECS task attached to a role called `MyRole`. This ECS task runs a Node.js script which queries a DynamoDB table.

Node.js script:

```js
const AWS = require('aws-sdk');

async function GetCredentials() {
    var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-2'});
    var identifier = "test-asassa";

    var params3 = {
        TableName: 'MyTable',
        ExpressionAttributeNames: {
            '#var1': 'var1'
        },
        KeyConditionExpression: '#var1 = :var1',
        ExpressionAttributeValues: {
            ':var1': 'ValueToBeQueried'
        }
    }

    await docClient.query(params3, (() => {})).promise().then(async (data1) => {
        console.log(data1);
    });
};

GetCredentials().then(data => {
    console.log(data);
});
```

This code, when run in a plain Docker container, doesn't fetch the AWS credentials from the role and errors out with `ValidationException`. At the same time, if I export the credentials as `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`, the script works. Can anyone help me understand what is going on here? How do I make sure the code takes the role attached to the ECS task? The trust relationship and all other miscellaneous settings are correct. The query is also correct.

To reproduce this, you can create a table called "MyTable" with primary key "var1". Now use the same code as shared above and, in the same terminal, export the AWS access keys - the code will work. Now, in any ECS task, attach an IAM role to the task with sufficient permissions and it will not work, erroring out with `ValidationException: One or more parameter values were invalid: Condition parameter type does not match schema type`, which is a completely misleading error.
4
answers
0
votes
9
views
AWS-User-9549462
asked 3 months ago
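
A diagnostic that may help narrow this down (shown in Python for consistency with the other sketches; the question's script is Node.js): inside the task, check that the ECS container credentials endpoint is present and which identity actually resolves. If this prints the task role ARN, the role is available and the problem lies in the SDK/client configuration rather than the task setup.

```python
# Sketch: confirm the task-role credentials path from inside the container.
import json
import os
import urllib.request

import boto3

rel_uri = os.environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
print("credentials endpoint:", rel_uri)
if rel_uri:
    with urllib.request.urlopen(f"http://169.254.170.2{rel_uri}") as resp:
        print("task role:", json.load(resp)["RoleArn"])

print("caller identity:", boto3.client("sts").get_caller_identity()["Arn"])
```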

How do I make an ECS cluster spawn GPU instances with more root volume than default?

I need to deploy an ML app that needs GPU access for its response times to be acceptable (since it uses some heavy networks that run too slowly on CPU). The app is containerized and uses an nvidia/cuda base image, so that it can make use of its host machine's GPU. The image alone weighs ~10 GB, and during startup it pulls several ML models and data which take up about another ~10 GB of disk.

We were previously running this app on Elastic Beanstalk, but we realized it doesn't support GPU usage, even if specifying a Deep Learning AMI, so we migrated to ECS, which provides more configurability than the former. However, we soon ran into a new problem: **selecting a g4dn instance type when creating a cluster, which defaults the AMI to an ECS GPU one, turns the Root EBS Volume Size field into a Data EBS Volume Size field.** This causes the instance's 22 GB root volume (which is the only one that comes formatted and mounted) to be too small for pulling our image and downloading the data it needs during startup. The other volume (of whatever size I specify during creation in the new Data EBS Volume Size field) is not mounted and therefore not accessible by the container. Additionally, the g4dn instances come with a 125 GB SSD that is not mounted either. If either of these were usable, or it was possible to enlarge the root volume (which it is if using the default non-GPU AMI), ECS would be the perfect solution for us at this time.

At the moment, we worked around this issue by creating an *empty* cluster in ECS, and then manually creating and attaching an Auto Scaling group to it, since when using a launch configuration or template the root volume's size can be correctly specified, even when using the same exact ECS GPU AMI as ECS does. However, this is a tiresome process, and it makes us lose valuable ECS functionality such as automatically spawning a new instance during a rolling update to maintain capacity.

Am I missing something here? Is this a bug that will be fixed at some point? If it's not, is there a simpler way to achieve what I need? Maybe by specifying a custom launch configuration to the ECS cluster or by automatically mounting the SSD on instance launch? Any help is more than appreciated. Thanks in advance!
0
answers
0
votes
4
views
Ian
asked 3 months ago
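
A scripted version of the manual workaround described above: a launch template that keeps the ECS GPU AMI but enlarges the root volume (`/dev/xvda` on the Amazon Linux 2 ECS AMIs). The AMI ID and cluster name are placeholders.

```python
# Sketch: launch template with a bigger root volume for an ECS GPU cluster's
# Auto Scaling group.
import base64
import boto3

user_data = base64.b64encode(
    b"#!/bin/bash\n"
    b"echo ECS_CLUSTER=my-cluster >> /etc/ecs/ecs.config\n"        # placeholder cluster
    b"echo ECS_ENABLED_GPU_SUPPORT=true >> /etc/ecs/ecs.config\n"
).decode()

ec2 = boto3.client("ec2")
ec2.create_launch_template(
    LaunchTemplateName="ecs-gpu-bigger-root",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",   # ECS GPU-optimized AMI (placeholder)
        "InstanceType": "g4dn.xlarge",
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/xvda",        # root device on the AL2 ECS AMIs
            "Ebs": {"VolumeSize": 100, "VolumeType": "gp3", "DeleteOnTermination": True},
        }],
        "UserData": user_data,
    },
)
```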

AWS CodeDeploy: STRING_VALUE can not be converted to an Integer

Using AWS CodePipeline with a Source and Build stage and passing `taskdef.json` and `appspec.yaml` as artifacts, the deployment action `Amazon ECS (Blue/Green)` fails with the error:

```
STRING_VALUE can not be converted to an Integer
```

This error does not specify where it happens, which makes it hard to track down. For reference, the files look like this:

```yaml
# appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "my-project"
          ContainerPort: 3000
```

```json
// taskdef.json
{
  "family": "my-project-web",
  "taskRoleArn": "arn:aws:iam::1234567890:role/ecsTaskRole-role",
  "executionRoleArn": "arn:aws:iam::1234567890:role/ecsTaskExecutionRole-web",
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "my-project",
      "memory": "512",
      "image": "01234567890.dkr.ecr.us-east-1.amazonaws.com/my-project:a09b7d81",
      "environment": [],
      "secrets": [
        { "name": "APP_ENV", "valueFrom": "arn:aws:secretsmanager:us-east-1:1234567890:secret:web/my-project-NBcsLj:APP_ENV::" },
        { "name": "PORT", "valueFrom": "arn:aws:secretsmanager:us-east-1:1234567890:secret:web/my-project-NBcsLj:PORT::" },
        { "name": "APP_NAME", "valueFrom": "arn:aws:secretsmanager:us-east-1:1234567890:secret:web/my-project-NBcsLj:APP_NAME::" },
        { "name": "LOG_CHANNEL", "valueFrom": "arn:aws:secretsmanager:us-east-1:1234567890:secret:web/my-project-NBcsLj:LOG_CHANNEL::" },
        { "name": "APP_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:1234567890:secret:web/my-project-NBcsLj:APP_KEY::" },
        { "name": "APP_DEBUG", "valueFrom": "arn:aws:secretsmanager:us-east-1:1234567890:secret:web/my-project-NBcsLj:APP_DEBUG::" }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "",
          "awslogs-region": "",
          "awslogs-stream-prefix": ""
        }
      },
      "portMappings": [
        {
          "hostPort": 3000,
          "protocol": "tcp",
          "containerPort": 3000
        }
      ],
      "entryPoint": [
        "web"
      ],
      "command": []
    }
  ],
  "requiresCompatibilities": [
    "FARGATE",
    "EC2"
  ],
  "tags": [
    {
      "key": "project",
      "value": "my-project"
    }
  ]
}
```

Any insights on this issue are highly appreciated!
2
answers
0
votes
6
views
fagiani
asked 4 months ago
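
One concrete thing to check in the taskdef above: in the RegisterTaskDefinition API, *container-level* `cpu`/`memory` must be integers (only the task-level values are strings), and the container definition here has `"memory": "512"` as a string, which matches a string-to-integer conversion failure. A small pre-deploy check that coerces these fields:

```python
# Sketch: coerce container-level cpu/memory fields in taskdef.json to
# integers before handing the file to CodeDeploy.
import json

with open("taskdef.json") as fh:
    taskdef = json.load(fh)

for container in taskdef["containerDefinitions"]:
    for key in ("cpu", "memory", "memoryReservation"):
        if isinstance(container.get(key), str):
            container[key] = int(container[key])

with open("taskdef.json", "w") as fh:
    json.dump(taskdef, fh, indent=2)
```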

ECS - FSx FileSystemNotFound: File system does not exist

I have an ECS service with launch type EC2, owned by AWS account A. Our IT team has created an FSx file system owned by AWS account B - [see simple diagram here](https://i.stack.imgur.com/MyU1d.png). When I try to launch tasks I get this error in the Stopped reason section of the task:

```
Stopped reason
Fsx describing filesystem(s) from the service for [fs-0c52aba0aac20c744]: FileSystemNotFound: File system 'fs-0c52aba0aac20c744' does not exist.
```

I have attached these two policies to the EC2 (container host) instance:

- AmazonFSxReadOnlyAccess (AWS managed)
- fsx_mount (customer managed)

fsx_mount:

```
{
  "Statement": [
    {
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:secretsmanager:us-west-2:111111111111:secret:dev/rushmore/ad-account-NKOkyh"
    },
    {
      "Action": [
        "fsx:*",
        "ds:DescribeDirectories"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:fsx:us-west-2:222222222222:file-system/fs-0c52aba0aac20c744"
    }
  ],
  "Version": "2012-10-17"
}
```

**Note** that the account ID 222222222222 represents AWS account B. Also, **VPC peering is in place between the EC2 instance's VPC and the file system's VPC**.

Terraform aws_ecs_task_definition:

```
resource "aws_ecs_task_definition" "participants_task" {
  volume {
    name = "FSxStorage"
    fsx_windows_file_server_volume_configuration {
      file_system_id = "fs-0c52aba0aac20c744"
      root_directory = "\\data"

      authorization_config {
        credentials_parameter = aws_secretsmanager_secret_version.fsx_account_secret.arn
        domain                = var.domain
      }
    }
  }
  ...
}
```

I am not sure why ECS cannot find the FSx file system. Surely it must be because it is in another AWS account, but I don't know what changes are required in order to fix this.
1
answers
0
votes
4
views
AWS-User-3510110
asked 4 months ago