Questions in DevOps

Browse through the questions and answers listed below.

CDK Route 53 zone lookup brings back wrong zone ID

We are attempting to update our IaC code base to CDK v2. Before that, we deploy entire stacks of our system in another test environment. One part of a stack creates a TLS certificate for use with our load balancer.

```
var hostedZone = HostedZone.FromLookup(this, $"{config.ProductName}-dns-zone", new HostedZoneProviderProps
{
    DomainName = config.RootDomainName
});

DnsValidatedCertificate certificate = new DnsValidatedCertificate(this, $"{config.ProductName}-webELBCertificate-{config.Environment}", new DnsValidatedCertificateProps
{
    HostedZone = hostedZone,
    DomainName = config.AppDomainName,
    // Used to implement ValidationMethod = ValidationMethod.DNS
    Validation = CertificateValidation.FromDns(hostedZone)
});
```

For some reason, the synthesized template sets the hosted zone ID on the AWS::CloudFormation::CustomResource to *something other than the actual zone ID* in that account. That causes the certificate request validation to fail, and with it the whole `cdk deploy`, since it cannot find the real zone to place the validation records in. If I look at the individual pending certificate requests on the Certificate Manager page, they can be approved by manually pressing the **Create records in Route 53** button, which finds the correct zone. Where exactly is CDK finding this mysterious zone ID that does not belong to us?

```
"AppwebELBCertificatetestCertificateRequestorResource68D095F7": {
  "Type": "AWS::CloudFormation::CustomResource",
  "Properties": {
    "ServiceToken": {
      "Fn::GetAtt": [
        "AppwebELBCertificatetestCertificateRequestorFunctionCFE32764",
        "Arn"
      ]
    },
    "DomainName": "root.domain",
    "HostedZoneId": "NON-EXISTENT ZONE ID"
  },
  "UpdateReplacePolicy": "Delete",
  "DeletionPolicy": "Delete",
  "Metadata": {
    "aws:cdk:path": "App-webELBStack-test/App-webELBCertificate-test/CertificateRequestorResource/Default"
  }
}
```
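One thing worth checking (an added suggestion, not part of the original post): `HostedZone.FromLookup` caches its lookup result in `cdk.context.json`, so a stale cached entry can carry an old zone ID into the synthesized template. A minimal boto3 sketch, assuming default credentials for the target account and a placeholder domain name, to compare the real zone ID against what landed in the template:

```python
import boto3

route53 = boto3.client("route53")

# Placeholder; replace with your RootDomainName value (note the trailing dot).
domain = "root.domain."

# Print the IDs of hosted zones matching the domain so they can be compared
# against the HostedZoneId in the synthesized template.
resp = route53.list_hosted_zones_by_name(DNSName=domain)
for zone in resp["HostedZones"]:
    if zone["Name"] == domain:
        print(zone["Id"].split("/")[-1], zone["Name"])
```

If the printed ID differs from the one in the template, clearing the cached lookups (for example with `cdk context --clear`) and re-synthesizing is a common next step.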
1 answer · 0 votes · 10 views · asked 9 days ago

AWS Glue trigger EventBatchingCondition/BatchWindow is not optional

Hi team, I have a Glue workflow: a trigger (type = "EVENT") that starts a Glue job (to take data from S3 and push it to MySQL RDS). I configured the Glue triggering criteria to kick off the job after 5 events are received. The console says:

> Specify the number of events received or maximum elapsed time before firing this trigger.
> Time delay in seconds (optional)

The AWS documentation also says it is not required:

```
BatchWindow
Window of time in seconds after which EventBridge event trigger fires. Window starts when first event is received.
Type: Integer
Valid Range: Minimum value of 1. Maximum value of 900.
Required: No
```

I want the trigger to fire only after 5 events are received, independent of "Time delay in seconds (optional)". Currently, that delay is set to 900 by default, and my job starts after 900 seconds even if 5 events have not been received. That is not the behaviour we want: we want the job to start ONLY after x events are received. I tried to edit the trigger in the console and remove the 900 s value from the "Time delay in seconds (optional)" input, but I cannot save it until I put a value in. It says it is optional, but it does not seem to be. Is there a workaround to make the trigger ignore the time delay and fire only once it has received x events? How can I make the "Time delay in seconds (optional)" input truly optional, given that the console forces me to enter a value? Thank you.
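One workaround worth trying (a sketch under assumptions, not a confirmed fix): the Glue API itself marks BatchWindow as optional, so updating the trigger through boto3 with only BatchSize set may sidestep the console's forced value. The trigger and job names below are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Placeholder names; substitute your actual trigger and job.
glue.update_trigger(
    Name="my-event-trigger",
    TriggerUpdate={
        "Actions": [{"JobName": "my-s3-to-rds-job"}],
        "EventBatchingCondition": {
            "BatchSize": 5,
            # BatchWindow deliberately omitted; the API lists it as optional,
            # though the service may still apply its own default (e.g. 900 s).
        },
    },
)
```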
1 answer · 0 votes · 7 views · asked a month ago

Failing to start an EC2 task on ECS

Hi there, I am trying to start a task that uses a GPU on my instance. The EC2 instance is already added to a cluster, but the task fails to start. Here is the error:

```
status: STOPPED (CannotStartContainerError: Error response from dae)

Details
Status reason   CannotStartContainerError: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr
Network bindings - not configured
```

EC2 setup:

```
Type: AWS::EC2::Instance
Properties:
  IamInstanceProfile: !Ref InstanceProfile
  ImageId: ami-0d5564ca7e0b414a9
  InstanceType: g4dn.xlarge
  KeyName: tmp-key
  SubnetId: !Ref PrivateSubnetOne
  SecurityGroupIds:
    - !Ref ContainerSecurityGroup
  UserData:
    Fn::Base64: !Sub |
      #!/bin/bash
      echo ECS_CLUSTER=traffic-data-cluster >> /etc/ecs/ecs.config
      echo ECS_ENABLED_GPU_SUPPORT=true >> /etc/ecs/ecs.config
```

Dockerfile:

```
FROM nvidia/cuda:11.6.0-base-ubuntu20.04

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

# RUN nvidia-smi
RUN echo 'install pip packages'
RUN apt-get update
RUN apt-get install python3.8 -y
RUN apt-get install python3-pip -y
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN pip3 --version
RUN python --version

WORKDIR /
COPY deployment/video-blurring/requirements.txt /requirements.txt
RUN pip3 install --upgrade pip
RUN pip3 install --user -r /requirements.txt

## Set up the requisite environment variables that will be passed during the build stage
ARG SERVER_ID
ARG SERVERLESS_STAGE
ARG SERVERLESS_REGION

ENV SERVER_ID=$SERVER_ID
ENV SERVERLESS_STAGE=$SERVERLESS_STAGE
ENV SERVERLESS_REGION=$SERVERLESS_REGION

COPY config/env-vars .

## Sets up the entry point for running the bashrc which contains environment variables and
## triggers the python task handler
COPY script/*.sh /
RUN ["chmod", "+x", "./initialise_task.sh"]

## Copy the code to /var/runtime - following the AWS Lambda convention
## Use ADD to preserve the underlying directory structure
ADD src /var/runtime/

ENTRYPOINT ./initialise_task.sh
```
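For comparison (an added illustration, not from the original post): a GPU task also needs an explicit GPU resource requirement in its task definition so the ECS agent reserves the device and wires up the NVIDIA runtime. A minimal boto3 sketch with placeholder names and sizes:

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder family, image, and sizing; the GPU reservation is the relevant part.
ecs.register_task_definition(
    family="video-blurring-gpu",
    requiresCompatibilities=["EC2"],
    containerDefinitions=[
        {
            "name": "worker",
            "image": "<account>.dkr.ecr.<region>.amazonaws.com/video-blurring:latest",
            "cpu": 2048,
            "memory": 8192,
            "essential": True,
            # Reserve one GPU so ECS places the task on the g4dn instance and
            # exposes the device to the container.
            "resourceRequirements": [{"type": "GPU", "value": "1"}],
        }
    ],
)
```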
0 answers · 0 votes · 3 views · asked a month ago

Scheduled Action triggering at the time specified in another action

I have a CloudFormation setup with Scheduled Actions to autoscale services based on times. There is one action that scales up to start the service and another that scales down to turn it off. I also occasionally add an additional action to scale up if a service is needed at a different time on a particular day. I'm having an issue where my service is being scaled down instead of up when I specify this additional action. Looking at the console logs, I get an event that looks like:

```
16:00:00 -0400
Message: Successfully set min capacity to 0 and max capacity to 0
Cause: scheduled action name ScheduleScaling_action_1 was triggered
```

However, the relevant part of the CloudFormation template for the Scheduled Action named in the log has a different time, e.g.:

```
{
  "ScalableTargetAction": {
    "MaxCapacity": 0,
    "MinCapacity": 0
  },
  "Schedule": "cron(0 5 ? * 2-5 *)",
  "ScheduledActionName": "ScheduleScaling_action_1"
}
```

What is odd is that the time this action triggers matches exactly the schedule time of another action, e.g.:

```
{
  "ScalableTargetAction": {
    "MaxCapacity": 1,
    "MinCapacity": 1
  },
  "Schedule": "cron(00 20 ? * 2-5 *)",
  "ScheduledActionName": "ScheduleScaling_action_2"
}
```

I am using CDK to generate the CloudFormation template, which doesn't appear to let me specify a timezone, so my understanding is that the times here should be UTC. What could cause the scheduled action to trigger at the incorrect time like this?
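As a quick sanity check on the timestamps (an added illustration; the local offset is taken from the -0400 in the log, and the date is arbitrary since only the hour matters):

```python
from datetime import datetime, timezone, timedelta

# Hours taken from the two cron expressions above; without a timezone they are UTC.
schedules = {"ScheduleScaling_action_1": 5, "ScheduleScaling_action_2": 20}

local_offset = timezone(timedelta(hours=-4))  # the -0400 offset seen in the log

for name, hour_utc in schedules.items():
    utc_time = datetime(2022, 4, 5, hour_utc, 0, tzinfo=timezone.utc)  # arbitrary date
    print(name, "fires at", utc_time.astimezone(local_offset).strftime("%H:%M %z"))

# ScheduleScaling_action_1 fires at 01:00 -0400
# ScheduleScaling_action_2 fires at 16:00 -0400
```

The 16:00 -0400 event lines up with the 20:00 UTC cron, which suggests the scale-to-zero action is somehow being applied on the other action's schedule rather than the clock being misread.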
1 answer · 0 votes · 6 views · asked a month ago

High-Traffic, Load-Balanced WordPress Site - Optimal DevOps setup for deployment?

TL;DR: I inherited a WordPress site that I now manage. It had a DevOps deployment pipeline that worked when the site had low to medium traffic, but the site now consistently gets high traffic and I'm trying to improve the pipeline.

The site uses Lightsail instances and a Lightsail load balancer in conjunction with one RDS database instance and an S3 bucket for hosted media. When I inherited the site, the deployment pipeline from the previous developer was: *scale the site down to one instance, make changes to that one instance, and once the changes are complete, clone that updated instance as many times as needed.*

This worked fine when the site mostly ran on one instance except during peak traffic times. However, we now run 3-5 instances at all times, as even our "off-peak" traffic is high enough to require multiple instances. I'd like to improve the deployment pipeline to allow deploying during peak-traffic times without issues.

I'm worried about updating multiple instances behind the load balancer one by one, because we have Session Persistence disabled to allow for more evenly distributed load balancing, and a user hopping between instances that have different functions.php files could cause issues. Should I just enable session persistence when I want to make updates and sequentially update the instances behind the load balancer one by one? Or is there a better-suited solution? Should I move to a container setup?

I'm admittedly a novice with AWS, so any help is greatly appreciated. I'm really just looking for general advice and am confident I can figure out how to implement a suggested best-practice solution. Thanks!
1 answer · 0 votes · 13 views · asked a month ago

Amplify infrastructure export does not work with CDK v2

According to the [Amplify documentation](https://docs.amplify.aws/cli/usage/export-to-cdk/) and [this official blog post](https://aws.amazon.com/blogs/mobile/export-amplify-backends-to-cdk-and-use-with-existing-deployment-pipelines/), it is possible to export infrastructure from Amplify and then import it into CDK. However, when I try with CDK v2, it does not work. I get an error when installing **npm i @aws-amplify/cdk-exported-backend@latest**. I think the CDK v2 **Construct** is not compatible with the **Construct** expected by aws-amplify/cdk-exported-backend. So how can I export Amplify infrastructure to CDK v2? Thank you!

1. Here is my CDK package.json:

```
{
  "name": "amplify-export-cdk",
  "version": "0.1.0",
  "bin": {
    "amplify-export-cdk": "bin/amplify-export-cdk.js"
  },
  "scripts": {
    "build": "tsc",
    "watch": "tsc -w",
    "test": "jest",
    "cdk": "cdk"
  },
  "devDependencies": {
    "@types/jest": "^26.0.10",
    "@types/node": "10.17.27",
    "jest": "^26.4.2",
    "ts-jest": "^26.2.0",
    "aws-cdk": "2.18.0",
    "ts-node": "^9.0.0",
    "typescript": "~3.9.7"
  },
  "dependencies": {
    "aws-cdk-lib": "2.18.0",
    "constructs": "^10.0.0",
    "source-map-support": "^0.5.16"
  }
}
```

2. Here are the errors during installation:

```
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR!
npm ERR! While resolving: amplify-export-cdk@0.1.0
npm ERR! Found: constructs@10.0.108
npm ERR! node_modules/constructs
npm ERR!   constructs@"^10.0.0" from the root project
npm ERR!
npm ERR! Could not resolve dependency:
npm ERR! peer constructs@"^3.2.27" from @aws-amplify/cdk-exported-backend@0.0.5
npm ERR! node_modules/@aws-amplify/cdk-exported-backend
npm ERR!   @aws-amplify/cdk-exported-backend@"0.0.5" from the root project
npm ERR!
npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
```
0 answers · 1 vote · 3 views · asked 2 months ago

Slow Lambda responses under heavier load

Hi, I'm currently doing load testing using Gatling and I have an issue with my Lambdas. I have two Lambdas: one written in Java 8 and one written in Python. In my test I send one request with 120 concurrent users, then ramp from 120 to 400 users over 1 minute, and then Gatling sends requests at a constant 400 users per second for 2 minutes. The Lambdas behave strangely because the response times are very high. There is no logic in the Lambdas; they just return a String. Here are some screenshots of the Gatling reports: [Java Report][1], [Python Report][2]. I should add that I ran tests with the Lambdas warmed up and saw the same behaviour. I'm using API Gateway to invoke my Lambdas. Do you have any idea why the response times are so high? Sometimes I receive an HTTP error that says:

```
i.n.h.s.SslHandshakeTimeoutException: handshake timed out after 10000ms
```

Here is my Gatling simulation code:

```
public class OneEndpointSimulation extends Simulation {
    HttpProtocolBuilder httpProtocol = http
        .baseUrl("url") // Here is the root for all relative URLs
        .acceptHeader("text/html,application/xhtml+xml,application/json,application/xml;q=0.9,*/*;q=0.8") // Here are the common headers
        .acceptEncodingHeader("gzip, deflate")
        .acceptLanguageHeader("en-US,en;q=0.5")
        .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0");

    ScenarioBuilder scn = scenario("Scenario 1 Workload 2")
        .exec(http("Get all activities")
        .get("/dev")).pause(1);

    {
        setUp(scn.injectOpen(
            atOnceUsers(120),
            rampUsersPerSec(120).to(400).during(60),
            constantUsersPerSec(400).during(Duration.ofMinutes(1))
        ).protocols(httpProtocol)
        );
    }
}
```

I also checked the logs and turned on X-Ray for API Gateway, but there was nothing there; the average latency for these services was 14 ms. What can be the reason for the slow Lambda responses?

[1]: https://i.stack.imgur.com/sCx9M.png
[2]: https://i.stack.imgur.com/SuHU0.png
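One quick thing to rule out (an added suggestion, not from the original post) is the account's concurrent execution limit, since throttling under a 400-user ramp can surface as retries and inflated client-side response times. A minimal boto3 check, with a placeholder function name:

```python
import boto3

lambda_client = boto3.client("lambda")

# Account-wide concurrency limits.
settings = lambda_client.get_account_settings()
print("Concurrent executions limit:", settings["AccountLimit"]["ConcurrentExecutions"])
print("Unreserved concurrency:", settings["AccountLimit"]["UnreservedConcurrentExecutions"])

# Per-function reserved concurrency, if any (function name is a placeholder).
try:
    conc = lambda_client.get_function_concurrency(FunctionName="my-java-function")
    print("Reserved concurrency:", conc.get("ReservedConcurrentExecutions", "none"))
except lambda_client.exceptions.ResourceNotFoundException:
    print("Function not found; replace the placeholder name")
```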
0 answers · 0 votes · 7 views · asked 2 months ago

Lambda function updating cannot be made atomic with RevisionId

A number of Lambda API calls accept a RevisionId argument to ensure that the operation only continues if the current revision of the Lambda function matches, very similar to an atomic compare-and-swap operation. However, this RevisionId appears to be useless for performing some atomic operations, for the following reason.

Suppose I want to update a function's code and then publish it, in 2 separate steps (I know it can be done in 1 step, but that does not interest me, because I cannot set the description of a published version in a single update/publish step; it must be done in 2 steps). The [update_function_code](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lambda.html#Lambda.Client.update_function_code) call returns a RevisionId that corresponds to the "in progress" update of the function. This RevisionId cannot be used, because it will change once the function becomes active/updated, and the new RevisionId can only be obtained via [get_function](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lambda.html#Lambda.Client.get_function).

```
Update code -> RevisionId A (in progress) -> RevisionId B (updated/active) -> Get Function -> RevisionId B -> Publish Function
```

There is a race condition because I must call `get_function` to obtain the current RevisionId before I continue with publishing the function. This race condition makes it impossible to create an atomic sequence of operations that includes an `update_function_code` operation, because the RevisionId it returns cannot be relied on and has to be refreshed with a `get_function` call. Concurrently, another operation could change the RevisionId and you wouldn't know, because you're depending on `get_function` to return an unknown RevisionId.
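A minimal boto3 sketch of the two-step flow described above, with placeholder names, showing where the unguarded window sits:

```python
import boto3

client = boto3.client("lambda")
fn = "my-function"  # placeholder

# Step 1: update the code; the returned RevisionId refers to the in-progress update.
with open("bundle.zip", "rb") as f:
    update = client.update_function_code(FunctionName=fn, ZipFile=f.read())
print("RevisionId during update:", update["RevisionId"])

# Wait for the update to finish, after which the function carries a new RevisionId.
client.get_waiter("function_updated").wait(FunctionName=fn)

# Step 2: fetch the post-update RevisionId. Anything that modifies the function
# between this call and publish_version below goes undetected: the race window.
current = client.get_function(FunctionName=fn)["Configuration"]["RevisionId"]

# Step 3: publish, guarded only by the RevisionId we just read.
client.publish_version(
    FunctionName=fn,
    Description="release notes here",  # the reason for the separate publish step
    RevisionId=current,
)
```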
1 answer · 0 votes · 4 views · asked 2 months ago

Issue creating Lambda function layer versions in parallel

Hi,

We are using Terraform (v0.13.5 with AWS provider hashicorp/aws v3.38.0) to deploy AWS resources into our accounts. Some of these resources are Lambda functions with Lambda layers. We use automated processes (GitLab pipelines) to run those deployments and can change several Lambda functions at the same time. We use the same Lambda layer for ALL the Lambda functions, but create different versions of that layer with different code (ZIP files) and attach each version to a specific Lambda function.

Lately we realized that when modifying several Lambda functions at the same time, the code in the different versions of the same layer gets mixed: code that should go into one layer version also appears in other versions created at the same time. For example:

* When we modify several Lambda functions at the same time (say L001 and L002), two new versions of layer MYLAY are created, and the corresponding version is linked to each of the modified Lambda functions. So we have L001 with MYLAY-001 and L002 with MYLAY-002. This is what we expect, so fine so far.
* Each version of the layer should have its own code (different ZIP files).
* We have detected that the code for MYLAY-001 also appears in MYLAY-002, even though the ZIP files used to create those versions are different.

So from my point of view, it seems that **the way AWS creates versions of the same layer is not compatible with parallel creation**. Can anyone confirm or shed some light on how AWS creates those versions? I guess the best approach, given the above, is to use a different layer for each Lambda function.

Thanks in advance and best regards,
Luis
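One way to verify whether the published versions really contain the wrong code (an added suggestion; layer name, version numbers, and ZIP paths below are placeholders): compare each local ZIP's SHA-256 against the CodeSha256 reported for the corresponding layer version.

```python
import base64
import hashlib
import boto3

lambda_client = boto3.client("lambda")

def local_code_sha256(zip_path: str) -> str:
    """SHA-256 of the ZIP, base64-encoded the same way Lambda reports CodeSha256."""
    with open(zip_path, "rb") as f:
        return base64.b64encode(hashlib.sha256(f.read()).digest()).decode()

# Placeholder mapping of layer version number to the ZIP that was supposed to create it.
checks = {1: "build/L001-layer.zip", 2: "build/L002-layer.zip"}

for version, zip_path in checks.items():
    published = lambda_client.get_layer_version(LayerName="MYLAY", VersionNumber=version)
    remote_sha = published["Content"]["CodeSha256"]
    print(f"MYLAY version {version}: match={remote_sha == local_code_sha256(zip_path)}")
```

A mismatch here would confirm that the layer version was published from a different ZIP than intended, rather than the functions merely being attached to the wrong version.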
1 answer · 0 votes · 3 views · asked 2 months ago