Request loss due to health check intervals, and concerns regarding Service Connect


I think these are very common questions that anyone will run into while trying to deploy an application on ECS securely, and as a beginner I am in that group too.

The ALB sends traffic only to healthy targets, based on its health checks. But health checks run at an interval, so what happens to requests forwarded to a target that has just become unhealthy? If the checker has just found the target healthy and my container or task crashes a second later, will the ALB keep routing traffic to that faulty target for the next 4-5 seconds? In a large-scale system, thousands of requests arrive every second.

If we set up self-signed certificates and use them in our application container, will the ALB be able to establish an HTTPS connection with my broker API service (task) automatically? Does the same apply to service-to-service communication inside the private subnet? And will HTTPS still work with Service Connect on top of that? It sets up a proxy alongside each task, and I am configuring my application to use only its own certificates, so will the proxy be able to handle that HTTPS connection?

Also, does attaching an ALB and target group to the ECS service automatically register newly scaled-out tasks in the ALB target group?

Another question concerns setting up self-signed certificates in my application. Is it a good approach to generate the certificates with OpenSSL in each task at startup and attach a timer that stops the server automatically before the certificate expires, so that the Auto Scaling group spins up a new task with new certificates?

And to auto scale the tasks of an ECS service that runs in a private subnet, do I need to set up a CloudWatch endpoint so that it can send metrics and alarm triggers to CloudWatch? If so, which endpoint should I use?

Please answer these questions with respect to the ECS Fargate launch type. Thanks in advance!

1 Answer
Accepted Answer

Hi Rahat,

Healthchecks: Yes, this is an inherent tradeoff with healthchecks in any system design. The shorter the interval, the faster you can detect failures, but the more healthcheck load you place on your system. Healthcheck failures should be rare in a well-designed system where updates are deployed to QA environments before prod, so this shouldn't be an issue often. When it does happen, you should have a plan in place to make the experience as seamless for the end user as possible, for example custom HTTP error responses.
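To put rough numbers on that tradeoff, here is a minimal Python sketch. It assumes a simplified model: the target dies right after passing a check, the ALB needs `unhealthy_threshold` consecutive failed checks to mark it unhealthy, and load is spread evenly across targets. The function names and the sample values (5 s interval, threshold of 2, 2 s timeout, 1000 requests/s, 4 targets) are illustrative and not taken from the question, so treat the result as a back-of-the-envelope upper bound rather than exact ALB behavior.

```python
def worst_case_detection_seconds(interval_s: float,
                                 unhealthy_threshold: int,
                                 timeout_s: float = 0.0) -> float:
    """Rough upper bound on how long a dead target may stay in rotation.

    Assumes the target fails just after a successful check, so the ALB has
    to wait for `unhealthy_threshold` further checks, the last of which may
    take up to `timeout_s` to be declared failed.
    """
    return interval_s * unhealthy_threshold + timeout_s


def requests_at_risk(total_rps: float, target_count: int, window_s: float) -> float:
    """Requests that one failed target could receive during the window,
    assuming requests are spread evenly across all targets."""
    return total_rps / target_count * window_s


if __name__ == "__main__":
    window = worst_case_detection_seconds(interval_s=5, unhealthy_threshold=2, timeout_s=2)
    print(f"detection window: {window} s")
    print(f"requests at risk: {requests_at_risk(1000, 4, window)}")
```

Shortening the interval or lowering the threshold shrinks the window, at the cost of more healthcheck traffic and a higher chance of flagging a target that is only briefly slow.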

HTTPS: There are 2 separate HTTP connections when using an ALB:

  1. Client to the ALB: This one needs a valid, signed certificate, to avoid your customers seeing warnings. You can easily provision one to use with the ALB via ACM.
  2. ALB to targets (tasks): This connection can use a self-signed certificate. However, since this request stays within the VPC, many environments are set up so that this connection uses HTTP rather than HTTPS (unless there is a legal/compliance or similar policy requiring end-to-end HTTPS). This removes the load and latency of handling the TLS connection from your tasks.
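If you do decide to encrypt the second connection, the target group is where you tell the ALB to speak HTTPS to your tasks. The sketch below only builds the keyword arguments for boto3's `elbv2.create_target_group`; the helper name, port `8443`, and the `/health` path are assumptions for illustration, and the actual AWS call is left as a comment because it needs credentials and a real VPC ID. As far as I know, the ALB does not validate the certificate presented by a target, which is why a self-signed one is accepted here.

```python
def https_target_group_params(name: str, vpc_id: str,
                              port: int = 8443,
                              health_path: str = "/health") -> dict:
    """Build kwargs for elbv2.create_target_group with HTTPS to the targets.

    The port and health check path are hypothetical defaults; use whatever
    your container actually listens on.
    """
    return {
        "Name": name,
        "Protocol": "HTTPS",          # ALB -> task traffic is encrypted
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "ip",           # Fargate tasks use awsvpc networking
        "HealthCheckProtocol": "HTTPS",
        "HealthCheckPath": health_path,
        "HealthCheckIntervalSeconds": 10,
        "HealthCheckTimeoutSeconds": 5,
        "HealthyThresholdCount": 2,
        "UnhealthyThresholdCount": 2,
    }


# Usage (not executed here; requires AWS credentials and a real VPC ID):
#   import boto3
#   params = https_target_group_params("broker-api", vpc_id="vpc-EXAMPLE")
#   boto3.client("elbv2").create_target_group(**params)
```

Switching `Protocol` and `HealthCheckProtocol` to `"HTTP"` gives the more common setup where TLS ends at the ALB.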

AutoScaling: Yes, if the ALB's target group is listed in the ECS Service configuration, all newly launched tasks are automatically registered with the target group, and tasks being terminated are deregistered.
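For reference, "listed in the ECS Service configuration" corresponds to the `loadBalancers` field when the service is created. A minimal sketch, assuming boto3 and hypothetical names throughout (the helper, the cluster and container names, and the placeholder ARN and IDs are not from the question); it only builds the arguments for `ecs.create_service` and leaves the call itself as a comment:

```python
def ecs_service_params(cluster: str, service_name: str, task_definition: str,
                       target_group_arn: str, container_name: str,
                       container_port: int, subnets: list, security_groups: list,
                       desired_count: int = 2) -> dict:
    """Build kwargs for ecs.create_service with an attached ALB target group."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        # ECS registers and deregisters task IPs in this target group itself.
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": container_name,
            "containerPort": container_port,
        }],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,                # private subnets
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
    }


# Usage (not executed here; requires AWS credentials and real resource IDs):
#   import boto3
#   boto3.client("ecs").create_service(**ecs_service_params(...))
```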

OpenSSL: I don't have any direct experience with setting this up in prod, so not going to speculate too much here. But if you end up not setting up end-to-end HTTPS as mentioned above, this becomes a moot point

Metrics in private subnets: No, the ECS service itself sends the default ECS metrics to CloudWatch, not your task. So the task does not need connectivity to CloudWatch (unless you're trying to push custom metrics, in which case yes, you do need connectivity from the task to CloudWatch via a VPC Endpoint, NAT Gateway, etc). Similarly, the alarm triggers AutoScaling, which sends an UpdateService API call to the ECS service itself, not your tasks. So there are no connectivity concerns for the alarm either.
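On the "which endpoint" part, for the custom-metrics case only: to my knowledge the CloudWatch metrics interface endpoint uses the service name `com.amazonaws.<region>.monitoring`. The small helper below is hypothetical and just formats that name so it can be passed as `ServiceName` when creating an interface VPC endpoint; please check the name against the VPC endpoint list in your own region before relying on it.

```python
def cloudwatch_metrics_endpoint_service(region: str) -> str:
    """Service name of the CloudWatch (metrics) interface VPC endpoint.

    Only needed when tasks in a private subnet push custom metrics
    themselves; the default ECS metrics do not require it.
    """
    return f"com.amazonaws.{region}.monitoring"


if __name__ == "__main__":
    print(cloudwatch_metrics_endpoint_service("us-east-1"))
```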

Fargate: Just a quick terminology note, you mentioned "AutoScaling Group" further up in the question. An ASG is part of the EC2 AutoScaling service, and is used to autoscale EC2 instances. For an ECS Service, you use an Application AutoScaling Scalable Target
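To make the distinction concrete, this is roughly what the Application Auto Scaling side looks like for an ECS service. It is a sketch assuming boto3: the two helpers and the 60% CPU target are illustrative choices, and they only build the arguments for `register_scalable_target` and `put_scaling_policy` on the `application-autoscaling` client.

```python
def scalable_target_params(cluster: str, service: str,
                           min_tasks: int, max_tasks: int) -> dict:
    """Kwargs for application-autoscaling register_scalable_target."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def cpu_target_tracking_policy_params(cluster: str, service: str,
                                      target_cpu_percent: float = 60.0) -> dict:
    """Kwargs for application-autoscaling put_scaling_policy (target tracking)."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    }


# Usage (not executed here; requires AWS credentials):
#   import boto3
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**scalable_target_params("demo-cluster", "broker-api", 2, 10))
#   aas.put_scaling_policy(**cpu_target_tracking_policy_params("demo-cluster", "broker-api"))
```

The policy scales on the default ECS service CPU metric, so it fits the point above that tasks in a private subnet need no CloudWatch connectivity of their own for this.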

AWS
answered 4 months ago
EXPERT
reviewed 4 months ago
  • Nice! But you didn't say anything about Service Connect. Does Service Connect work on top of HTTPS if we decide to encrypt traffic from the ALB to our task using self-signed certificates? I was thinking of passing traffic securely from the ALB, which is in a public subnet, to the broker service in a private subnet, and then relying on HTTP for gRPC communication between internal services (because I don't want a case where an attacker gets through the internet gateway and sniffs the decrypted outbound traffic of my ALB).

  • I've never used Service Connect, so not sure on that part. From a quick review of the main Service Connect doc, it looks like this is basically just a service discovery feature using DNS, so I would assume HTTP vs HTTPS wouldn't make a difference to it.

    But in general, if someone has hacked into your environment, it would generally have to be one of your instances/tasks, and the traffic is being decrypted on those for the application to read anyway.

  • Thanks for replying! Not a big deal. I also hope it makes no difference. I am going to try it soon!
