Deploying application on AWS ECS Cluster


I developed a Spring Boot application with a PostgreSQL database and an Angular UI, and I currently host it locally.

I'd like to deploy this application on AWS ECS. Should I create an image for Spring Boot backend, an image for Angular UI, and define them in the container definition?

What about DB? Should I create an RDS Postgres DB in AWS? How does my application talk to RDS? Any help would be greatly appreciated. Thank you.

4 Answers
1

Yes, it is good practice to split the backend and frontend into separate containers. As for the DB, you can use RDS or PostgreSQL-compatible Aurora; Aurora is usually the better choice unless you have a particular reason not to use it. As for communication between the app backend and the DB, the connection configuration is usually exposed through environment variables.

So basically you launch your DB, get its host, username, and password, and add them as environment variables to the ECS container definition. Since credentials are sensitive information, it is a good idea to store them (or at least the password) in Secrets Manager or Systems Manager Parameter Store and reference those values from the container definition.
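A minimal sketch of the environment-variable approach described above, assuming the container definition sets variables named DB_HOST, DB_PORT, and DB_NAME. These names are hypothetical, so substitute whatever you actually define. It only assembles the JDBC URL; the username and password would be read the same way.

```java
import java.util.Map;

// Sketch: build a PostgreSQL JDBC URL from environment variables.
// DB_HOST, DB_PORT and DB_NAME are hypothetical names chosen for this example.
class JdbcUrlFromEnv {

    static String buildJdbcUrl(Map<String, String> env) {
        String host = env.get("DB_HOST");
        if (host == null || host.isBlank()) {
            throw new IllegalStateException("DB_HOST is not set");
        }
        // Fall back to the PostgreSQL default port and default database name.
        String port = env.getOrDefault("DB_PORT", "5432");
        String name = env.getOrDefault("DB_NAME", "postgres");
        return "jdbc:postgresql://" + host + ":" + port + "/" + name;
    }

    public static void main(String[] args) {
        // Inside the container, System.getenv() holds the variables ECS injected.
        Map<String, String> env = System.getenv();
        if (env.containsKey("DB_HOST")) {
            System.out.println(buildJdbcUrl(env));
        } else {
            System.out.println("DB_HOST is not set; nothing to build");
        }
    }
}
```

Passing the environment in as a map keeps the method easy to exercise locally without touching the real process environment.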

answered 2 years ago
  • I use the Spring Boot properties file to connect to the DB locally. Does it have to match with container environment variables? Thanks.

  • With a properties file it might be a bit challenging because of security concerns. It's not a good idea to have the same values (username and password) in local and production environments, so at a minimum you would use a different properties file per environment. But then you need to store the production properties file somewhere, and since it contains DB credentials it has to be stored securely. That makes a properties file awkward for DB credentials, so I'd really recommend rewriting your app a bit to read credentials from environment variables instead of a properties file.

  • While I agree that Aurora can handle higher concurrent workload and has very useful features (global databases, consistent low replica latency, etc.), I respectfully disagree with the blanket statement that it's a better starting point over standard RDS PG/MySQL. Mainly because of the higher per-instance hour cost and the additional per-IO read/write cost (which seems to catch folks off guard). I suggest starting with regular RDS PG/MySQL unless you know you need the extra capability of Aurora. It's easy to upgrade to Aurora down the road if you need it.
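On the properties-file question in the comments above: Spring Boot also binds configuration properties from environment variables, and those take precedence over values in application.properties. The local file can therefore stay as it is while ECS supplies the production values. The sketch below applies the naming rule from the Spring Boot documentation (replace dots with underscores, remove dashes, convert to uppercase). The helper class and method names are made up for illustration.

```java
import java.util.Locale;

// Sketch: derive the environment variable name Spring Boot binds to a property.
// SpringEnvNames and toEnvVarName are illustrative names, not Spring APIs.
class SpringEnvNames {

    static String toEnvVarName(String propertyName) {
        return propertyName
                .replace(".", "_")         // dots become underscores
                .replace("-", "")          // dashes are removed
                .toUpperCase(Locale.ROOT); // the result is upper case
    }

    public static void main(String[] args) {
        String[] properties = {
            "spring.datasource.url",
            "spring.datasource.username",
            "spring.datasource.password"
        };
        for (String property : properties) {
            System.out.println(property + " -> " + toEnvVarName(property));
        }
    }
}
```

So setting SPRING_DATASOURCE_URL in the container definition would override spring.datasource.url from the packaged properties file without any code change.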

1

Hello. To be fair to everything that is out there, you can have a look at AWS AppRunner / AWS Copilot, which take care of a fair amount of the ECS settings and resources needed to deploy your application.

I would imagine that you've gone down the path of using docker-compose locally to build your application with a service for each piece: your frontend/backend/DB nicely defined in compose format, following the hundreds of guides out there to do just that. If you have not done that, I'd highly recommend looking into it to speed up your development and local testing.

To take this to the cloud with AppRunner / Copilot, you then have to rewrite all of that to fit the AWS-specific format each tool expects. To save yourself that effort, you can try ECS Compose-X.

This software will parse your services defined in compose format and generate all the ECS Task/Execution IAM roles, the CPU/RAM configuration (see this), and the ECS Task and Service definitions. On top of that, to move your DB into AWS RDS, you can use a Compose extension field that compose-x understands and handles for you (see docs). This lets you define all the settings you want using the AWS CloudFormation properties for RDS (e.g. for a DBCluster). Beyond that, you can map which service(s) should have access to that DB resource: this automatically creates Security Group ingress between the services and the DB, creates IAM access to retrieve the DB password, etc.

If you do not have an existing infrastructure (VPC/ECS Cluster etc.), compose-x will take care of creating all of that for you, following best practices. You can tweak the configuration by setting the respective x-<field> of the resources you want to have. If you do have an existing infrastructure, you can have compose-x look up these resources in your account based on tags and/or resource ARNs. It will then automatically pick up the right values and set the parameters accordingly.

You can find examples in the labs, and feel free to ask questions on the Slack channel about it.

Hope this helps!

answered 2 years ago
  • Thanks for your explanation. It's not EKS. What is the big deal to configure ECS? Create cluster, TD/Containers & Service in a few clicks.

  • One of the problems with AppRunner is that the service you deploy runs outside of your VPC and currently has no way to get private connectivity back to your VPC... the app would be unable to access your RDS database directly. I don't think you can give AppRunner a static IP, and thus couldn't safely rely on the RDS security group rules to limit traffic. Even if you could, public RDS scares me.

    While I haven't used Copilot, it does sound like a good shortcut to configure ECS... but for someone learning like this, I still suggest configuring ECS "from scratch" to learn the underlying mechanics

1

So there are 3 main components for your applications:

  1. Angular UI
  2. Spring Boot Backend
  3. PostgreSQL DB.

Angular UI

The Angular UI is mostly static files, which can live in an S3 bucket with a CloudFront CDN in front; this is what I usually use. Since S3 + CloudFront is a very simple plug-and-play solution, it takes much less effort than creating a container image and deploying it on ECS.

Spring Boot Backend

For the Spring Boot backend, you will create container images and ideally push them to ECR. From there your task definitions will pull the images and run them. The containers usually serve traffic via an AWS ALB, so keep that in mind while designing your infrastructure, especially the VPC.

PostgreSQL Database

The PostgreSQL database should be deployed using AWS Aurora because of the performance boost it provides; push-button compute scaling and storage auto-scaling are also my favorite features of AWS Aurora. If you have specific version requirements for PostgreSQL, you can check AWS RDS as well.

As for how your application will talk to RDS: usually when you deploy a VPC you create public, private, and DB subnets. The database is deployed in the DB subnet and only allows communication from the private subnet and from the Spring Boot backends. When you deploy the Aurora or RDS based DB, you get a hostname and other DB credentials. You can use AWS Secrets Manager to auto-rotate the DB credentials, which is the recommended practice for security. Please check the article below to see how you can integrate your application with AWS Secrets Manager. https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/manage-credentials-using-aws-secrets-manager.html

Feel free to ask if you need more clarification on anything.

answered 2 years ago
  • +1 on hameedullah's comment about exploring S3+CloudFront for static HTML/JS frontends.

    Now, an S3-hosted frontend wouldn't have VPC connectivity to the Spring Boot backend, which may or may not be OK. If this is a typical 3-tier setup where only the frontend should talk to the backend, things get more complicated. You may need API GW in front of the backend and some sort of auth mechanism for the frontend. Not a bad thing, just saying it starts to complicate things and the "right" answer might depend on circumstances.

1

Images & Container Definitions

Is it correct to say that your Spring Boot backend and Angular UI are running as two separate processes, e.g. mvn spring-boot:run for the backend and something like ng serve or npm start for a NodeJS HTTP server w/ Angular? (I don't know Java/Spring Boot, so just want to double-check).

If so, then yes, you would build a separate Docker image for each process and push to Amazon ECR (or other supported container repo), and you would reference these images in your ECS task definitions.

In a production environment, you would often use separate, single-container Task Definitions for the two images, with each Task Definition mapped to its own auto-scaling ECS service. This has several benefits, such as allowing the backend to scale independently from the frontend.

However, it sounds like you're newer to ECS - and if this is just for learning or early-stage dev/test, my personal suggestion would be to first start with a single Task Definition running both containers side by side. It will simplify some of the additional networking setup needed for service-to-service communication. If you do want to dive into separate services, read up on "ECS Service Discovery" (here's an example workshop).

Database Questions

Should I create an RDS Postgres DB in AWS?

So first, yes, you likely want to use RDS if you need a relational database - it's a big time saver over trying to set one up yourself on, say, EC2.

As for RDS Postgres specifically, do you have a preference for, or does your app require, Postgres? RDS can run MySQL, MariaDB, and others... and as long as your app is flexible, you can use the engine of your choice.

Let's say you decide to use Postgres or MySQL. You can run these two ways on RDS... first, you have "RDS Postgres" and "RDS MySQL". These are standard-issue open source engines that AWS manages for you; like EC2, you pay a per-hour cost for the VM, a per-GB-hour cost for EBS disk size, and potentially some ancillary costs depending on the features you use (enhanced logs, multi-AZ, etc.).

Now, if you are doing this as part of a learning exercise and don't expect to drive much traffic to the database, or it may go many hours without activity, I would instead suggest you look at RDS Aurora Serverless. It is a "Postgres compatible" and "MySQL compatible" option that, to you and your application, looks just like the open source versions (in certain cases a code change might be needed, but those are typically edge cases). Both instance-based Aurora and Aurora Serverless come with many performance enhancements and AWS-unique features that can help when you need to scale beyond what the normal open source versions can handle. But I specifically recommend Aurora Serverless for learning/small traffic because, with the serverless version, the database engine can automatically turn off when not in use so that you do not pay unnecessary hourly instance cost. Note, I would not use the regular instance-based (non-serverless) Aurora option for low-traffic dev/test, because that version is more expensive per hour than regular RDS Postgres/MySQL and you don't need the extra performance.

How does my application talk to RDS?

Both ECS tasks and RDS databases run in a VPC & subnet(s) of your choosing. Anything running in a VPC will always be given an Elastic Network Interface (ENI), aka virtual network card, and an ENI will always have a private IP address coming from your subnet CIDR.

If running in the same VPC, then two ENIs, e.g. from an ECS task and an RDS database, typically can communicate with one another as long as their attached security groups allow the necessary traffic. I say "typically" because there are lots of ways to configure your VPC (e.g. Subnet ACLs, custom DNS, AWS firewall rules, etc.) and for simplicity, I'm assuming you're not doing anything unique here.

However, with AWS-managed services like RDS, the IP address is assigned randomly from your subnet CIDR and there are certain cases (like a hardware failure where AWS restarts the database on a new host) where the IP address might change. This means you do not want to hardcode the RDS database IP in your container code or ECS task definition environment variables. Instead, AWS provides each RDS database with a DNS name that stays constant, and if the database IP(s) change, RDS will automatically update the DNS entry with the new IPs. If your application code uses this DNS, you do not have to worry about IPs changing. The DNS names will look like myexampledb.a1b2c3d4wxyz.us-west-2.rds.amazonaws.com, and you can read more about them here.
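One Java-specific detail worth hedging here: depending on its configuration, the JVM may cache DNS lookups for a while, so a long-running backend could keep using an old IP after the RDS DNS entry changes. The sketch below caps the cache at 60 seconds through the standard `networkaddress.cache.ttl` security property. The 60-second value and the class name are assumptions for illustration, and DATABASE_ENDPOINT is the variable name used in this answer's later example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Sketch: limit how long the JVM caches successful DNS lookups, so that a
// changed RDS endpoint IP is picked up without restarting the application.
class DnsTtlSetup {

    static void useShortDnsCache(int seconds) {
        // Should run early, before the first hostname lookup in the JVM.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) throws UnknownHostException {
        useShortDnsCache(60);
        // Falls back to localhost so the sketch also runs outside ECS.
        String endpoint = System.getenv().getOrDefault("DATABASE_ENDPOINT", "localhost");
        InetAddress address = InetAddress.getByName(endpoint);
        System.out.println(endpoint + " currently resolves to " + address.getHostAddress());
    }
}
```

Connection pools add their own behaviour on top of this, so treat the TTL as one piece of the picture, not a complete failover strategy.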

How do you give the RDS DNS name to your container?

There are several acceptable ways to do this... but one that you should avoid is hard-coding the DNS name in your application code or Dockerfile because, if you ever re-create the database or need to switch between databases (say in dev vs prod), you would have to rebuild your image or maintain multiple images.

Instead, there are two basic patterns that are much better to use:

  1. Inject the DNS name into your container environment variables when your ECS task starts - as part of your ECS Task Definition's optional Environment Variables. For example, if you provide the RDS DNS name as an environment variable named DATABASE_ENDPOINT in the task definition, you could reference its value in your Spring Boot code with something like System.getenv("DATABASE_ENDPOINT"). Note that this is not a good way to provide secrets (like your database password), because the secrets are stored and visible in plaintext.

  2. Have your container read the DNS name from a central parameter store - sometimes you might have multiple apps, AWS services, etc. that need to reference the same value. Rather than putting that value in multiple places, you could put it in one central place and have your app code (or certain supported AWS services) read the value dynamically at runtime and store it in memory. This is also the recommended way to retrieve secrets (like database passwords), so that you do not have to hardcode them in your app code or ECS Task Definition. ECS has out-of-the-box support for retrieving encrypted secrets from AWS Parameter Store, and I'd recommend this approach for your database password. The nice thing with ECS's Parameter Store integration is that access to Parameter Store is based on IAM... and you would just grant the task execution role (the role ECS itself uses when starting your task) IAM permission to retrieve the value.

To summarize, you should use ECS Task Definition environment variables to insert non-secret values like your database endpoint, port, etc. If this is just for learning and there's no sensitive data in your database, you could also use env vars for the password... but if building for production or storing sensitive data, you should dynamically retrieve these values at runtime using a method like above.
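As a rough sketch of how the application side of this summary could look: when ECS injects a Parameter Store or Secrets Manager value through the task definition's secrets setting, it arrives in the container as an environment variable, so the code reads it the same way as the non-secret endpoint. DATABASE_ENDPOINT comes from the example above. DATABASE_USERNAME, DATABASE_PASSWORD, and the DbSettings class are hypothetical names for illustration.

```java
import java.util.Map;

// Sketch: load required database settings at startup, fail fast when one is
// missing, and keep the password out of anything that might be logged.
class DbSettings {
    final String endpoint;
    final String username;
    private final String password;

    private DbSettings(String endpoint, String username, String password) {
        this.endpoint = endpoint;
        this.username = username;
        this.password = password;
    }

    static DbSettings fromEnv(Map<String, String> env) {
        return new DbSettings(
                required(env, "DATABASE_ENDPOINT"),
                required(env, "DATABASE_USERNAME"),
                required(env, "DATABASE_PASSWORD"));
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    String password() {
        return password;
    }

    @Override
    public String toString() {
        // The password is deliberately masked.
        return "DbSettings{endpoint=" + endpoint + ", username=" + username + ", password=***}";
    }

    public static void main(String[] args) {
        try {
            System.out.println(fromEnv(System.getenv()));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at startup makes a misconfigured task definition obvious in the ECS task logs instead of surfacing later as a connection error.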

mwrb
answered 2 years ago
