How to Design a Scalable, High-Performance Architecture?


Hello, I have a use case described in detail below and need your suggestions for a scalable, high-performance AWS architecture. I have also shared my own thoughts on the problem statements and would like more expert opinions. TIA

Problem statements:

  1. Scaling to millions of sales opportunities: I have a sales opportunities database (PostgreSQL on AWS) with millions of customers, and a sales opportunity scoring model that runs across the customer profiles being processed. The scoring endpoint should be able to dynamically rank millions of customers and make that data available, so the client can query from the front end and apply a filter to pull sales opportunities above a certain score; for example, customers with a score greater than 80. (A front-end query sketch follows this list.)
  2. What should be the architecture of the solution in AWS?
  3. How should the solution's GUI be built, with JavaScript UI APIs that interact efficiently with AWS?
  4. How can I respond from the front-end application when sales opportunity quality is poor based on customer feedback, and adjust the scoring model for improvement?
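
For concreteness, here is a minimal front-end sketch of the score filter (point 1) and the feedback submission (point 4). The endpoint URLs, query parameter, and payload shapes are assumptions for illustration, not an existing API:

```typescript
// Hypothetical shape of a scored sales opportunity returned by the API.
interface Opportunity {
  customerId: string;
  score: number;
}

// Pull opportunities above a score threshold from an assumed REST endpoint.
async function fetchOpportunities(minScore: number): Promise<Opportunity[]> {
  const res = await fetch(`https://api.example.com/opportunities?minScore=${minScore}`);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}

// Send feedback when a rep flags a low-quality opportunity (assumed payload).
async function submitFeedback(customerId: string, rating: number, comment: string): Promise<void> {
  await fetch("https://api.example.com/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ customerId, rating, comment }),
  });
}

// Example: customers with a score greater than 80.
fetchOpportunities(80).then((opps) => console.log(`${opps.length} opportunities`));
```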

Following are my thoughts regarding the use case (problem statements) described above:

  1. Database sharding with PostgreSQL on AWS can be implemented using various techniques; in this case it could be range-based (splitting data by score). Set up multiple Amazon RDS (Relational Database Service) instances as shard nodes, where each instance represents one shard of the database. AWS Auto Scaling can help adjust the capacity of RDS instances based on demand. To migrate existing data into shards, we can use AWS Database Migration Service (DMS).
  2. Implement a routing layer that sits between my application and the database shards and is responsible for routing queries to the appropriate shard. We could use Amazon API Gateway to build this routing layer, and it would act as the load balancer. Routing logic can be built by hashing or parsing query parameters to identify the correct shard (see the routing sketch after this list). One concern here is whether queries that involve multiple shards (e.g., joins across shards) are handled efficiently.
  3. We can create a distributed caching layer (e.g., Amazon ElastiCache) to improve query performance across shards and cache data for frequent queries (a cache-aside sketch also follows this list).
  4. We can use AWS Lambda to handle dynamic sales opportunity scoring and make the data available for front-end queries.
  5. To interface with the JavaScript UI, expose the sales opportunities API as RESTful endpoints. For this purpose, we could use an Amazon SageMaker endpoint.
  6. Analyze the feedback to identify areas for scoring model improvement, make adjustments to the model based on that feedback, and retrain the machine learning model if applicable. To achieve this (CI/CD), we can use the AWS CodePipeline and AWS CodeBuild services.
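
To make thoughts 1 and 2 concrete, here is a minimal range-based routing sketch over score-partitioned shards using the `pg` client. The shard boundaries, environment variables, and table/column names are assumptions; note how a low threshold fans the query out across shards, which is exactly the multi-shard cost mentioned above:

```typescript
import { Pool } from "pg";

// Assumed score ranges per shard; the boundaries are illustrative only.
const shards = [
  { min: 0,  max: 50,  pool: new Pool({ connectionString: process.env.SHARD_LOW_URL }) },
  { min: 50, max: 80,  pool: new Pool({ connectionString: process.env.SHARD_MID_URL }) },
  { min: 80, max: 101, pool: new Pool({ connectionString: process.env.SHARD_HIGH_URL }) },
];

// Route a "score > threshold" query to every shard whose range overlaps the
// requested threshold, then merge the results. A query like "score > 80"
// touches only the last shard; "score > 40" fans out to all three.
async function queryByMinScore(minScore: number) {
  const relevant = shards.filter((s) => s.max > minScore);
  const results = await Promise.all(
    relevant.map((s) =>
      s.pool.query("SELECT customer_id, score FROM opportunities WHERE score > $1", [minScore])
    )
  );
  return results.flatMap((r) => r.rows);
}
```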
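And for thought 3, a cache-aside sketch in front of that routing function, assuming an ElastiCache for Redis endpoint and the `ioredis` client. As the comment below points out, the cache key must encode the whole query shape, which only pays off when the filter space is coarse (e.g., integer thresholds):

```typescript
import Redis from "ioredis";

// Assumed ElastiCache (Redis) endpoint; cache-aside with a short TTL.
const redis = new Redis(process.env.ELASTICACHE_URL!);

async function cachedQueryByMinScore(minScore: number) {
  // The cache key encodes the full query shape; this only works well
  // when the filter space is coarse (e.g., whole-number thresholds).
  const key = `opps:minScore:${minScore}`;
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit);

  const rows = await queryByMinScore(minScore); // from the routing sketch above
  await redis.set(key, JSON.stringify(rows), "EX", 60); // 60-second TTL
  return rows;
}
```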

Your suggestions: Please advise on what is missing or which tools are unnecessary, and on how we can design a more scalable, high-performance, and cost-effective architecture.

It would be great if you could share some references for an architecture diagram.

Thanks,

  • Are you certain your data needs to be partitioned? Millions of records shouldn't be an issue for a database in general. Throughput sounds like your bigger concern, which can be managed via read replicas. A few other notes: APIG is not a load balancer, nor can it integrate directly with RDS for query ops. SageMaker is for inference, not for data query ops. I think you'd be better off using something like APIG->Lambda->RDS-RR. For ElastiCache, you'd need to think about what your cache key would be, which is hard to do with highly customizable query ranges. Food for thought.
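
For what it's worth, a minimal sketch of the suggested APIG->Lambda->RDS-RR path: an API Gateway proxy event handled by a Lambda function that queries a read replica. The environment variable and the table/column names are assumptions:

```typescript
import { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import { Pool } from "pg";

// Connection to an assumed RDS read replica; created outside the handler
// so it is reused across warm invocations.
const pool = new Pool({ connectionString: process.env.READ_REPLICA_URL });

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  // Read the score threshold from the query string, defaulting to 80.
  const minScore = Number(event.queryStringParameters?.minScore ?? 80);
  const { rows } = await pool.query(
    "SELECT customer_id, score FROM opportunities WHERE score > $1 ORDER BY score DESC LIMIT 1000",
    [minScore]
  );
  return { statusCode: 200, body: JSON.stringify(rows) };
};
```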
