AWS Real-Time Ad Tracker Architecture

Hello. I'm attempting to build an ad-tracking application that can attribute, store, and then query and analyze website visitor information in real or near real time. Unfortunately, I'm having difficulty designing the application architecture, as I am new to AWS overall. So far, I expect my application to look like this:

  1. API Gateway to serve as a secure endpoint for websites and ad servers to send website visitor information (think UTM parameters, device resolution, internal IDs, etc.)
  2. Lambda/Node.js to route and attribute session information
  3. DynamoDB for its ability to handle high-volume write rates in a cost-efficient way.
  4. S3 to create frequent/on-demand backups of DynamoDB which can then be analyzed by
  5. ? Considering passing all S3 data back for client-side processing in my dashboard.
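
As a rough illustration of steps 1 to 3, here is a minimal Node.js sketch of a Lambda handler that turns an API Gateway proxy event into a flat item. The field names (`sessionId`, `resolution`) and the injected `saveItem` function are assumptions made for the example; the actual DynamoDB write is deliberately left out.

```javascript
// Hypothetical schema: adjust the field names to whatever your sites actually send.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"];

// Turn an API Gateway proxy event into a flat item ready for storage.
function buildVisitItem(event, now = new Date()) {
  const query = event.queryStringParameters || {}; // null when no query string is sent
  const body = event.body ? JSON.parse(event.body) : {};

  const item = {
    sessionId: body.sessionId || "unknown",
    receivedAt: now.toISOString(),
  };
  // Ad traffic does not always carry every parameter, so copy only what is present.
  for (const key of UTM_KEYS) {
    if (query[key] !== undefined) item[key] = query[key];
  }
  if (body.resolution) item.resolution = body.resolution;
  return item;
}

// The storage call is injected so the handler can be tested without AWS.
// In Lambda, saveItem would wrap a DynamoDB put (not shown here).
function makeHandler(saveItem) {
  return async function handler(event) {
    let item;
    try {
      item = buildVisitItem(event);
    } catch (err) {
      // Malformed JSON in the request body.
      return { statusCode: 400, body: JSON.stringify({ ok: false }) };
    }
    await saveItem(item);
    return { statusCode: 202, body: JSON.stringify({ ok: true }) };
  };
}
```

Keeping `buildVisitItem` free of AWS calls means the attribution logic can be unit-tested locally before any infrastructure exists.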

However, I just found this case study with Nasdaq utilizing Redshift and other services shown here. Judging from the 'Data' label featured in the first illustration of the latter link (clickstreams, transactions, etc.), it appears to be exactly what I need.

So, I suppose my question would be, from a cost, simplicity, and efficiency standpoint: would it just be easier to eliminate DynamoDB and S3 and instead configure my Lambda functions to send their data directly into Redshift?

Any guidance would be greatly appreciated, thank you!

  • To answer the specific question you asked: yes. If you have no specific need for those services, you can eliminate them.

    You can load data into Amazon Redshift from a range of data sources, including Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon EMR, AWS Glue, AWS Data Pipeline, or any SSH-enabled host on Amazon EC2 or on-premises. Amazon Redshift attempts to load your data in parallel into each compute node to maximize the rate at which you can ingest data into your data warehouse cluster. Clients can also connect to Amazon Redshift using ODBC or JDBC and issue SQL INSERT commands to insert the data. Please note that this is slower than loading from S3 or DynamoDB, since those methods load data in parallel into each compute node, while SQL INSERT statements load via the single leader node.
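
    To make the contrast concrete, here is a minimal sketch of building a Redshift COPY statement that loads JSON files from S3 in bulk, as opposed to issuing row-by-row INSERTs. The table name, bucket prefix, and IAM role ARN are placeholders assumed for the example, and how the statement is executed against the cluster is not shown.

```javascript
// Build a COPY statement that loads JSON objects from an S3 prefix.
// All names passed in below are hypothetical placeholders.
function buildCopyStatement({ table, s3Prefix, iamRoleArn }) {
  // The table name is interpolated directly, so restrict it to a safe pattern.
  if (!/^[A-Za-z_][A-Za-z0-9_.]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  if (!s3Prefix.startsWith("s3://")) {
    throw new Error(`expected an s3:// prefix, got: ${s3Prefix}`);
  }
  // Escape single quotes inside SQL string literals.
  const quote = (s) => `'${s.replace(/'/g, "''")}'`;
  return `COPY ${table} FROM ${quote(s3Prefix)} IAM_ROLE ${quote(iamRoleArn)} FORMAT AS JSON 'auto';`;
}

console.log(
  buildCopyStatement({
    table: "visits",
    s3Prefix: "s3://example-bucket/visits/2024/",
    iamRoleArn: "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
  })
);
```

    One COPY over a prefix containing many files lets the cluster spread the load across compute nodes, which is the parallelism described above.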

2 Answers

There are many ways to design this type of architecture and, as you've seen, some customers will do things differently. It depends entirely on their comfort level with various technologies, their requirements from an ingest and analytics perspective, and their budget.

Because there's no one "right" answer and this is a complex problem to solve, I'd recommend that you reach out to a local AWS Solutions Architect for a discussion; they can guide you to the best solution for your needs.

In this case, the architecture that you have is fine and is very cost-effective, but as noted above, there are always other ways of doing it.

AWS Expert
answered a year ago
  • Thank you! Are you available to have a discussion?

  • Typically, yes. But there's no way on re:Post to (securely) exchange identities and details; and there's no way of telling if we're even close to being in the same timezone. I'd recommend that you contact your local AWS office - wherever "local" is.

  • I'm in the United States, Eastern Standard Time. I'm also not sensitive about sharing my own personal contact info here for the time being. I'll delete this later, but should you decide to reach out my email is ep@pcom.global.

Brett nailed this. :) I honestly like API Gateway -> Lambda -> Kinesis Data Streams -> Firehose -> OpenSearch -> Grafana (or whatever dashboard tool you like). You have very little code to write with this stack. If you need to do some aggregation work, you can add Kinesis Data Analytics (KDA) into the mix, with the existing Kinesis data stream (KDS) as your source and another KDS as your sink. But like Brett said, there are so many options.
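
For a sense of how little code the Lambda step needs in this stack, here is a minimal sketch that builds the input for a Kinesis `PutRecord` call (`StreamName`, `PartitionKey`, `Data`). The stream name and the choice of `sessionId` as the partition key are assumptions for illustration, and the SDK call that would send the record is not shown.

```javascript
// Build the input object for a Kinesis PutRecord request.
// "ad-events" and the sessionId partition key are hypothetical choices.
function buildPutRecordInput(visit, streamName = "ad-events") {
  if (!visit || !visit.sessionId) {
    throw new Error("visit.sessionId is required to choose a partition key");
  }
  return {
    StreamName: streamName,
    // Records with the same partition key land on the same shard, in order.
    PartitionKey: String(visit.sessionId),
    // Newline-terminated JSON keeps records separable once they are batched downstream.
    Data: Buffer.from(JSON.stringify(visit) + "\n", "utf8"),
  };
}

const input = buildPutRecordInput({ sessionId: "s1", utm_source: "newsletter" });
console.log(input.StreamName, input.PartitionKey, input.Data.toString("utf8").trim());
```

Partitioning by session keeps one visitor's events ordered within a shard; a different key may suit other access patterns better.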

answered a year ago
  • For the purposes of ad tracking (which does not always pass the full set of parameters), it seems like a semi-structured database like Redshift would be the way to go. However, I'm concerned about whether making 50,000 INSERT requests per day is... well... smart.

  • 50,000 inserts a day isn't a lot - that's less than one per second.
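
    The arithmetic behind that estimate, as a quick sketch. It gives a daily average only; real traffic is likely to be burstier than a flat rate.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400 seconds

// Average events per second for a given daily volume.
function averagePerSecond(eventsPerDay) {
  return eventsPerDay / SECONDS_PER_DAY;
}

console.log(averagePerSecond(50000).toFixed(2)); // prints "0.58"
```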
