AWS migration from local data warehouse


I work at a small business where our data warehouse is built locally and synced using Dropbox for shared access.

Current architecture:

  1. Data Collection: Python & R scripts collect data from various marketing and sales channels. For context, we use social media APIs (TikTok Business API, Facebook Graph API) to collect ads data, the Amazon Ads and Selling Partner APIs to collect Amazon data, and various other API integrations to pull the data onto our local system and store it as .csv flat files.
  2. Data Aggregation & Transformation: Once the source data has been collected, we transform it as needed.
  3. Combine relevant data from all sources into master CSV files. These files combine data from individual data sources (TikTok, Meta, Amazon, ecommerce data, etc.). For example, one file will contain ads data aggregated from all social platforms. We have hundreds of these master files that we later use for reporting in Excel.

Our data warehouse consists only of CSV flat files.

Now I want to migrate our entire architecture to AWS, maybe by creating Lambda functions that collect data on a daily basis and storing the results in a more robust format, such as a database. I am wondering how our solution would be built in AWS.

Thanks in advance for your advice.

1 Answer
Accepted Answer

Many possibilities, for example this one:

The data collection can indeed be done using AWS Lambda functions (perhaps with Step Functions for orchestration), and AWS Glue can handle the transformation. Keep the limitations of AWS Lambda in mind, though (such as the 15-minute maximum run time).
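As a rough illustration of the collection step, here is a minimal sketch of a Lambda handler that writes one source's daily pull to S3 as CSV. Everything named here is an assumption for illustration: the `raw/<source>/dt=<date>/data.csv` key layout, the `source` and `bucket` event fields, and the `fetch_rows` placeholder, which stands in for your existing collection scripts.

```python
import csv
import io
from datetime import date, datetime, timezone


def build_s3_key(source: str, run_date: date) -> str:
    """Return a date-partitioned key, e.g. raw/tiktok/dt=2024-01-31/data.csv.

    The layout is an illustrative assumption, not an AWS requirement.
    """
    return f"raw/{source}/dt={run_date.isoformat()}/data.csv"


def rows_to_csv(rows: list) -> str:
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fetch_rows(source: str) -> list:
    """Hypothetical placeholder: call the relevant ads or sales API here."""
    raise NotImplementedError(f"no collector wired up for {source!r}")


def lambda_handler(event, context):
    """Entry point invoked by Lambda, e.g. on a daily schedule."""
    import boto3  # imported here so the helpers above stay testable without AWS

    source = event["source"]  # assumed event field
    bucket = event["bucket"]  # assumed event field
    run_date = datetime.now(timezone.utc).date()

    rows = fetch_rows(source)
    key = build_s3_key(source, run_date)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    return {"bucket": bucket, "key": key, "row_count": len(rows)}
```

One function per source (or one function driven by the `source` field) keeps each run short, which helps with the 15-minute limit.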

The resulting data doesn't have to live in a database. A common alternative is to store it in S3 as structured data, for example as Parquet files (Glue can also be used to do that, by the way). This is usually cheaper than a database and very durable. Once on S3, the data can be queried using Athena or other tools, either from AWS (QuickSight) or external.
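To give an idea of what querying from S3 looks like, here is a minimal sketch that builds a SQL query and submits it to Athena with boto3. The table name `ads_daily`, its columns (`dt`, `platform`, `spend`), the database name, and the results location are all hypothetical; the table is assumed to already be defined over your Parquet files (for example by a Glue crawler).

```python
import time
from datetime import date


def build_daily_spend_query(table: str, start_date: str, end_date: str) -> str:
    """Build a per-day, per-platform spend query for an assumed ads table.

    Dates must be ISO strings (YYYY-MM-DD); date.fromisoformat raises
    ValueError otherwise, which also guards the string interpolation.
    """
    date.fromisoformat(start_date)
    date.fromisoformat(end_date)
    return (
        f"SELECT dt, platform, SUM(spend) AS total_spend "
        f"FROM {table} "
        f"WHERE dt BETWEEN '{start_date}' AND '{end_date}' "
        f"GROUP BY dt, platform ORDER BY dt, platform"
    )


def run_athena_query(sql: str, database: str, output_location: str,
                     poll_seconds: float = 2.0):
    """Submit a query to Athena and poll until it reaches a final state."""
    import boto3  # imported here so the query builder is usable without AWS

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

A call might look like `run_athena_query(build_daily_spend_query("ads_daily", "2024-01-01", "2024-01-31"), "marketing", "s3://my-athena-results/")`, where the database and bucket names are placeholders. Athena writes the result set as a CSV file under the output location, so the Excel reporting can keep working from CSV exports during the transition.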


answered 14 days ago
  • Thanks! I will test this out on a smaller scale. Some of the scripts do need more than 15 minutes of execution time, but that shouldn't be a roadblock, at least in the initial stages.
