How to offload historical data to Redshift or S3?
A customer has a MySQL database and want to offload historical data to another platform for reporting and analysis. Currently they are facing performance issues during normal operations.
What do I need to take into account to make the decision? I have seen that a possible way is to use DMS https://aws.amazon.com/blogs/database/archiving-data-from-relational-databases-to-amazon-glacier-via-aws-dms/. I wonder if Data Pipeline can also be used here. The destination is going to be s3 or Redshift depending on the data. Thanks!
DMS and Kinesis data firehose can stream data changes from MySQL onto Redshift and S3. This is common pattern for transactional sources with primary key in source tables. Here is a blog that describes how to load ongoing changes from source using Glue and DMS https://aws.amazon.com/blogs/big-data/loading-ongoing-data-lake-changes-with-aws-dms-and-aws-glue/ .
For historical data load into S3 and Redshift use DMS for low to moderate amount of data given that the customer have significant network bandwidth to AWS. For significant size data such as 10's TB exporting the MySQL data into raw files and move to AWS Snowball and import to S3 can be more reasonable and time efficient.
If the customer wants to have seamless ETL experience with sourcing capability from MySQL into Redshift and S3 they can use 3rd party product like Snaplogic. Here is a blog https://aws.amazon.com/blogs/apn/migrating-data-warehouse-workloads-from-on-premises-databases-to-amazon-redshift-with-snaplogic/
Relevant questions
Redshift questions from AWS Customer
Accepted Answerasked 3 years agoTimestream Historical Data (10s of years old)
asked 3 months agoHow is data returned from Spectrum to Redshift cluster?
Accepted Answerasked 2 years agouse Redshift Spectrum to query both Redshift table and s3 file
Accepted Answerasked 2 years agoIncremental Data Capture from DynamoDB to S3
Accepted Answerasked 2 years agoHow to offload historical data to Redshift or S3?
Accepted Answerasked 2 years agoRDS to S3 to RDS or Redshift near realtime
Accepted Answerasked a year agoRedshift as as Data Source for a REST API for Analytics Query
Accepted Answerasked 3 years agoI need to read S3 data, transform and put into Data Catalog. Should I be using a Crawler?
Accepted Answerasked 3 months agoRedshift Managed Storage vs Spectrum
Accepted Answerasked 2 years ago