What is the best way to work with databases in local environments?


We currently run Aurora RDS PostgreSQL. Initially our database was small and easy to manage: we sent a dump to S3 and made it available to developers for building new features and fixing bugs.

Today our database is about 700 GB. What are the best strategies for making the data available to developers? Downloading the entire database is not feasible; it would take far too long. The most important thing is that developers have data to build new features and fix bugs.

2 Answers
Accepted Answer

First off, consider whether a single shared copy that all of your developers connect to would meet your needs. They would connect remotely to a copy of your production data. This approach has many advantages: it is easy to refresh regularly, and it is guaranteed to cover all of your production use cases.

But that may not meet your needs. You said that you wanted local databases. In that case I think you have two main paths to consider.

  1. Generate sample data
  2. Create a process to extract a subset of your production data

Generating sample data will probably take more effort up front, but it has some nice advantages. It is easy to ensure that you generate exactly the data you need, it can be parameterized so that each developer generates the data they care about at that point in time, and there are no network issues from downloading large data sets.
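To make that concrete, here is a minimal sketch of parameterized data generation in Python. It assumes a hypothetical `customers` table with `name`, `email`, and `created_at` columns, and uses the Faker and psycopg2 libraries; the table, columns, and connection settings are placeholders to adapt to your own schema.

```python
import os

import psycopg2
from psycopg2.extras import execute_values
from faker import Faker

fake = Faker()

# Connection settings come from the environment; the defaults are placeholders.
conn = psycopg2.connect(
    host=os.environ.get("PGHOST", "localhost"),
    dbname=os.environ.get("PGDATABASE", "devdb"),
    user=os.environ.get("PGUSER", "dev"),
    password=os.environ.get("PGPASSWORD", ""),
)

# Parameterize per developer / per task: generate only as much data as needed.
NUM_CUSTOMERS = 1_000

rows = [
    (
        fake.name(),
        fake.unique.email(),
        fake.date_time_between(start_date="-2y"),  # created some time in the last 2 years
    )
    for _ in range(NUM_CUSTOMERS)
]

with conn, conn.cursor() as cur:
    # Bulk-insert the generated rows into the (hypothetical) customers table.
    execute_values(
        cur,
        "INSERT INTO customers (name, email, created_at) VALUES %s",
        rows,
    )

conn.close()
```

Each developer can change `NUM_CUSTOMERS` (or add generators for other tables) to produce exactly the data the current task needs.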

But if you really need to extract a portion of the main database, then treat it as an Extract-Transform-Load (ETL) project. Use a Data Integration (DI/ETL) tool to connect to the main database and extract a subset. Ideally the subset will be easy to define: for most tables you might simply take the latest 2 months of data, while for other tables (like reference tables) you take the entire table. Defining all of the individual mappings takes significant effort, but it isn't complex. You could then decide on details such as loading the data into another database or saving it to CSV files, and make the resulting database dump or CSV files available to your developers.

As a developer, you might be inclined to write your own scripts for this job. That's certainly possible, but there are so many good ETL tools available (including free ones) that I would argue strongly in favor of using a tool to build these jobs.
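As a rough sketch of what the extract step could look like (whether you hand-roll it or have an ETL tool generate the equivalent), the Python script below pulls the last 2 months of a few hypothetical transactional tables and the full contents of some reference tables into CSV files via PostgreSQL's COPY. The table names and the connection DSN are placeholders.

```python
import os

import psycopg2

# Placeholder table lists -- replace with your own schema.
RECENT_TABLES = ["orders", "order_items", "events"]    # only the latest 2 months
FULL_TABLES = ["products", "countries", "currencies"]  # small reference tables, copied whole

# Connect to a read replica (or a snapshot restore), not the production writer.
conn = psycopg2.connect(os.environ["SOURCE_DB_DSN"])


def dump_query_to_csv(cur, query, filename):
    # COPY (query) TO STDOUT streams the result set as CSV without
    # materializing it in memory on the client.
    with open(filename, "w") as f:
        cur.copy_expert(f"COPY ({query}) TO STDOUT WITH CSV HEADER", f)


with conn, conn.cursor() as cur:
    for table in RECENT_TABLES:
        dump_query_to_csv(
            cur,
            f"SELECT * FROM {table} WHERE created_at >= now() - interval '2 months'",
            f"{table}.csv",
        )
    for table in FULL_TABLES:
        dump_query_to_csv(cur, f"SELECT * FROM {table}", f"{table}.csv")

conn.close()
```

The resulting CSV files can be uploaded to S3 and loaded into each developer's local PostgreSQL with COPY FROM.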

answered 2 years ago; reviewed 2 years ago by Hernito (AWS, EXPERT) and John_F (EXPERT)

Using Aurora cloning is a common way customers create full database copies and make them available to their developers quickly and with minimal additional cost. The clones can be re-created on a recurring basis (e.g., weekly). Check out this blog post for more details: https://aws.amazon.com/blogs/aws/amazon-aurora-fast-database-cloning/
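For a rough idea of how the clone could be automated, here is a hedged boto3 sketch using the point-in-time restore API with `RestoreType="copy-on-write"`, which is how Aurora fast cloning is exposed. The cluster and instance identifiers, region, and instance class are placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # region is a placeholder

# Create a copy-on-write clone of the production cluster.
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="dev-clone",                  # placeholder clone name
    SourceDBClusterIdentifier="prod-aurora-cluster",  # placeholder source cluster
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# A clone has no instances until you add one to connect through.
rds.create_db_instance(
    DBInstanceIdentifier="dev-clone-instance-1",
    DBClusterIdentifier="dev-clone",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",  # placeholder instance class
)
```

A scheduled job could delete and re-create the clone weekly so developers always work against recent data.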

Greg_G (AWS) answered 2 years ago; reviewed 2 years ago by Hernito (AWS, EXPERT)
  • I need isolated databases for each developer; RDS would fall short with its limit of 40 instances, and it would also be too expensive.

    Is there a tool that can extract a portion of the database while keeping the data consistent across tables?
