What is the best way to work with databases in local environments?


We currently use Aurora RDS for PostgreSQL. Initially our database was small and easy to manage: we uploaded a dump to S3 and made it available to developers for building new features and fixing bugs.

Today our database is 700 GB. What are the best strategies for making the data available to developers? Downloading the entire database is impractical; it would take far too long. The most important thing is that developers have data to develop new features and fix bugs.

2 answers

Accepted answer

First off, consider whether a single shared copy that all of your developers connect to would meet your needs. They would connect remotely to a copy of your production data. That approach is easy to refresh regularly and is guaranteed to cover all of your production use cases.

But that may not meet your needs. You said that you wanted local databases. In that case I think you have two main paths to consider.

  1. Generate sample data
  2. Create a process to extract a subset of your production data

Generating sample data will probably take more effort up front, but it has some nice advantages: it's easy to ensure you generate exactly the data you need, generation can be parameterized so each developer produces the data they care about at that point in time, and there are no network issues downloading large data sets.
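To make the "parameterized sample data" idea concrete, here is a minimal sketch. The `orders` schema (id, customer_id, amount, created_at) is purely hypothetical; the point is that a fixed seed makes the data reproducible and the row count is a parameter each developer controls.

```python
import random
from datetime import datetime, timedelta

def generate_orders(n, days_back=60, seed=42):
    """Generate n synthetic order rows spread over the last `days_back` days.

    The column names here are illustrative -- adapt them to your own tables.
    A fixed seed makes every run produce the same data set.
    """
    rng = random.Random(seed)
    now = datetime(2024, 1, 1)  # fixed anchor so output is deterministic
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "id": i,
            "customer_id": rng.randint(1, 100),
            "amount": round(rng.uniform(5.0, 500.0), 2),
            "created_at": now - timedelta(minutes=rng.randint(0, days_back * 24 * 60)),
        })
    return rows

# Each developer generates only the volume they need:
sample = generate_orders(1000)
```

The same pattern extends to any table: one generator function per table, with foreign keys drawn from the ranges used by the parent generators so the data stays consistent.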

But if you really need to extract a portion of the main database, then you need to think of it as an Extract-Transform-Load (ETL) project. Use a Data Integration (DI/ETL) tool to connect to the main database and extract some subset. Ideally the subset will be easy to define: maybe for most tables you simply take the latest 2 months of data, and for others (like reference tables) you take the entire table.

Defining all of the individual mappings would be significant effort, but not really complex. You could decide on details like loading the data into another database or saving it into CSV files, then make the database dump or CSV files available to your developers.

As a developer, you might be inclined to write your own scripts to do this job. Of course that's possible. But there are so many good ETL tools available (including free ones) that I would argue strongly in favor of using a tool to write these jobs.
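Whichever tool you use, the per-table mapping described above ("latest 2 months for most tables, full copy for reference tables") boils down to one SELECT per table. A minimal sketch, with hypothetical table and column names, of how those queries could be built and wrapped in PostgreSQL's `COPY ... TO STDOUT` for CSV export:

```python
def subset_query(table, time_column=None, months=2):
    """Build the SELECT used to extract a subset of one table.

    Tables with a time_column keep only the most recent `months` of rows;
    reference tables (time_column=None) are copied whole. Names here are
    illustrative -- define one mapping entry per production table.
    """
    if time_column is None:
        return f"SELECT * FROM {table}"
    return (f"SELECT * FROM {table} "
            f"WHERE {time_column} >= now() - interval '{months} months'")

def copy_to_csv_sql(table, time_column=None, months=2):
    """Wrap the subset SELECT in a COPY statement so the server streams
    the result as CSV (with a header row) on stdout."""
    return f"COPY ({subset_query(table, time_column, months)}) TO STDOUT WITH CSV HEADER"

# Hypothetical mapping: fact tables get 2 months, reference tables a full copy.
mappings = {
    "orders": "created_at",
    "events": "occurred_at",
    "countries": None,  # reference table: copy everything
}
```

With psycopg2 you would then feed each statement to `cursor.copy_expert(sql, open(f"{table}.csv", "w"))` to stream each subset into a local file. Note that time-windowed subsets are not guaranteed to be referentially consistent across tables; rows referenced by foreign keys may fall outside the window, which is exactly the gap dedicated subsetting tools try to close.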

answered 2 years ago
AWS
EXPERT
Hernito
verified 2 years ago
EXPERT
John_F
verified 2 years ago

Using Aurora cloning is a common way customers create full database copies and make them available to their developers, quickly and at minimal additional cost. These clones can be re-created on a recurring basis (e.g. weekly). Check out this blog for more details: https://aws.amazon.com/blogs/aws/amazon-aurora-fast-database-cloning/
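For reference, Aurora fast cloning is exposed through the RDS `RestoreDBClusterToPointInTime` API with `RestoreType='copy-on-write'`. A minimal boto3 sketch, where the cluster identifiers are placeholders to replace with your own:

```python
def clone_request(source_cluster, clone_id):
    """Build the parameters for a copy-on-write Aurora clone.

    RestoreType='copy-on-write' is what makes this a fast clone (shared
    storage pages, copied only on write) rather than a full restore.
    Cluster identifiers below are placeholders.
    """
    return {
        "DBClusterIdentifier": clone_id,
        "SourceDBClusterIdentifier": source_cluster,
        "RestoreType": "copy-on-write",
        "UseLatestRestorableTime": True,
    }

# To actually create the clone (requires AWS credentials and permissions):
# import boto3
# rds = boto3.client("rds")
# rds.restore_db_cluster_to_point_in_time(**clone_request("prod-cluster", "dev-clone"))
params = clone_request("prod-cluster", "dev-clone")
```

Note that this call creates only the cluster; you still need to add a DB instance to the clone (e.g. via `create_db_instance`) before developers can connect to it.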

AWS
Greg_G
answered 2 years ago
AWS
EXPERT
Hernito
verified 2 years ago
  • I need isolated databases for each developer; RDS would fall short given its limit of 40 instances, and apart from that it would be too expensive.

    Is there a tool that could extract a part of the database that is consistent across tables?
