What is the best way to work with databases in local environments?

0

We currently work with Aurora RDS PostgreSQL. Initially our database was small and easy to manage: we sent a dump to S3 and made it available to developers to develop new features and fix bugs.

Today our database weighs 700 GB. What are the best strategies to make the data available to developers? Downloading the entire database is impractical; it would take far too long. The most important thing is that they have data to develop new features and fix bugs.

2 Answers

2

Accepted Answer

First off, consider whether having a single copy that all of your developers connect to would meet your needs. They would connect remotely to a copy of your production data. That approach is easy to refresh regularly and is guaranteed to cover all of your production use cases, so it has many advantages.

But that may not meet your needs. You said that you wanted local databases. In that case I think you have two main paths to consider.

  1. Generate sample data
  2. Create a process to extract a subset of your production data

Generating sample data will probably take more effort up front, but it has some nice advantages. It's easy to ensure that you generate exactly the data you need, it can be parameterized so each developer generates the data they care about at that point in time, and there are no network issues from downloading large data sets.
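As a rough sketch of the parameterized-generation idea: the snippet below builds fake users and orders with stdlib `random`, keeping foreign keys consistent between the two tables. All table and column names here are illustrative, not from the question's schema.

```python
import random
import string
from datetime import datetime, timedelta

def generate_users(n, seed=None):
    """Generate n fake user rows (field names are placeholders)."""
    rng = random.Random(seed)
    users = []
    for i in range(1, n + 1):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        created = datetime(2024, 1, 1) + timedelta(days=rng.randint(0, 364))
        users.append({"id": i, "email": f"{name}@example.com",
                      "created_at": created})
    return users

def generate_orders(users, per_user, seed=None):
    """Generate orders that reference the generated users,
    so foreign keys stay consistent across tables."""
    rng = random.Random(seed)
    orders, oid = [], 1
    for u in users:
        for _ in range(per_user):
            orders.append({"id": oid, "user_id": u["id"],
                           "amount": round(rng.uniform(5, 500), 2)})
            oid += 1
    return orders
```

Because each developer controls `n`, `per_user`, and the seed, everyone can regenerate exactly the data set they need without moving anything over the network.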

But if you really need to extract a portion of the main database, then you need to think of it as an Extract-Transform-Load (ETL) project. Use a Data Integration (DI/ETL) tool to connect to the main database and extract some subset. Ideally the subset will be easy to define: maybe for most tables you simply take the latest 2 months of data, while for other tables (like reference tables) you take the entire table. It would take significant effort to define all of the individual mappings, but none of it would be complex. You could decide on details such as loading the data into another database or saving it into CSV files, then make the database dump or CSV files available to your developers.

As a developer, you might be inclined to write your own scripts to perform this job. Of course that's possible, but there are so many good ETL tools available (including free ones) that I would argue strongly in favor of using a tool to write these jobs.
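Whichever tool you pick, the per-table rules described above ("latest 2 months" vs. "entire table") can be expressed as a small mapping that generates the extraction queries. The sketch below assumes placeholder table and timestamp-column names; adapt them to your schema.

```python
from datetime import date, timedelta

# Per-table extraction rules: "recent" tables get a time window,
# "full" tables (e.g. reference data) are copied whole.
# Table and column names are hypothetical placeholders.
RULES = {
    "orders":    {"mode": "recent", "ts_column": "created_at"},
    "events":    {"mode": "recent", "ts_column": "occurred_at"},
    "countries": {"mode": "full"},
}

def extract_query(table, rule, months=2, today=None):
    """Build the SELECT used to dump a subset of one table."""
    if rule["mode"] == "full":
        return f"SELECT * FROM {table}"
    today = today or date.today()
    cutoff = today - timedelta(days=30 * months)
    return (f"SELECT * FROM {table} "
            f"WHERE {rule['ts_column']} >= '{cutoff.isoformat()}'")
```

The resulting queries can feed `psql \copy`, an ETL job, or whatever loader you choose; the point is that the subset definition lives in one small, reviewable table of rules.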

answered 2 years ago
AWS
EXPERT
Hernito
reviewed 2 years ago
EXPERT
John_F
reviewed 2 years ago
1

Using Aurora cloning is a common way customers create full database copies and make them available to their developers quickly and at minimal additional cost. These clones can be re-created on a recurring basis (e.g. weekly). Check out this blog for more details: https://aws.amazon.com/blogs/aws/amazon-aurora-fast-database-cloning/
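For reference, a fast clone is created through the RDS `restore_db_cluster_to_point_in_time` API with `RestoreType="copy-on-write"`. A minimal boto3 sketch, with hypothetical cluster identifiers:

```python
# Sketch of an Aurora copy-on-write (fast) clone request.
# Cluster names below are made-up examples.

def clone_request(source_cluster, clone_name):
    """Build the parameters for a copy-on-write Aurora clone."""
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": clone_name,
        "RestoreType": "copy-on-write",
        "UseLatestRestorableTime": True,
    }

# In real use (requires AWS credentials and boto3):
# import boto3
# rds = boto3.client("rds")
# rds.restore_db_cluster_to_point_in_time(
#     **clone_request("prod-cluster", "dev-clone"))
```

Because the clone is copy-on-write, storage is shared with the source until either side changes data, which is what keeps the additional cost low.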

AWS
Greg_G
answered 2 years ago
AWS
EXPERT
Hernito
reviewed 2 years ago
  • I need isolated databases for each developer; RDS would fall short with its limit of 40 instances, and apart from that it would be too expensive.

    Is there a tool that could extract a part of the database that is consistent across tables?
