Redshift Database Migration

2 分的閱讀內容

Migrate a single database within a Redshift cluster to another cluster via Datasharing


Amazon Redshift supports multiple databases in a single deployment and each database can have multiple schemas. Each schema can then have multiple tables, views, stored procedures, or other database objects.

Enter image description here

In this example, "redshift-cluster-2" has a database named dev with schemas admin, blanks, and dev_schema_1.

A use case for multiple databases is for multiple tenants where you want to keep each tenant’s data separate from one another. Another use case could be having multiple development databases in one single Redshift deployment.

Eventually, you may want to separate databases to different deployments. This could be to better track costs and deploy a small Redshift Serverless workgroup for each development group. This can be achieved by restoring a snapshot to a new Redshift Cluster or Redshift Serverless Workgroup and then simply removing the databases you don't need.


But what about consolidating databases? What if you want to merge two different databases to a single Redshift deployment? How can you move just a single database from one Redshift deployment (Cluster or Serverless) to another?


One solution is to use AWS Schema Conversion Tool to migrate the objects from one Redshift to another. This will unload the data in question to S3 and then load the data using COPY. This focuses on moving data only.

Another solution is to use a new utility developed in collaboration with a customer that utilizes Amazon Redshift Data Sharing. This solution makes the "Producer" (source) data available on the "Consumer" (target) to be ingested.

The solution does the following:

  1. Creates users and groups missing on the target. It also assigns the users to the groups based on the source permissions.
  2. Creates schemas, tables (with primary and foreign keys), functions, procedures, views, and materialized views.
  3. Migrates the permissions for schemas, tables, functions, procedures, views, and materialized views. This includes ownership, user grants, group grants, and default privileges.
  4. It configures Data Sharing.
  5. Loads data in parallel using Data Sharing.

The README in this GitHub repo walks you through the setup and use of the tool.

Enter image description here

profile picture
已發佈 1 個月前檢視次數 121 次