Complete a 3 Question Survey and Earn a re:Post Badge
Help improve AWS Support Official channel in re:Post and share your experience - complete a quick three-question survey to earn a re:Post badge!
Implement Dynamic Data Masking with Amazon Redshift Data Sharing
This article provides a guide on implementing Dynamic Data Masking (DDM) in conjunction with Amazon Redshift Data Sharing.
Introduction
Data sharing in Amazon Redshift allows you to securely share live data across clusters & endpoints without copying or moving the underlying data. This powerful feature enables cross-team and cross-organisation collaboration, but when sharing sensitive data, you may want to limit which fields are available to the data consumers.
Dynamic Data Masking (DDM) in Amazon Redshift helps you protect sensitive data by manipulating how it appears to users at query time, without altering the underlying data. When combined with data sharing, DDM provides a flexible solution for sharing data while maintaining appropriate controls on sensitive information.
In this article, we'll walk through how to implement DDM in conjunction with data sharing to provide secure, role based access to shared data in Redshift.
💡 Note that the process we follow in this article also works with row-level security in Amazon Redshift
Scenario
Consider a fictitious financial services company, AnyBank, that maintains a central customer data warehouse in Redshift. This warehouse contains sensitive personally identifiable information (PII) such as customer names, email addresses, phone numbers, and social security numbers.
AnyBank needs to share this data across multiple departments in accordance with the following rules:
- Customer service representatives need access to names and contact information but shouldn't see full SSNs
- Data analysts need to work with the data but should see masked PII
- Compliance officers need full access to all customer information except account balances
Using dynamic data masking with data sharing, AnyBank can create a centralised data source that enforces appropriate access controls based on user roles when the data is queried, without creating and maintaining multiple copies of the data.
Prerequisites
To work through the example scenario in this article, you will need:
- An Amazon Redshift producer cluster or Serverless endpoint (patch P188+)
- An Amazon Redshift consumer cluster or Serverless endpoint (patch P188+)
- A database user with appropriate permissions on both endpoints
- Redshift Query Editor v2 or another SQL client to connect to both endpoints
Dynamic Data Masking and Data Sharing
Before diving into the implementation, let's understand how dynamic data masking works with Redshift data sharing:
- Dynamic Data Masking (DDM) allows you to define policies that transform sensitive data at query time, without modifying the underlying data.
- Data Sharing allows you to share live data across different Redshift clusters & endpoints without copying the data.
- When using both together, DDM policies on the producer endpoint do not automatically apply to consumer queries. Instead, DDM policies should be independently configured on consumer endpoint.
- A common pattern is to create late binding views on the consumer and apply DDM policies to those views.
Implementation steps
Step 1: Create table and sample data on producer Redshift endpoint
First, connect to your producer Redshift endpoint and create a sample customer table:
CREATE SCHEMA anybank;
CREATE TABLE anybank.customers (
customer_id INT,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
phone VARCHAR(20),
ssn VARCHAR(11),
account_balance DECIMAL(12,2),
create_date TIMESTAMP
);
-- Insert sample data
INSERT INTO anybank.customers VALUES
(1001, 'John', 'Smith', 'john.smith@example.com', '555-123-4567', '123-45-6789', 54321.98, '2022-01-15 09:34:21'),
(1002, 'Sarah', 'Johnson', 'sarah.j@example.com', '555-987-6543', '987-65-4321', 98756.32, '2022-02-20 14:22:15'),
(1003, 'Michael', 'Brown', 'mbrown@example.com', '555-456-7890', '456-78-9012', 12345.67, '2022-03-05 11:45:32'),
(1004, 'Jennifer', 'Davis', 'jen.davis@example.com', '555-789-0123', '789-01-2345', 67890.12, '2022-04-10 16:18:45');
Step 2: Create users and roles on producer Redshift endpoint for testing
Next, create some test users and roles on the producer endpoint. These users will allow you to test the DDM policies created on the producer.
-- Create user roles
CREATE ROLE analyst_role;
CREATE ROLE service_role;
CREATE ROLE compliance_role;
-- Create users
CREATE USER analyst_user PASSWORD 'AnalystPass123!';
CREATE USER service_user PASSWORD 'ServicePass123!';
CREATE USER compliance_user PASSWORD 'CompliancePass123!';
-- Grant role to users
GRANT ROLE analyst_role TO analyst_user;
GRANT ROLE service_role TO service_user;
GRANT ROLE compliance_role TO compliance_user;
-- Grant privileges
GRANT USAGE ON SCHEMA anybank TO ROLE analyst_role, ROLE service_role, ROLE compliance_role;
GRANT SELECT ON anybank.customers TO ROLE analyst_role, ROLE service_role, ROLE compliance_role;
Step 3: Create DDM policies on producer Redshift endpoint
Next, create the DDM policies on the producer Redshift endpoint to mask sensitive information:
-- Create DDM policies for masking personal data
CREATE MASKING POLICY mask_ssn
WITH (ssn VARCHAR(11))
USING (LEFT(ssn, 7) || '00');
CREATE MASKING POLICY mask_email
WITH (email VARCHAR(100))
USING (LEFT(SPLIT_PART(email, '@', 1), 2) || '---@' || SPLIT_PART(email, '@', 2));
CREATE MASKING POLICY mask_phone
WITH (phone VARCHAR(20))
USING (LEFT(phone, 6) || '0000');
CREATE MASKING POLICY mask_account_balance
WITH (account_balance DECIMAL(12,2))
USING (NULL);
-- Attach the masking policies to the customer table
ATTACH MASKING POLICY mask_ssn
ON anybank.customers (ssn)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_ssn
ON anybank.customers (ssn)
TO ROLE service_role
;
ATTACH MASKING POLICY mask_email
ON anybank.customers (email)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_phone
ON anybank.customers (phone)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_account_balance
ON anybank.customers (account_balance)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_account_balance
ON anybank.customers (account_balance)
TO ROLE compliance_role
;
ATTACH MASKING POLICY mask_account_balance
ON anybank.customers (account_balance)
TO ROLE service_role
;
Step 4: Test the DDM policies on the producer with different users
Now you can test the DDM policies by executing queries as different users:
Connect as the compliance user:
SET SESSION AUTHORIZATION compliance_user;
-- Should see masked account_balance, unmasked everything else
SELECT * FROM anybank.customers;
RESET SESSION AUTHORIZATION;
Connect as the analyst user:
SET SESSION AUTHORIZATION analyst_user;
-- Should see masked phone, ssn, email, and account_balance
SELECT * FROM anybank.customers;
RESET SESSION AUTHORIZATION;
Connect as the service user:
SET SESSION AUTHORIZATION service_user;
-- Should see masked ssn and account_balance and unmasked phone and email.
SELECT * FROM anybank.customers;
RESET SESSION AUTHORIZATION;
Step 5: Create datashare on producer Redshift endpoint
Now create a datashare to share the customer data:
-- Create the datashare
CREATE DATASHARE anybank_share;
-- Add usage permissions to the schema
ALTER DATASHARE anybank_share ADD SCHEMA anybank;
-- Turn off masking for this datashare (required to add table to datashare)
ALTER TABLE anybank.customers MASKING OFF FOR DATASHARE;
-- Add the table to the datashare
ALTER DATASHARE anybank_share ADD TABLE anybank.customers;
-- Grant usage on the datashare to the consumer namespace
-- Replace 'CONSUMER-NAMESPACE-GUID' with your actual consumer namespace GUID
GRANT USAGE ON DATASHARE anybank_share TO NAMESPACE 'CONSUMER-NAMESPACE-GUID';
⚠️ Important: Replace
'CONSUMER-NAMESPACE-GUID'
with your actual consumer namespace GUID. You can find this in the Amazon Redshift console or by using the SQL statement on the consumer:select current_namespace
.
💡 To turn off row-level security (RLS) for data shares, run the following statement:
ALTER TABLE anybank.customers ROW LEVEL SECURITY OFF FOR DATASHARE;
Step 6: Create database from datashare on consumer Redshift endpoint
Connect to your consumer Redshift endpoint and create a database from the datashare:
-- Create a database from the datashare
-- Replace 'PRODUCER-NAMESPACE-GUID' with your actual producer namespace GUID
CREATE DATABASE anybank_consumer FROM DATASHARE anybank_share OF NAMESPACE 'PRODUCER-NAMESPACE-GUID';
-- Verify the shared table is accessible
SELECT * FROM anybank_consumer.anybank.customers LIMIT 10;
⚠️ Important: Replace
'PRODUCER-NAMESPACE-GUID'
with your actual producer namespace GUID. You can find this in the Amazon Redshift console or by using the SQL statement on the producer:select current_namespace
.
Step 7: Create late binding view on consumer Redshift endpoint
Create a late binding view on the consumer side that references the shared table:
-- Create a schema for the views
CREATE SCHEMA anybank_views;
-- Create a late binding view
CREATE VIEW anybank_views.customer_data AS
SELECT
customer_id,
first_name,
last_name,
email,
phone,
ssn,
account_balance,
create_date
FROM anybank_consumer.anybank.customers
WITH NO SCHEMA BINDING;
Step 8: Create users and roles on consumer Redshift endpoint for testing
Next, create some test users and roles on the producer endpoint. These users will allow you to test the DDM policies created on the producer.
-- Create user roles
CREATE ROLE analyst_role;
CREATE ROLE service_role;
CREATE ROLE compliance_role;
-- Create users
CREATE USER analyst_user PASSWORD 'AnalystPass123!';
CREATE USER service_user PASSWORD 'ServicePass123!';
CREATE USER compliance_user PASSWORD 'CompliancePass123!';
-- Grant role to users
GRANT ROLE analyst_role TO analyst_user;
GRANT ROLE service_role TO service_user;
GRANT ROLE compliance_role TO compliance_user;
-- Grant privileges
GRANT USAGE ON SCHEMA anybank_views TO ROLE analyst_role, ROLE service_role, ROLE compliance_role;
GRANT SELECT ON anybank_views.customer_data TO ROLE analyst_role, ROLE service_role, ROLE compliance_role;
Step 9: Create DDM policy on consumer Redshift endpoint and attach to view
Now, create a DDM policy on the consumer Redshift endpoint and attach it to the late binding view:
-- Create DDM policies for masking personal data
CREATE MASKING POLICY mask_ssn
WITH (ssn VARCHAR(11))
USING (LEFT(ssn, 7) || '00');
CREATE MASKING POLICY mask_email
WITH (email VARCHAR(100))
USING (LEFT(SPLIT_PART(email, '@', 1), 2) || '---@' || SPLIT_PART(email, '@', 2));
CREATE MASKING POLICY mask_phone
WITH (phone VARCHAR(20))
USING (LEFT(phone, 6) || '0000');
CREATE MASKING POLICY mask_account_balance
WITH (account_balance DECIMAL(12,2))
USING (NULL);
-- Attach the masking policies to the customer table
ATTACH MASKING POLICY mask_ssn
ON anybank_views.customer_data (ssn)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_ssn
ON anybank_views.customer_data (ssn)
TO ROLE service_role
;
ATTACH MASKING POLICY mask_email
ON anybank_views.customer_data (email)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_phone
ON anybank_views.customer_data (phone)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_account_balance
ON anybank_views.customer_data (account_balance)
TO ROLE analyst_role
;
ATTACH MASKING POLICY mask_account_balance
ON anybank_views.customer_data (account_balance)
TO ROLE compliance_role
;
ATTACH MASKING POLICY mask_account_balance
ON anybank_views.customer_data (account_balance)
TO ROLE service_role
;
Step 10: Test the DDM policies with different users
Now you can test the DDM policies by executing queries as different users:
Connect as the compliance user:
SET SESSION AUTHORIZATION compliance_user;
-- Should see masked account_balance, unmasked everything else
SELECT * FROM anybank_views.customer_data;
-- Should get permission denied error
SELECT * FROM anybank_consumer.anybank.customers LIMIT 10;
RESET SESSION AUTHORIZATION;
Connect as the analyst user:
SET SESSION AUTHORIZATION analyst_user;
-- Should see masked phone, ssn, email, and account_balance
SELECT * FROM anybank_views.customer_data;
-- Should get permission denied error
SELECT * FROM anybank_consumer.anybank.customers LIMIT 10;
RESET SESSION AUTHORIZATION;
Connect as the service user:
SET SESSION AUTHORIZATION service_user;
-- Should see masked ssn and account_balance and unmasked phone and email.
SELECT * FROM anybank_views.customer_data;
-- Should get permission denied error
SELECT * FROM anybank_consumer.anybank.customers LIMIT 10;
RESET SESSION AUTHORIZATION;
Conclusion
In this article, you learned how to implement Dynamic Data Masking with Amazon Redshift Data Sharing. By creating DDM policies on the consumer endpoint and applying them to late binding views, you can control data visibility based on user roles without creating multiple copies of the data.
This approach offers several benefits:
- Centralised data management with distributed, controlled access
- No need to create and maintain duplicate datasets
- Role-based access controls that dynamically resolve different results/levels of detail to different users
- Ability to adapt masking policies on the consumer side without changing the producer data
Remember that DDM policies from the producer endpoint don't automatically carry over to consumer endpoints, so you need to independently configure DDM on the consumer side. Using late binding views with DDM policies provides a flexible and secure way to share sensitive data across your organization.
Clean Up
To clean up the resources created in this article:
On the consumer endpoint:
DROP VIEW IF EXISTS anybank_views.customer_data CASCADE;
DROP MASKING POLICY IF EXISTS mask_ssn CASCADE;
DROP MASKING POLICY IF EXISTS mask_email CASCADE;
DROP MASKING POLICY IF EXISTS mask_phone CASCADE;
DROP MASKING POLICY IF EXISTS mask_account_balance CASCADE;
DROP SCHEMA IF EXISTS anybank_views CASCADE;
DROP USER IF EXISTS analyst_user;
DROP USER IF EXISTS service_user;
DROP USER IF EXISTS compliance_user;
DROP ROLE analyst_role;
DROP ROLE service_role;
DROP ROLE compliance_role;
DROP DATABASE anybank_consumer;
On the producer endpoint:
REVOKE USAGE ON DATASHARE anybank_share FROM NAMESPACE 'CONSUMER-NAMESPACE-GUID';
DROP TABLE IF EXISTS anybank.customers CASCADE;
DROP MASKING POLICY IF EXISTS mask_ssn CASCADE;
DROP MASKING POLICY IF EXISTS mask_email CASCADE;
DROP MASKING POLICY IF EXISTS mask_phone CASCADE;
DROP MASKING POLICY IF EXISTS mask_account_balance CASCADE;
DROP SCHEMA IF EXISTS anybank CASCADE;
DROP DATASHARE anybank_share;
DROP USER IF EXISTS analyst_user;
DROP USER IF EXISTS service_user;
DROP USER IF EXISTS compliance_user;
DROP ROLE analyst_role;
DROP ROLE service_role;
DROP ROLE compliance_role;
⚠️ Important: Replace
'CONSUMER-NAMESPACE-GUID'
with your actual consumer namespace GUID.
Authors
Sean Beath - Specialist Redshift Solutions Architect
Rick Fraser - Specialist Data & AI Solutions Architect
Relevant content
- asked a year agolg...
- asked 2 years agolg...
- AWS OFFICIALUpdated 3 years ago