Amazon Redshift Serverless and AWS Data Exchange Healthcare Demo

5 minute read
Content level: Intermediate
2

Faster is better!

Intro

AWS makes it possible to increase the velocity of gaining insights from data with many serverless technologies including Amazon Redshift Serverless and AWS Data Exchange. Below is a step-by-step guide on how to use these technologies to gain insights from third party data in the AWS Data Exchange.

Setup

AWS Data Exchange

The AWS Data Exchange bridges the gap between providers and subscribers exchanging data, helping customers lower costs, become more agile, and innovate faster.

Steps

  1. Navigate to the AWS Data Exchange
  2. Click on "Browse 3,500+ third party data sets"

Browse 3,500+ third party data sets

  1. Search for "Ovation IBD"
  2. You will see two different results from Ovation.

Ovation IBD

  1. Click on the second result, "Ovation IBD Whole Genome (WGS+RNAseq) with Linked Clinical Data [Sample]". Note: The provided SQL works with the second listing which is a sample of 10 patients of data.
  2. Click on the "Continue to subscribe" link

Continue to subscribe

  1. After a period of time, Ovation will complete the process and you will be subscribed to this product listing.

Amazon Redshift Serverless

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is available both as a provisioned cluster or serverless which will be covered here.

Steps

  1. Log into the AWS console
  2. Navigate to Redshift
  3. Deploy Redshift Serverless by using the "Try Redshift Serverless free trial"

Try Redshift Serverless free trial

  1. It is recommended to use the default settings. Simply scroll to the bottom and click "Save configuration".

Save configuration

Once the Ovation product listing has been successfully subscribed to and the configuration of Redshift Serverless completes, you can proceed to the next step.

  1. Go to the Redshift Serverless dashboard and click the three lines in the top left.

Serverless dashboard

  1. Click on datashares

Datashares

  1. Click on the Subscriptions link

Subscriptions

  1. Click on the "ovation_ibd" datashare
  2. Click on the "Create database from datashare" link

Create database from datashare link

  1. Use the "default" namespace, database name of ibd_db_sample, and then click "Create". Note: the database name can be virtually anything you want but the SQL scripts provided are written for the ibd_db_sample name.

Create database from datashare

  1. Click on the "Query editor v2" link

Query editor v2 link

  1. Paste the following queries and then click the "Run" button.

Query editor v2

--PatientData
select patient_id, specimen_pathology_diagnosis_summary, disease_severity,
"% Aligned (STARUniquelyMappedPercent) - % Uniquely mapped reads"/100 as STAR_Uniquely_Mapped_Percent,
"% Dups (PicardPercentDuplication) - Mark Duplicates - Percent Duplication"::numeric(38,6) as Picard_Percent_Duplication ,
"% GC (FastQCTrimmedR1PercentGc) - Average % GC Content"/100::numeric as Fast_QC_Trimmed_R1_PercentGc,
"% GC (FastQCTrimmedR2PercentGc) - Average % GC Content"/100::numeric as Fast_QC_Trimmed_R2_PercentGc,
"M Aligned (QualiMapReadsAligned) - Reads Aligned (millions)"::int as Quali_Map_Reads_Aligned,
"M Aligned (STARUniquelyMapped) - Uniquely mapped reads (millions)"::int as STAR_Uniquely_Mapped,
"Total Reads (STARTotalReads) - Total Reads"::int as STAR_Total_Reads
from (
    select patient_id, specimen_pathology_diagnosis_summary, disease_severity, metric, value
    from    (
            select s.patient_id, s.specimen_pathology_diagnosis_summary, s.disease_severity, 
            q.metric, q.value
            from ibd_db_sample.public.ovation_ibd_sample_specimen s 
            join ibd_db_sample.public.ovation_ibd_sample_sequencing_run r on s.specimen_id = r.specimen_id
            join ibd_db_sample.public.ovation_ibd_sample_sequencing_qc q on r.sequencing_run_id = q.sequencing_run_id
            where s.specimen_pathology_diagnosis_summary <> 'n/a'
            ) as sub
    ) pivot (
        min(value) for metric in ('% Aligned (STARUniquelyMappedPercent) - % Uniquely mapped reads',
                                '% Dups (PicardPercentDuplication) - Mark Duplicates - Percent Duplication',
                                '% GC (FastQCTrimmedR1PercentGc) - Average % GC Content',
                                '% GC (FastQCTrimmedR2PercentGc) - Average % GC Content',
                                'M Aligned (QualiMapReadsAligned) - Reads Aligned (millions)',
                                'M Aligned (STARUniquelyMapped) - Uniquely mapped reads (millions)',
                                'Total Reads (STARTotalReads) - Total Reads')
    );

--PatientDiagnosisCrosstab
select specimen_pathology_diagnosis_summary, coalesce(unspecified, 0) as unspecified, 
coalesce(minimal, 0) as minimal, coalesce(mild, 0) as mild, 
coalesce(moderate, 0) as moderate, coalesce("moderate-to-severe", 0) as moderate_to_severe, coalesce(severe, 0) as severe
from (
    select s.specimen_pathology_diagnosis_summary, s.disease_severity, count(distinct s.patient_id) as total
    from ibd_db_sample.public.ovation_ibd_sample_specimen s 
    join ibd_db_sample.public.ovation_ibd_sample_sequencing_run r on s.specimen_id = r.specimen_id
    join ibd_db_sample.public.ovation_ibd_sample_sequencing_output o on r.sequencing_run_id = o.sequencing_run_id
    group by s.specimen_pathology_diagnosis_summary, s.disease_severity
    ) as sub pivot (sum(sub.total) for disease_severity in 
    ('Unspecified', 'Mild', 'Moderate', 'Severe', 'Moderate-to-Severe', 'Minimal'))
order by specimen_pathology_diagnosis_summary;

--Colitis Patients Drugs
with patients as 
(select s.patient_id, s.specimen_pathology_diagnosis_summary, s.disease_severity
from ibd_db_sample.public.ovation_ibd_sample_specimen s 
join ibd_db_sample.public.ovation_ibd_sample_sequencing_run r on s.specimen_id = r.specimen_id
join ibd_db_sample.public.ovation_ibd_sample_sequencing_output o on r.sequencing_run_id = o.sequencing_run_id
where s.specimen_pathology_diagnosis_summary <> 'n/a'
)
select p.patient_id, r.generic_name, p.specimen_pathology_diagnosis_summary, p.disease_severity, count(*) as total_claims
from ibd_db_sample.public.ovation_ibd_sample_rx_products r
join ibd_db_sample.public.ovation_ibd_sample_rx_claims c on r.ndc_product = c.ndc_product
join patients p on c.patient_id = p.patient_id
where generic_name <> 'Not Specified'
group by p.patient_id, r.generic_name, p.specimen_pathology_diagnosis_summary, p.disease_severity
order by 5 desc;

Summary

Amazon Redshift Serverless with AWS Data Exchange makes it very quick and easy to increase the velocity of gaining insights. Third party data like the above example from Ovation, is quick and easy to access with Redshift Serverless. You can combine third party data with your own to enhance it. You can also monetize your data by selling it on the AWS Data Exchange.

Be sure to watch this YouTube video that demonstrates these steps and also an Amazon QuickSight dashboard. QuickSight enables BI for everyone in the organization and is also built on a serverless architecture that enables consumption based pricing and auto-scale.

profile pictureAWS
EXPERT
published 6 months ago1314 views