Reading dynamo DB based glue catalog table using redshift external table

0

I have created a glue catalog table on top of a dynamo DB table using glue crawler.

I'm trying to read this glue table by creating a external schema in redshift on top of glue database.

Glue catalog and redshift are located in different accounts, so I have configured required IAM permissions (DynamoDB and glue is located on the same account).

Following is the CREATE EXTERNAL SCHEMA used to create the external schema :

create external schema a2dynamodb from data catalog database 'a2-dynamodb' region 'us-east-1' iam_role 'arn:aws:iam::<RedshiftAccountID>:role/CrossAccountGlueToRedshift,arn:aws:iam::<GlueAccountID>:role/GlueDynamoAnalyticsAccountRole';

Once the schema is created based on Glue database, redshift is automatically fetching all the tables in the glue database as external tables under the a2dynamodb external schema.

When I try to query these tables, I get the follwoing error on redshift:

[XX000] ERROR: S3Location [2023-10-27 15:18:41] Detail: [2023-10-27 15:18:41] ----------------------------------------------- [2023-10-27 15:18:41] error: S3Location [2023-10-27 15:18:41] code: 8001 [2023-10-27 15:18:41] context: Invalid S3 Path: arn:aws:dynamodb:us-east-1:<DynamoDBAccountID>:table/Users [2023-10-27 15:18:41] query: 32988533 [2023-10-27 15:18:41] location: s3location.cpp:146 [2023-10-27 15:18:41] process: padbmaster [pid=1073840402]

Folowing is the CREATE TABLE query of the auto fetched external table :

create external table a2dynamodb.users ( devadmin boolean, costcenterid string, systemuser boolean, buildingcode string, costcentername string, fullname string, userid string, email string, status string, username string, managerusername string ) row format serde 'com.amazon.ionhiveserde.IonHiveSerDe' stored as inputformat 'com.amazon.ionhiveserde.formats.IonInputFormat' outputformat 'com.amazon.ionhiveserde.formats.IonOutputFormat' location 'arn:aws:dynamodb:us-east-1:<DynamoDBAccountID>:table/Users' table properties ('CrawlerSchemaDeserializerVersion'='1.0', 'CrawlerSchemaSerializerVersion'='1.0', 'UPDATED_BY_CRAWLER'='A2-DynamoDB-Users');

Why am I getting Invalid S3 path error when the external tables are based on dynamoDB tables?

asked 6 months ago301 views
2 Answers
0
Accepted Answer

Use DynamoDB as source for Glue table and direct integration with Amazon Redshift is not yet supported. Alternatively you can get the data synced into Redshift and then query it.

The Redshift COPY command supports loading DynamoDB table into Redshift table. This is good for one-time load. Refer https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-source-dynamodb.html

For continued replication refer https://aws.amazon.com/blogs/database/dynamodb-streams-use-cases-and-design-patterns/ which describes leveraging DynamoDB Streams.

profile pictureAWS
answered 6 months ago
0

Regarding the continuous replication, an article was recently posted which shows how to use a stored procedure to convert the JSON payload to tables/columns and to even handle schema changes:

https://repost.aws/articles/ARlyMMJAeUQLWZd8p3c1Jz6A/automated-change-data-capture-cdc-data-ingestion-from-dynamodb-to-redshift

profile pictureAWS
EXPERT
Rajiv_G
answered 6 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions