Reading dynamo DB based glue catalog table using redshift external table

0

I have created a glue catalog table on top of a dynamo DB table using glue crawler.

I'm trying to read this glue table by creating a external schema in redshift on top of glue database.

Glue catalog and redshift are located in different accounts, so I have configured required IAM permissions (DynamoDB and glue is located on the same account).

Following is the CREATE EXTERNAL SCHEMA used to create the external schema :

create external schema a2dynamodb from data catalog database 'a2-dynamodb' region 'us-east-1' iam_role 'arn:aws:iam::<RedshiftAccountID>:role/CrossAccountGlueToRedshift,arn:aws:iam::<GlueAccountID>:role/GlueDynamoAnalyticsAccountRole';

Once the schema is created based on Glue database, redshift is automatically fetching all the tables in the glue database as external tables under the a2dynamodb external schema.

When I try to query these tables, I get the follwoing error on redshift:

[XX000] ERROR: S3Location [2023-10-27 15:18:41] Detail: [2023-10-27 15:18:41] ----------------------------------------------- [2023-10-27 15:18:41] error: S3Location [2023-10-27 15:18:41] code: 8001 [2023-10-27 15:18:41] context: Invalid S3 Path: arn:aws:dynamodb:us-east-1:<DynamoDBAccountID>:table/Users [2023-10-27 15:18:41] query: 32988533 [2023-10-27 15:18:41] location: s3location.cpp:146 [2023-10-27 15:18:41] process: padbmaster [pid=1073840402]

Folowing is the CREATE TABLE query of the auto fetched external table :

create external table a2dynamodb.users ( devadmin boolean, costcenterid string, systemuser boolean, buildingcode string, costcentername string, fullname string, userid string, email string, status string, username string, managerusername string ) row format serde 'com.amazon.ionhiveserde.IonHiveSerDe' stored as inputformat 'com.amazon.ionhiveserde.formats.IonInputFormat' outputformat 'com.amazon.ionhiveserde.formats.IonOutputFormat' location 'arn:aws:dynamodb:us-east-1:<DynamoDBAccountID>:table/Users' table properties ('CrawlerSchemaDeserializerVersion'='1.0', 'CrawlerSchemaSerializerVersion'='1.0', 'UPDATED_BY_CRAWLER'='A2-DynamoDB-Users');

Why am I getting Invalid S3 path error when the external tables are based on dynamoDB tables?

posta un anno fa582 visualizzazioni
2 Risposte
0
Risposta accettata

Use DynamoDB as source for Glue table and direct integration with Amazon Redshift is not yet supported. Alternatively you can get the data synced into Redshift and then query it.

The Redshift COPY command supports loading DynamoDB table into Redshift table. This is good for one-time load. Refer https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-source-dynamodb.html

For continued replication refer https://aws.amazon.com/blogs/database/dynamodb-streams-use-cases-and-design-patterns/ which describes leveraging DynamoDB Streams.

profile pictureAWS
ESPERTO
con risposta un anno fa
0

Regarding the continuous replication, an article was recently posted which shows how to use a stored procedure to convert the JSON payload to tables/columns and to even handle schema changes:

https://repost.aws/articles/ARlyMMJAeUQLWZd8p3c1Jz6A/automated-change-data-capture-cdc-data-ingestion-from-dynamodb-to-redshift

profile pictureAWS
ESPERTO
con risposta un anno fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande