Glue Data Catalog configuration when updating with Database Migration Service


I set up a replication task with AWS Database Migration Service to run full load + CDC from an RDS instance to an S3 bucket. Since I want to query the data in S3 with Athena, I set "GlueCatalogGeneration": true so that I wouldn't need a separate crawler running periodically to pick up the latest data. However, I noticed that when DMS generates the tables in the Glue Data Catalog, it sets the escape.delim option to null. This doesn't seem to be a problem for Athena, but if I try to access any table using Spark (e.g. with create_dynamic_frame_from_catalog), I get an IllegalEscaper error. Is there an option in DMS I can configure so that this parameter isn't created at all?

2 Answers

Accepted Answer

When DMS replicates data from a database to S3 with Glue catalog generation enabled, it sets certain properties on the Glue tables it generates. One of these is escape.delim, which is set to null.

The null value does not cause issues when querying the data from Athena. It does cause problems when accessing the tables from Spark with create_dynamic_frame_from_catalog, because Spark expects a non-null escape delimiter.

There is currently no option in DMS to configure the escape.delim property. Two workarounds are available:

  • After the initial load and replication are complete, update the Glue table definition manually through the Glue console or API to set a non-null escape delimiter.
  • Alternatively, instead of using create_dynamic_frame_from_catalog in Spark, read the data directly from S3 (for example with Spark SQL) without going through the Glue catalog.
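
The first workaround could be scripted with boto3 roughly as follows. This is only a sketch under assumptions: it assumes escape.delim lives in the table's SerDe parameters (StorageDescriptor.SerdeInfo.Parameters), it uses a backslash as the replacement value, and the helper names `build_table_input` and `fix_escape_delim` are made up for illustration. Check the actual table definition in the Glue console before relying on it.

```python
import copy

# TableInput keys accepted by Glue's update_table. get_table also returns
# read-only fields (DatabaseName, CreateTime, ...) that have to be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters", "ViewOriginalText",
    "ViewExpandedText", "LastAccessTime", "LastAnalyzedTime", "TargetTable",
}


def build_table_input(table, escape_char="\\"):
    """Return a TableInput dict with escape.delim set to a non-null value.

    Assumption: escape.delim is stored in the SerDe parameters.
    The input dict is not modified.
    """
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    serde = table_input.setdefault("StorageDescriptor", {}).setdefault("SerdeInfo", {})
    serde.setdefault("Parameters", {})["escape.delim"] = escape_char
    return table_input


def fix_escape_delim(database, table_name, escape_char="\\"):
    """Fetch the table from the Glue Data Catalog and write back the patched copy."""
    import boto3  # imported lazily so build_table_input works without AWS access

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_table_input(table, escape_char),
    )
```

Calling `fix_escape_delim("my_database", "my_table")` (both names are placeholders) would patch a single table. Whether DMS overwrites the property again on later catalog updates is not covered here, so the patch may need to be reapplied.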
EXPERT, answered a month ago
AWS EXPERT, reviewed a month ago

I noticed that I don't have the same problem when reading directly from S3 with create_dynamic_frame_from_options, and I was curious why that was the case. Thank you, now it's clear!
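
For reference, a direct read from S3 that bypasses the catalog might look like the sketch below. It assumes the DMS target files are CSV and that a GlueContext is already available in the job. The bucket path and the helper names `s3_read_kwargs` and `read_from_s3` are hypothetical.

```python
def s3_read_kwargs(s3_path, data_format="csv", format_options=None):
    """Build the keyword arguments for create_dynamic_frame.from_options.

    Assumption: the files are CSV; pass data_format="parquet" if the DMS
    endpoint was configured to write Parquet instead.
    """
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [s3_path], "recurse": True},
        "format": data_format,
        "format_options": format_options or {},
    }


def read_from_s3(glue_context, s3_path, **kwargs):
    """Read a DynamicFrame straight from S3, ignoring the catalog's table properties."""
    return glue_context.create_dynamic_frame.from_options(
        **s3_read_kwargs(s3_path, **kwargs)
    )
```

Inside a Glue job this could be called as `read_from_s3(glueContext, "s3://my-dms-bucket/my_schema/my_table/")`. Because the table definition is never consulted, the null escape.delim property does not come into play.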

answered a month ago
