
AWS Glue crawler won't sync table column descriptions


I have AWS Glue crawlers that crawl Snowflake databases for table metadata via a JDBC connection. They bring the tables' fields and types into AWS Glue, but they never bring the field descriptions, no matter what I try. I have tried enabling expanded metadata when creating the crawler, but even with that the field descriptions are not synced. Is this a tool limitation? I need these descriptions to be kept up to date because I'm synchronizing AWS Glue with AWS DataZone, and DataZone is the one used most by business users.

table crawled to AWS Glue

asked a year ago, 287 views
2 Answers

AWS Glue crawlers not syncing column descriptions from Snowflake over a JDBC connection is a known limitation of the tool. Crawlers reliably capture table metadata such as column names and data types, but they do not automatically retrieve column descriptions or comments from the source database.

Enabling expanded metadata when creating the crawler was the right thing to try. For JDBC targets, however, that option only retrieves specific kinds of additional information: you enable it by specifying the values 'RAWTYPES' and/or 'COMMENTS' in the crawler configuration. 'RAWTYPES' provides native-level datatype information, and 'COMMENTS' provides the comments associated with columns or tables in the database.
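For reference, here is a minimal sketch of how those two values could be set when creating the crawler with boto3, assuming the JDBC target's `EnableAdditionalMetadata` field is where they go. The crawler name, role ARN, Glue database, connection name and include path are all placeholders, not values from the question.

```python
def build_jdbc_crawler_request(name, role_arn, glue_database, connection_name, path):
    """Build keyword arguments for glue.create_crawler with a single JDBC target.

    All argument values are supplied by the caller; nothing here is specific
    to the crawler described in the question.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": glue_database,
        "Targets": {
            "JdbcTargets": [
                {
                    "ConnectionName": connection_name,
                    "Path": path,
                    # Ask the crawler for column/table comments and native types.
                    "EnableAdditionalMetadata": ["COMMENTS", "RAWTYPES"],
                }
            ]
        },
    }


def create_crawler(request):
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("glue").create_crawler(**request)
```

Calling `create_crawler(build_jdbc_crawler_request(...))` with your own values would create the crawler. Whether the comments then actually reach the Data Catalog is exactly what is in question in this thread.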

Unfortunately, even with these options enabled, column descriptions from Snowflake may not be captured as you expect. This is because the way Snowflake stores and exposes column descriptions may not align with how AWS Glue crawlers retrieve this information through JDBC connections.

Given this limitation, you may need to consider alternative approaches to ensure that column descriptions are available in AWS Glue and subsequently in AWS DataZone for your business users. Some potential workarounds include:

  1. Manually updating the column descriptions in the AWS Glue Data Catalog after the crawler has run.
  2. Developing a custom solution to extract column descriptions from Snowflake and update the AWS Glue Data Catalog programmatically.
  3. Exploring if there are any Snowflake-specific JDBC drivers or connection options that might expose column descriptions in a way that AWS Glue can interpret.

While these workarounds require additional effort, they may be necessary to achieve your goal of having up-to-date column descriptions available for your business users in AWS DataZone.
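As a rough illustration of workaround 2, the sketch below reads column comments from Snowflake's `INFORMATION_SCHEMA.COLUMNS` view and writes them into the `Comment` field of the matching Glue columns. It rests on several assumptions: `cursor` is a DB-API cursor from the Snowflake Python connector whose current database contains the table, `glue` is a boto3 Glue client, column names are matched case-insensitively, and the helper names and the list of keys copied into `TableInput` are my own choices rather than anything prescribed by AWS.

```python
import copy

# Conservative subset of get_table keys that update_table accepts in TableInput.
# Read-only fields such as CreateTime or DatabaseName must not be sent back.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "ViewOriginalText", "ViewExpandedText",
    "TableType", "Parameters",
)

# Glue limits a column comment to 255 bytes; truncating by characters is an
# approximation that is only exact for single-byte text.
GLUE_COMMENT_MAX = 255


def fetch_column_comments(cursor, schema, table):
    """Return {lowercased column name: comment} for one Snowflake table."""
    cursor.execute(
        "SELECT COLUMN_NAME, COMMENT FROM INFORMATION_SCHEMA.COLUMNS "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s",
        (schema, table),
    )
    return {name.lower(): comment for name, comment in cursor.fetchall() if comment}


def build_table_input(glue_table, comments):
    """Copy a Glue table definition and attach the comments to its columns."""
    table_input = {
        key: copy.deepcopy(glue_table[key])
        for key in TABLE_INPUT_KEYS
        if key in glue_table
    }
    for column in table_input.get("StorageDescriptor", {}).get("Columns", []):
        comment = comments.get(column["Name"].lower())
        if comment:
            column["Comment"] = comment[:GLUE_COMMENT_MAX]
    return table_input


def sync_comments(glue, cursor, glue_database, glue_table_name, schema, table):
    """Fetch comments from Snowflake and push them to one Glue table."""
    comments = fetch_column_comments(cursor, schema, table)
    glue_table = glue.get_table(DatabaseName=glue_database, Name=glue_table_name)["Table"]
    glue.update_table(
        DatabaseName=glue_database,
        TableInput=build_table_input(glue_table, comments),
    )
```

One thing to keep in mind with this approach: a later crawler run may overwrite the comments depending on the crawler's schema change policy, so the sync would probably need to run after each crawl.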

answered a year ago

For comments added to a table or view to come through to the Glue Catalog table, you have to choose COMMENTS and RAWTYPES for additional metadata, as described in the document [1] (Step 10). This allows the crawler to harvest metadata related to comments and to raw types such as geospatial columns. Could you please confirm whether this option was enabled, and whether the comments still do not come through to the Glue Catalog with it enabled?

[1] https://aws.amazon.com/blogs/big-data/announcing-aws-glue-crawler-support-for-snowflake/
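If it helps to answer that question, one way to confirm what an existing crawler has enabled is to inspect its JDBC targets. This is only a sketch: the helper name is made up, and it assumes the options appear under `EnableAdditionalMetadata` on each JDBC target in the `get_crawler` response.

```python
REQUIRED_OPTIONS = {"COMMENTS", "RAWTYPES"}


def missing_metadata_options(crawler):
    """Map each JDBC target path to the additional-metadata options it lacks.

    `crawler` is the "Crawler" dict from a glue.get_crawler response.
    Targets that already have both options are left out of the result.
    """
    missing = {}
    for target in crawler.get("Targets", {}).get("JdbcTargets", []):
        enabled = set(target.get("EnableAdditionalMetadata", []))
        lacking = sorted(REQUIRED_OPTIONS - enabled)
        if lacking:
            missing[target.get("Path", "")] = lacking
    return missing


def check_crawler(name):
    """Fetch a crawler by name and report missing options (needs boto3 and credentials)."""
    import boto3  # imported lazily so the check above can be used on its own

    crawler = boto3.client("glue").get_crawler(Name=name)["Crawler"]
    return missing_metadata_options(crawler)
```

An empty result from `check_crawler("your-crawler-name")` would mean both options are already enabled on every JDBC target.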

AWS
SUPPORT ENGINEER
answered a year ago
