
Glue Data Quality - Recommendation run failed with Reference 'column_name' is ambiguous


I'm trying to reproduce the exercise at the link below from the AWS Glue Immersion Day workshop, but the Glue Data Quality rule recommendation run fails with an ambiguous column reference error. The error doesn't seem to match the data, because the table doesn't have two columns with the same name. How can I resolve this so I can continue with the exercises? Does anyone have any suggestions?

https://catalog.us-east-1.prod.workshops.aws/workshops/ee59d21b-4cb8-4b3d-a629-24537cf37bb5/en-US/lab12/dq-at-rest

ERROR 2024-11-28 14:07:52,162 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(94)): Exception in User Class amzn.reshaded.com.amazon.deequ.analyzers.runners.MetricCalculationRuntimeException: org.apache.spark.sql.AnalysisException: Reference 'datechecked' is ambiguous, could be: datechecked, datechecked.
    at amzn.reshaded.com.amazon.deequ.analyzers.runners.MetricCalculationException$.wrapIfNecessary(MetricCalculationException.scala:74)
    at amzn.reshaded.com.amazon.deequ.analyzers.Analyzers$.metricFromFailure(Analyzer.scala:593)
    at amzn.reshaded.com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.toFailureMetric(Analyzer.scala:235)
    at amzn.reshaded.com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.toFailureMetric(Analyzer.scala:215)
    at amzn.reshaded.com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$6(AnalysisRunner.scala:335)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.immutable.List.foreach(List.scala:388
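The AnalysisException in this trace comes from Spark resolving a column reference against the schema it is scanning. As a rough illustration only, here is a toy Python model (not Spark's actual code) of such a lookup. With case_sensitive=False it compares names case-insensitively, mirroring Spark's default setting spark.sql.caseSensitive=false, so distinct names such as DateChecked and datechecked both match one reference. The function resolve and the class AmbiguousReferenceError are hypothetical names for this sketch, and the error text only imitates Spark's wording.

```python
# Toy model (NOT Spark's implementation) of resolving a column reference.
# case_sensitive=False mimics Spark's default spark.sql.caseSensitive=false.
from typing import List


class AmbiguousReferenceError(Exception):
    """Raised when a reference matches more than one column (hypothetical)."""


def resolve(reference: str, columns: List[str], case_sensitive: bool = False) -> str:
    """Return the single column name that `reference` refers to."""
    if case_sensitive:
        matches = [c for c in columns if c == reference]
    else:
        # Case-insensitive comparison: 'DateChecked' and 'datechecked' both match.
        matches = [c for c in columns if c.lower() == reference.lower()]
    if not matches:
        raise KeyError(f"Column '{reference}' does not exist")
    if len(matches) > 1:
        # Wording imitates Spark's AnalysisException message.
        raise AmbiguousReferenceError(
            f"Reference '{reference}' is ambiguous, could be: {', '.join(matches)}."
        )
    return matches[0]
```

With columns ["DateChecked", "datechecked", "state"], resolve("datechecked", ...) raises the ambiguity error under the default setting but succeeds with case_sensitive=True. If the names had already been lowercased somewhere upstream, even a case-sensitive lookup would fail, which would produce a message listing the same name twice, as in the log above.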

asked 10 days ago, 24 views
1 Answer

The error indicates that Spark, which Glue Data Quality uses to compute its metrics, matched the column reference 'datechecked' to more than one column. This can happen even when the table doesn't appear to have two columns with the same name. Here are a few suggestions to resolve this issue:

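  1. Check for column names that differ only in case: Spark SQL, which Glue Data Quality uses, resolves column names case-insensitively by default (spark.sql.caseSensitive is false). Two columns such as DateChecked and datechecked therefore both match a reference to datechecked, which makes the reference ambiguous. Because the Glue Data Catalog generally stores column names in lowercase, such columns can also show up with identical names, as in your error message.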

  2. Verify table schema: Double-check the schema of your table in the Glue Data Catalog. Make sure there are no duplicate column names or unexpected variations.

  3. Refresh table metadata: If the table was created by a crawler, re-run the crawler so the Data Catalog schema reflects the current data. This can sometimes resolve issues caused by stale column definitions.

  4. Use fully qualified column names: If possible, try using fully qualified column names (e.g., table_name.column_name) in your DQDL rules or SQL statements.

  5. Check for any views or temporary tables: Ensure that there are no views or temporary tables with similar column names that might be causing the ambiguity.

  6. Review recent changes: If you've recently made changes to the table structure or data, consider reverting those changes to see if the issue persists.

  7. Examine the data source: If your table is sourced from external data, verify that the source data doesn't have column name inconsistencies, for example a CSV header or JSON records containing keys that differ only in case.

  8. Update Glue version: Ensure you're using the latest version of AWS Glue, as newer versions might have fixes for such issues.
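Suggestions 1, 2, and 7 can be partly automated. The Python sketch below looks for names that collide when compared case-insensitively, which is how Spark compares them by default. It assumes you fetch the table definition with the boto3 Glue client's get_table call (the database and table names are placeholders you supply) and that a source file is CSV. It also includes partition keys, on the assumption that a partition key reusing a data column's name is another way to end up with duplicate columns. The helper names catalog_column_names, find_case_collisions, header_names, and fetch_table are hypothetical.

```python
# Sketch: find column names that collide under case-insensitive comparison,
# in a Glue Data Catalog table definition or in a CSV header.
import csv
import io
from collections import defaultdict
from typing import Dict, List


def catalog_column_names(table: dict) -> List[str]:
    """Collect data-column and partition-key names from a get_table()['Table'] dict."""
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    partition_keys = table.get("PartitionKeys", [])
    return [c["Name"] for c in columns] + [p["Name"] for p in partition_keys]


def find_case_collisions(names: List[str]) -> Dict[str, List[str]]:
    """Group names by lowercase form; keep only groups with more than one name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


def header_names(csv_text: str) -> List[str]:
    """Return the header row of CSV text, e.g. the first lines of a source file."""
    return next(csv.reader(io.StringIO(csv_text)))


def fetch_table(database: str, table_name: str) -> dict:
    """Fetch a table definition from the Glue Data Catalog (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    glue = boto3.client("glue")
    return glue.get_table(DatabaseName=database, Name=table_name)["Table"]
```

For example, find_case_collisions(catalog_column_names(fetch_table("my_database", "my_table"))), with your own database and table names, returns an empty dict when the catalog schema has no collisions. In that case, running header_names on the first lines of a source file checks the raw data instead.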

If these suggestions don't resolve the issue, you may need to contact AWS support for further assistance, as there could be an underlying problem with the Glue Data Quality service itself.
Sources
Troubleshooting AWS Glue Data Quality errors - AWS Glue
Getting started with AWS Glue Data Quality for the Data Catalog - AWS Glue

answered 10 days ago
EXPERT
reviewed 9 days ago


