Possible AWS Glue DQDL parser bug: composite rule with `where` fails, but equivalent simple rules work

Hello,

I found what looks like a parser inconsistency in AWS Glue Data Quality DQDL and wanted to check whether others have seen the same behavior.

Context:

  • Service: AWS Glue Data Quality
  • API: CreateDataQualityRuleset
  • Region: us-east-1
  • Observed on: March 18, 2026

Problem: A composite rule using `and` works without a `where` clause. A simple rule also works with `where`. But when I apply a single `where` clause to the full composite expression, the ruleset is rejected with:

InvalidInputException: DataQuality rules cannot be parsed

Minimal example

This works:

Rules = [
  ColumnLength "field_a" >= 1,
  ColumnLength "field_a" <= 5
]

This also works:

Rules = [
  ColumnLength "field_a" >= 1 where "status_flag = DATE '2099-12-31'",
  ColumnLength "field_a" <= 5 where "status_flag = DATE '2099-12-31'"
]

This fails:

Rules = [
  (ColumnLength "field_a" >= 1) and (ColumnLength "field_a" <= 5) where "status_flag = DATE '2099-12-31'"
]

Error:

InvalidInputException: DataQuality rules cannot be parsed

What makes this confusing:

  • Composite rules seem to be supported
  • `where` clauses seem to be supported
  • But a `where` clause attached to the composite expression fails to parse

Workaround: Instead of writing one composite rule with a single where, I had to split it into two separate rules and repeat the same where clause on each one.
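
To avoid copying the filter by hand, the split rules can be generated from a single filter string. This is a minimal Python sketch, not an AWS-provided tool: `build_ruleset` is a hypothetical helper name, and it only formats text. It does not validate DQDL or call the service, and it assumes the filter contains no double quotes because escaping rules are not covered in this thread.

```python
def build_ruleset(rules, where=None):
    """Render a DQDL ruleset, repeating the same where clause on every rule.

    Hypothetical helper for illustration: it only builds the text and does
    not check that the rules or the filter are valid DQDL.
    """
    if where is not None and '"' in where:
        # Assumption: escaping rules are unknown here, so reject embedded
        # double quotes instead of guessing how to escape them.
        raise ValueError("where clause must not contain double quotes")
    suffix = f' where "{where}"' if where else ""
    body = ",\n".join(f"  {rule}{suffix}" for rule in rules)
    return f"Rules = [\n{body}\n]"


ruleset = build_ruleset(
    ['ColumnLength "field_a" >= 1', 'ColumnLength "field_a" <= 5'],
    where="status_flag = DATE '2099-12-31'",
)
print(ruleset)
```

The printed text is the same as the second ruleset in the minimal example (the one that is accepted), so the filter is maintained in one place even though DQDL receives it on each rule. The resulting string is what would be supplied as the Ruleset value in a CreateDataQualityRuleset call.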

Question: Is this expected behavior and just undocumented, or is this a parser bug/limitation in Glue Data Quality?

If expected, it would be very helpful to have this explicitly documented in the DQDL reference, because the current behavior is not obvious.

asked 2 months ago, 32 views
2 Answers
Based on the current DQDL specification, this appears to be a limitation in how composite rules interact with `where` clauses rather than a parser bug. Here's what's happening:

Current DQDL Support:

  • Composite rules using the `and` and `or` operators are supported for combining multiple rules
  • `where` clauses are supported for filtering data before applying data quality rules
  • However, the syntax for applying a `where` clause to an entire composite expression is not explicitly documented as supported

Why Your Workaround Works: When you split the composite rule into two separate rules with individual `where` clauses, each rule is parsed independently with its own filter condition. This follows the documented pattern, in which a `where` clause is applied at the individual rule level.

The Parsing Issue: The parser appears to expect where clauses to be associated with individual rule expressions rather than composite expressions. When you write:

(ColumnLength "field_a" >= 1) and (ColumnLength "field_a" <= 5) where "status_flag = DATE '2099-12-31'"

The parser may be interpreting the where clause as applying only to the last part of the composite expression, or it may not recognize this as valid syntax at all, resulting in the parsing error.

Recommendation: Your workaround of using separate rules with repeated where clauses is currently the correct approach. While this requires duplicating the filter condition, it aligns with how DQDL currently handles the combination of composite rules and filtering.

This does appear to be a gap in the DQDL syntax that could benefit from either explicit documentation or enhancement to support where clauses on composite expressions directly.
Sources
Troubleshooting AWS Glue Data Quality errors - AWS Glue
Data Quality Definition Language (DQDL) reference - AWS Glue
AWS Glue Data Quality - AWS Glue

answered 2 months ago
EXPERT
reviewed 2 months ago
  • The parser error could be improved to explicitly say composite expressions do not support where. Also, the DQDL docs could include an example of the supported workaround.

It's not a bug; it's a confirmed limitation of the DQDL grammar. You can work around it as you did, or (at least in your particular example) use a different expression such as `between`, which would let you use a single `where` clause.

The documentation could be clearer, though. I recommend that you use the Feedback link on the DQDL reference page.

AWS
answered 2 months ago
  • Thanks, that helps clarify that this is a grammar limitation rather than a parser bug.

    One important point, though: using `between` is not equivalent in my case. I tested that option earlier, but in the current DQDL reference, `between x and y` is documented as exclusive. Because of that, it does not preserve the same semantics as:

    (ColumnLength "field_a" >= 1) and (ColumnLength "field_a" <= 5)

    For validations where the upper bound must be included, between changes the rule behavior and can introduce maintenance risk later, because the DQDL becomes shorter but no longer expresses the original constraint correctly.
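
    If a single `where` clause is still wanted, one possible approach is to widen each bound by one so that an exclusive `between` covers the inclusive range. This is only a sketch under stated assumptions: that `between` is exclusive as the reference is quoted above, that `ColumnLength` values are whole numbers, and that the rule accepts a `between` expression. `inclusive_to_between` is a hypothetical helper, and the output has not been verified against Glue.

```python
def inclusive_to_between(column, low, high, where=None):
    """Render an exclusive `between` rule covering the inclusive range [low, high].

    Hypothetical helper for illustration. Widening by one is only exact for
    whole-number measures such as a column length.
    """
    if not isinstance(low, int) or not isinstance(high, int):
        raise TypeError("bounds must be integers")
    if low > high:
        raise ValueError("low must not exceed high")
    # Assumption: `between x and y` excludes both x and y, so the inclusive
    # range [low, high] becomes the exclusive range (low - 1, high + 1).
    rule = f'ColumnLength "{column}" between {low - 1} and {high + 1}'
    if where:
        rule += f' where "{where}"'
    return rule


rule = inclusive_to_between("field_a", 1, 5, where="status_flag = DATE '2099-12-31'")
print(rule)
```

    The bounds in the generated rule (0 and 6) no longer match the business constraint (1 to 5) at a glance, which is the maintenance risk described above. Generating the rule from the inclusive bounds keeps the original constraint visible in code, but the split-rule workaround remains the more direct expression of it.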
