Skip to content

Glue Data Quality evaluation run fails with LakeFormation AccessDeniedException on Iceberg table (HybridAccessEnabled)

0

PROBLEM

AWS Glue Data Quality standalone evaluation runs (Data Catalog-based, not embedded in ETL) fail with AccessDeniedException from Lake Formation when accessing an Iceberg table after removing IAM_ALLOWED_PRINCIPALS to enforce Lake Formation governance.

The same table works correctly with:

  • Glue ETL jobs
  • Glue Crawlers
  • Glue Table Compaction
  • Athena queries (via Lake Formation credential vending)

Only Glue DQ standalone evaluation runs fail.

ENVIRONMENT

  • Region: eu-west-1
  • Table format: Apache Iceberg
  • S3 location: Registered in Lake Formation with HybridAccessEnabled: true
  • Lake Formation Data Lake Settings: CreateTableDefaultPermissions: [] (no auto-grant of IAM_ALLOWED_PRINCIPALS for new tables)

ERROR

Exception in User Class: software.amazon.awssdk.services.lakeformation.model.AccessDeniedException : Service returned error code AccessDeniedException (Service: LakeFormation, Status Code: 400)

CLOUDWATCH LOGS (log group: /aws-glue/data-quality/error)

The logs reveal the exact internal failure path:

  1. GlueContext.getCatalogSource: isRegisteredWithLakeFormation: true

  2. org.apache.iceberg.aws.fgac.lakeformation.LakeFormationUtils: Allow table arn:aws:glue:<region>:<account>:table/<database>/<table> to be accessed

  3. TableStorageCredentialsCache: Using credential vendor class "com.amazonaws.glue.accesscontrol.AWSLFCVendor"

  4. "AWSLFCVendor": Getting credentials for table: TableId(arn:aws:glue:<region>:<account>:table/<database>/<table>)

  5. FAIL: AccessDeniedException (LakeFormation, Status Code: 400) Retries with exponential backoff (7 attempts), all fail identically

DQ EXECUTION ROLE CONFIGURATION (all verified)

Trust policy: - glue.amazonaws.com - scheduler.amazonaws.com - lakeformation.amazonaws.com (added during troubleshooting, did NOT resolve)

IAM permissions: - lakeformation:GetDataAccess on * - lakeformation:GetTemporaryGlueTableCredentials on * - s3:Get*/List*/Put* on the data bucket - glue:* on all tables, databases, and catalog - No permissions boundary on this role

Lake Formation grants on the Iceberg table: - Table level: ALTER, DELETE, DESCRIBE, DROP, INSERT - TableWithColumns (ColumnWildcard): SELECT -- this is "All Table Access"

Lake Formation grants on the database: - ALL, CREATE_TABLE, DESCRIBE

Lake Formation grants on the default database: - ALL, DESCRIBE

WHAT WE RULED OUT

  1. Missing IAM permissions -- Role has GetDataAccess, GetTemporaryGlueTableCredentials, S3, Glue
  2. Missing Lake Formation grants -- SELECT (ColumnWildcard), DESCRIBE, ALTER, DELETE, DROP, INSERT all present
  3. Missing trust policy -- Added lakeformation.amazonaws.com; did not resolve
  4. Permissions boundary -- None on DQ role
  5. Default database access -- DQ role has ALL, DESCRIBE on default DB
  6. Table-level vs column-level SELECT -- Granting SELECT at table level is auto-converted by LF to TableWithColumns with ColumnWildcard (same grant)

KEY OBSERVATION: DIFFERENT GLUE RUNTIMES USE DIFFERENT LF ACCESS PATHS

Glue RuntimeInternal Access PathLakeFormation APIWorks without IAM_ALLOWED_PRINCIPALS?
ETL Job (GlueContext)Native GlueGetDataAccess (visible in CloudTrail)Yes
CrawlerNative GlueGetDataAccess (visible in CloudTrail)Yes
Table CompactionNative GlueGetDataAccess (visible in CloudTrail)Yes
Data Quality (Iceberg)"AWSLFCVendor"GetTemporaryGlueTableCredentials (NOT logged in CloudTrail)NO

CLOUDTRAIL GAP

GetTemporaryGlueTableCredentials produces zero events in the CloudTrail management trail for the entire investigation period (checked for all roles, not just DQ). This API appears to be a data-plane event, making customer-side debugging impossible without CloudWatch logs.

WORKAROUND

Restoring IAM_ALLOWED_PRINCIPALS on the table makes DQ succeed immediately (the role falls back to direct IAM/S3 access). But this breaks our Lake Formation governance model and also breaks DataZone Athena queries (due to the DataZone environment role's permissions boundary restricting S3 paths to /datazone/ prefix only).

QUESTIONS

  1. Does the Glue Data Quality standalone evaluation run (Data Catalog-based) fully support GetTemporaryGlueTableCredentials for Iceberg tables when the S3 location has HybridAccessEnabled: true?

  2. The DQ documentation states Iceberg + LF "All Table Access" is supported (since Jul 2025). Is there a known limitation with HybridAccessEnabled specifically?

  3. The internal credential vendor ("AWSLFCVendor") calls GetTemporaryGlueTableCredentials -- what SupportedPermissionTypes does it pass? Could a mismatch with the TableWithColumns/ColumnWildcard grant cause the AccessDeniedException?

  4. Is there any additional configuration required beyond what is documented for Glue DQ + LakeFormation + Iceberg (e.g., CreateLakeFormationOptIn for the DQ role in hybrid mode)?

RELATED

Does Data Quality work with Apache Iceberg tables? https://repost.aws/questions/QUvLBjMvBmQRqpqIFyM4CVgA This older re:Post question was answered confirming the limitation. Our scenario is post-Jul 2025 release where support was supposedly added.

Thank you.

asked a month ago40 views
1 Answer
0
Accepted Answer

For your case, it seems to exhibit a documented limitation when executed against Apache Iceberg tables configured with HybridAccessEnabled=true, specifically when relying on Lake Formation credential vending through the GetTemporaryGlueTableCredentials API.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html

https://docs.aws.amazon.com/glue/latest/dg/security-lf-enable.html

EXPERT
answered a month ago
AWS
EXPERT
reviewed a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.