Potential Regression: Athena v3 Vectorized Parquet Reader vs. Security Lake Compactor
Based on your diagnostic, this appears to be a regression in Athena Engine v3's Vectorized Parquet Reader specifically handling the Parquet encoding produced by the Security Lake Avro-to-Parquet compactor. Since pyarrow reads the files correctly, the underlying data is valid, but the Trino-based engine is likely stumbling on page-level metadata or specific encodings (like RLE or Dictionary encoding) used in the managed compaction.
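The diagnosis rests on "pyarrow reads the files correctly", so a cheap first step is to rule out a truncated or damaged object before comparing encodings. Below is a minimal stdlib-only sketch. It assumes you've downloaded one of the failing files locally, and the function names are hypothetical. It checks only the outer framing defined by the Parquet format spec (leading and trailing `PAR1` magic, plus the 4-byte little-endian footer length). It does not inspect page encodings or statistics.

```python
import struct
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def check_parquet_framing(data: bytes) -> dict:
    """Check the outer framing of a Parquet file held in memory.

    A Parquet file starts and ends with the 4-byte magic b"PAR1"; the 4 bytes
    just before the trailing magic hold the little-endian footer length.
    """
    result = {"ok": False, "reason": "", "footer_length": None}
    # Smallest possible layout: magic + footer length + magic = 12 bytes.
    if len(data) < 12:
        result["reason"] = "file too small"
        return result
    if data[:4] != PARQUET_MAGIC:
        result["reason"] = "missing leading PAR1 magic"
        return result
    if data[-4:] != PARQUET_MAGIC:
        result["reason"] = "missing trailing PAR1 magic (possibly truncated)"
        return result
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    result["footer_length"] = footer_len
    # The footer must fit between the leading magic and the 8-byte tail.
    if footer_len == 0 or footer_len > len(data) - 12:
        result["reason"] = "footer length out of range"
        return result
    result["ok"] = True
    return result


def check_parquet_file(path: str) -> dict:
    """Convenience wrapper for a file on local disk."""
    return check_parquet_framing(Path(path).read_bytes())
```

If a file passes this check but Athena still fails on it, that is consistent with the reader-side explanation above rather than a damaged upload.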
Key points to consider:
- Validation of the Bug: The fact that manifest-only queries succeed confirms the Iceberg metadata layer is intact. The failure during body reads is a classic symptom of a vectorized reader mismatch.
- The "Managed Table" Trap: Because Security Lake restricts DDL, the standard fix (disabling vectorization via TBLPROPERTIES) is unavailable.
- Immediate Workaround: Try disabling vectorization at the session level instead of the table level. Run this before your query in the Athena console:
  SET SESSION parquet_vectorized_reader_enabled = false;
  If this allows the query to complete, it confirms a bug in the Athena v3 engine's optimized reader.
- Service-Linked Issue: Since you cannot OPTIMIZE or VACUUM, this requires an AWS internal ticket. The Security Lake team likely deployed a new compactor version that produces Parquet features (or Avro-specific metadata keys) that the current Athena v3 build in us-east-1 is misinterpreting.
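To reproduce the manifest-only vs. body-read split consistently, something like the following can generate both probe queries for a given table. This is a sketch. The database and table names are placeholders, and it assumes Athena's Iceberg `$files` metadata table exposes `file_path` and `record_count` (check the Athena docs for your engine version). A plain `LIMIT` query may never touch the broken files, so you may need to add a filter on a partition you know fails.

```python
def quote_ident(ident: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + ident.replace('"', '""') + '"'


def build_probe_queries(database: str, table: str, limit: int = 10) -> dict:
    """Return a metadata-only query and a body-read query for one Iceberg table."""
    db = quote_ident(database)
    return {
        # Served from Iceberg manifests; no Parquet data pages are decoded.
        "metadata_only": f"SELECT file_path, record_count FROM {db}.{quote_ident(table + '$files')}",
        # Forces the engine to open and decode the Parquet data files.
        "body_read": f"SELECT * FROM {db}.{quote_ident(table)} LIMIT {int(limit)}",
    }
```

Submitting both and recording each QueryExecutionId gives you a matched pair to attach to a support case.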
My recommendation: Open an AWS Support case under Amazon Security Lake (not just Athena). Explicitly state that the issue is with service-managed compaction and provide the QueryExecutionId where the session-level vectorization toggle was tested. This forces the internal teams to look at the compatibility between the compactor and the engine.
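For the support case, the useful fields can be pulled from `GetQueryExecution` instead of being copied out of the console. A minimal sketch, assuming boto3 and credentials for the account that ran the query. `summarize_execution` and `fetch_summary` are hypothetical helper names, and boto3 is imported lazily so the parser can be used without it installed.

```python
def summarize_execution(response: dict) -> dict:
    """Pull support-case fields out of an Athena GetQueryExecution response."""
    qe = response.get("QueryExecution", {})
    status = qe.get("Status", {})
    error = status.get("AthenaError", {})
    engine = qe.get("EngineVersion", {})
    return {
        "QueryExecutionId": qe.get("QueryExecutionId"),
        "State": status.get("State"),
        "StateChangeReason": status.get("StateChangeReason"),
        "ErrorType": error.get("ErrorType"),
        "ErrorMessage": error.get("ErrorMessage"),
        "EffectiveEngineVersion": engine.get("EffectiveEngineVersion"),
    }


def fetch_summary(query_execution_id: str, region: str = "us-east-1") -> dict:
    """Fetch one execution from Athena and summarize it."""
    import boto3  # lazy import: only needed when actually calling AWS

    client = boto3.client("athena", region_name=region)
    resp = client.get_query_execution(QueryExecutionId=query_execution_id)
    return summarize_execution(resp)
```

Running `fetch_summary` over the failing and succeeding QueryExecutionIds gives a compact table to paste into the case.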
Thanks, really useful triage. I tested the session-level toggles directly:
SET SESSION parquet_vectorized_reader_enabled = false;
SET SESSION parquet_optimized_reader_enabled = false;
SET SESSION parquet_use_column_index = false;
All three return InvalidRequestException: Queries of this type are not supported, both via the StartQueryExecution API and in the Athena console. Athena engine v3 does not expose session-level Parquet reader tunables to customers, which is consistent with the very limited SET SESSION surface documented in the engine v3 reference. The lockdown is therefore total, and that actually strengthens the support-case angle: there is no customer-side path to mitigate, so resolution has to come from AWS. Agree on routing: when I escalate to a paid case I'll file under Amazon Security Lake as primary, Athena as secondary, and Lake Formation as tertiary, so the right teams have to coordinate at the boundary between the managed Avro→Parquet compactor and the engine v3 Iceberg-aware reader.
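For anyone who wants to repeat this through the API rather than the console, here is a small sketch. It assumes boto3's Athena client and a workgroup that already has a query result location configured (the `primary` default below is an assumption). The client is passed in so the function can be exercised with a stub. `probe_statement`, `probe_all` and `SESSION_TOGGLES` are hypothetical names.

```python
SESSION_TOGGLES = [
    "SET SESSION parquet_vectorized_reader_enabled = false",
    "SET SESSION parquet_optimized_reader_enabled = false",
    "SET SESSION parquet_use_column_index = false",
]


def probe_statement(client, sql: str, workgroup: str = "primary") -> tuple:
    """Submit one statement and report whether Athena accepted or rejected it."""
    try:
        resp = client.start_query_execution(QueryString=sql, WorkGroup=workgroup)
    except client.exceptions.InvalidRequestException as exc:
        # Rejected at submission time, before any data is read.
        return ("rejected", str(exc))
    # "accepted" only means the statement was queued; it can still fail later.
    return ("accepted", resp["QueryExecutionId"])


def probe_all(client, workgroup: str = "primary") -> dict:
    """Probe every session toggle and map each statement to its outcome."""
    return {sql: probe_statement(client, sql, workgroup) for sql in SESSION_TOGGLES}
```

With real credentials this would be called as `probe_all(boto3.client("athena", region_name="us-east-1"))`.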
I will see what they say. I was hoping to garner a bit of attention on here so I wouldn't have to use the paid services.
Based on the available documentation, there is a known issue with Athena SQL reading Iceberg tables created by certain processes. Specifically, Athena SQL cannot read Iceberg table type V3 created by Amazon EMR Spark, resulting in a "GENERIC_INTERNAL_ERROR: Cannot read unsupported version 3" error. This suggests there are compatibility challenges between different Iceberg implementations and Athena's engine.
Additionally, there's a documented issue where Iceberg tables created outside Athena using Apache Iceberg SDK versions earlier than 0.13.0 can cause "Access Denied: Cannot select from columns" errors in Athena engine version 3. This occurs because earlier SDK versions don't properly populate column information in AWS Glue, preventing Lake Formation from determining authorized columns.
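To rule this one in or out yourself, the relevant fields are in the Glue `GetTable` response. A sketch, assuming boto3 and that the table follows the usual Iceberg-on-Glue convention of a `table_type = ICEBERG` parameter plus a `metadata_location` parameter. The helper names are hypothetical.

```python
def check_glue_iceberg_table(get_table_response: dict) -> list:
    """List problems found in a Glue GetTable response for an Iceberg table."""
    table = get_table_response.get("Table", {})
    params = table.get("Parameters", {})
    problems = []
    if str(params.get("table_type", "")).upper() != "ICEBERG":
        problems.append("Parameters.table_type is not ICEBERG")
    if not params.get("metadata_location"):
        problems.append("Parameters.metadata_location is missing")
    # Per the documented issue, pre-0.13.0 Iceberg SDKs may not populate this,
    # which leaves Lake Formation unable to resolve authorized columns.
    if not table.get("StorageDescriptor", {}).get("Columns"):
        problems.append("StorageDescriptor.Columns is empty")
    return problems


def fetch_and_check(database: str, table: str) -> list:
    """Fetch the table definition from Glue and run the checks."""
    import boto3  # lazy import: only needed when actually calling AWS

    glue = boto3.client("glue")
    return check_glue_iceberg_table(glue.get_table(DatabaseName=database, Name=table))
```

An empty list means the Glue side looks populated, which points away from this particular issue.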
However, the specific regression you're describing—ICEBERG_CURSOR_ERROR when reading Security Lake's Avro-to-Parquet compactor output with engine v3—is not explicitly documented in the available sources. The pattern you're observing (manifest-only queries succeeding while body-read queries fail, files validating correctly with pyarrow, and the issue spreading across multiple sources and partitions) suggests a compatibility issue between Athena engine v3's vectorized Parquet reader and the specific Parquet format produced by Security Lake's compactor.
For Iceberg tables created outside Athena with older SDK versions, the documented solution is to perform an update using ALTER TABLE SET TBLPROPERTIES or use the latest Iceberg SDK to fix the table and update column information in AWS Glue. However, this approach wouldn't be available for Lake Formation-managed Security Lake tables where you lack DDL permissions.
Given the specificity of your issue and the lack of documented workarounds for Security Lake-managed tables, I recommend opening an AWS Support case with your QueryExecutionId traces and sample file details to determine if this is a known regression that requires a service-side fix.
Sources
- Amazon EMR release 7.12.0 - Amazon EMR
- Athena engine version 3 - Amazon Athena
Thanks for digging through this. Two of the linked issues don't apply to our case; flagging that here for anyone else who lands on the same pattern:
- Iceberg format-version 3 / EMR-Spark angle. Our tables are format-version: 2 per the downloaded metadata.json (the standard Athena-supported variant). The "Cannot read unsupported version 3" error is from EMR-Spark-written V3 tables. We don't use Amazon EMR or Apache Spark anywhere in our stack — the only writer to these tables is Security Lake's own internal Avro→Parquet compactor.
- Old Iceberg SDK / Lake Formation column-access regression. Different error class. The documented Access Denied: Cannot select from columns issue (Iceberg SDK <0.13.0 not populating Glue column metadata) doesn't apply: our error is ICEBERG_CURSOR_ERROR: Failed to read Parquet file, the SDK is current (Security Lake writes these tables, not us), and Glue column metadata is intact and queryable.
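For anyone repeating the format-version check, here is a minimal stdlib sketch. It assumes you've downloaded the table's current `metadata.json` to a local file, and the helper names are hypothetical. The field names (`format-version`, `location`, `current-snapshot-id`, `properties`) come from the Iceberg table spec.

```python
import json
from pathlib import Path


def summarize_iceberg_metadata(text: str) -> dict:
    """Extract the fields relevant to engine compatibility from metadata.json."""
    meta = json.loads(text)
    fmt = meta.get("format-version")
    return {
        "format_version": fmt,
        # The documented "Cannot read unsupported version 3" error is for v3 tables.
        "is_v3_or_later": isinstance(fmt, int) and fmt >= 3,
        "location": meta.get("location"),
        "current_snapshot_id": meta.get("current-snapshot-id"),
        # Iceberg writes Parquet data files when this property is unset.
        "write_format": meta.get("properties", {}).get("write.format.default", "parquet"),
    }


def summarize_file(path: str) -> dict:
    """Convenience wrapper for a metadata.json on local disk."""
    return summarize_iceberg_metadata(Path(path).read_text(encoding="utf-8"))
```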
Agree the path forward is a support case for service-side investigation. The pattern (manifest-only queries succeed, body reads fail, files validate cleanly via pyarrow, and the broken set spreads across all 7 OCSF v2.0 sources and multiple producer accounts daily) doesn't match any of the documented Iceberg/Athena issues I've found. It reads like a recent regression in engine v3's vectorized Parquet reader, specifically against the current Security Lake compactor output format.