
Unable to view Iceberg table data in Athena despite successful Glue crawl

0

I'm encountering an issue while querying an Iceberg table in Athena. The table schema is successfully detected in the Glue Catalog, and the Glue crawler runs without errors. However, when attempting to scan or query the table, Athena throws a GENERIC_INTERNAL_ERROR: No scheme for file system location.

I have verified the following:
  • The Iceberg tables are correctly crawled by Glue.
  • The metadata is available in the S3 location.
  • The necessary IAM roles for full access to S3 are in place.

[Screenshot: tables in the Glue Catalog database]

I've refreshed the Glue Catalog and attempted multiple queries, but the issue persists.

I've also confirmed that the Iceberg format is correctly recognized in Glue, but Athena still fails to read the data. Steps taken:
  • Refreshed the Glue Catalog multiple times.
  • Verified the S3 path and metadata files for correctness.
  • Checked IAM roles and permissions.
  • Attempted simpler queries such as SELECT * FROM "testing"."customer_sheet_73" LIMIT 10; but the error persists.

[Screenshot: tables created by the crawler in the Glue Catalog]

The error message I receive is: GENERIC_INTERNAL_ERROR: No scheme for file system location: iceberg_customer_tables/customer_sheet_73/metadata/snap-6592602586495144306-1-4f7984eb-f448-42cb-bb55-e61656b4a175.avro

asked a year ago, 782 views
2 Answers
0
Accepted Answer

This error typically occurs when a location that Athena resolves for the Iceberg table, or for one of its metadata files, has no URI scheme such as s3://. Let's work through it step by step:

  1. First, ensure the S3 location is properly formatted in the table properties:
ALTER TABLE database_name.table_name 
SET LOCATION 's3://bucket-name/path/to/iceberg/table'
  2. Check if the table properties include the necessary Iceberg configurations:
ALTER TABLE database_name.table_name 
SET TBLPROPERTIES (
    'table_type'='ICEBERG',
    'format'='parquet'
)
  3. Verify the storage descriptor in the Glue table definition:
import boto3

glue_client = boto3.client('glue')
response = glue_client.get_table(
    DatabaseName='your_database',
    Name='your_table'
)
print(response['Table']['StorageDescriptor'])
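  4. Add an explicit S3 scheme to the table location. You can do this with the AWS CLI. Note that update-table replaces the whole table definition, so start from the output of aws glue get-table and change only the location rather than sending just the fields shown here: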
aws glue update-table --database-name your_database --table-input '{
    "Name": "your_table",
    "StorageDescriptor": {
        "Location": "s3://your-bucket/path/to/table"
    }
}'
  5. If the issue persists, try recreating the table using a CREATE TABLE statement:
CREATE TABLE database_name.new_table_name (
    -- your column definitions
)
LOCATION 's3://bucket-name/path/to/iceberg/table'
TBLPROPERTIES (
    'table_type'='ICEBERG',
    'format'='parquet'
);
  6. Check Athena workgroup settings:
  • Ensure the workgroup has access to the S3 location
  • Verify the query results location is properly set
  • Check if encryption settings are properly configured
  7. Add explicit permissions to the IAM role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-bucket/*",
                "arn:aws:s3:::your-bucket"
            ]
        }
    ]
}
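  8. If you're using a custom catalog, ensure it's properly configured. Athena itself has no CREATE CATALOG statement and reads Iceberg tables through the AWS Glue Data Catalog, so treat the following as pseudocode rather than something to run in the Athena query editor: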
CREATE CATALOG iceberg_catalog
WITH (
    'catalog.name' = 'hive',
    'hive.metastore' = 'glue'
);
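
The location checks described in the steps above can be sketched in Python. This is a minimal sketch, not a definitive diagnostic: it assumes the dictionary shape returned by boto3's Glue get_table call, it assumes the table carries a metadata_location entry in its Parameters (as Iceberg tables registered in Glue typically do), and the helper names has_scheme and find_schemeless_locations are hypothetical.

```python
from urllib.parse import urlparse


def has_scheme(location: str) -> bool:
    """Return True if the location starts with a URI scheme such as s3://."""
    return "://" in location and bool(urlparse(location).scheme)


def find_schemeless_locations(table: dict) -> dict:
    """Collect the table locations that lack a URI scheme.

    `table` is assumed to look like response['Table'] from Glue get_table.
    """
    candidates = {
        "StorageDescriptor.Location": table.get("StorageDescriptor", {}).get("Location"),
        "Parameters.metadata_location": table.get("Parameters", {}).get("metadata_location"),
    }
    return {name: loc for name, loc in candidates.items() if loc and not has_scheme(loc)}


# Hypothetical table definition modeled on the path in the question's error.
example_table = {
    "StorageDescriptor": {"Location": "iceberg_customer_tables/customer_sheet_73"},
    "Parameters": {
        "table_type": "ICEBERG",
        "metadata_location": "s3://your-bucket/iceberg_customer_tables/"
                             "customer_sheet_73/metadata/00001.metadata.json",
    },
}
print(find_schemeless_locations(example_table))
```

With a real table, pass response['Table'] from the get_table call in the storage descriptor step instead of example_table. Any entry the function returns is a location that would need an s3:// prefix.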

Additional Troubleshooting:

  1. Verify file permissions:
aws s3 ls s3://your-bucket/path/to/iceberg/table/ --recursive
  2. Check for any Athena service-linked role issues:
aws iam get-role --role-name AWSServiceRoleForAthena
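
The aws s3 ls check above has a rough boto3 equivalent. The sketch below takes the S3 client as a parameter so it can be tried without AWS access. The function names list_metadata_files and summarize and the bucket and prefix values are placeholders, and the only layout assumption is that Iceberg keeps its files under a metadata/ prefix, as in the path from the error message.

```python
def list_metadata_files(s3_client, bucket: str, table_prefix: str) -> list:
    """List object keys under <table_prefix>/metadata/ via list_objects_v2 pages."""
    prefix = table_prefix.rstrip("/") + "/metadata/"
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects carries no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def summarize(keys: list) -> dict:
    """Count the metadata JSON files and Avro files among the keys."""
    return {
        "metadata_json": sum(k.endswith(".metadata.json") for k in keys),
        "avro": sum(k.endswith(".avro") for k in keys),
    }


# Usage with real credentials (placeholder names):
#   import boto3
#   keys = list_metadata_files(boto3.client("s3"), "your-bucket", "path/to/iceberg/table")
#   print(summarize(keys))
```

An empty result, or an AccessDenied error from the call, points at a missing prefix or missing s3:ListBucket permission rather than at the table definition.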

If none of these solutions work, you might want to:

  1. Check if your Iceberg table version is compatible with your Athena engine version
  2. Verify that all metadata files are present and accessible
  3. Consider recreating the table from scratch
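
The version and metadata checks above can be partly automated by reading the table's current metadata JSON file. The sketch below is an illustration under assumptions: it relies on the format-version, location and snapshots[].manifest-list fields from the Iceberg table spec, the function name check_metadata is hypothetical, and the sample document is invented to mirror the path in the error message. A manifest-list value without s3:// would be consistent with the error in the question, though that is a guess about this particular table rather than a confirmed diagnosis.

```python
import json


def check_metadata(metadata_json: str) -> dict:
    """Report the format version and any paths that lack a URI scheme."""
    meta = json.loads(metadata_json)
    paths = {"location": meta.get("location", "")}
    for snap in meta.get("snapshots", []):
        paths[f"snapshot {snap.get('snapshot-id')}"] = snap.get("manifest-list", "")
    return {
        "format_version": meta.get("format-version"),
        "schemeless": {name: p for name, p in paths.items() if p and "://" not in p},
    }


# Invented sample; a real file would be downloaded from the table's metadata/ prefix.
sample = json.dumps({
    "format-version": 2,
    "location": "s3://your-bucket/iceberg_customer_tables/customer_sheet_73",
    "snapshots": [
        {
            "snapshot-id": 6592602586495144306,
            "manifest-list": "iceberg_customer_tables/customer_sheet_73/metadata/"
                             "snap-6592602586495144306-1-4f7984eb-f448-42cb-bb55-e61656b4a175.avro",
        }
    ],
})
print(check_metadata(sample))
```

If schemeless is non-empty for a real table, the relative paths were written into the metadata by whatever job created the table, so editing the Glue table location alone would probably not be enough.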
AWS EXPERT
answered a year ago
-1

The issue you're experiencing with Athena being unable to read your Iceberg table data despite a successful Glue crawl could be related to how the Iceberg catalog is set up in Athena. Here are some steps to troubleshoot and potentially resolve the issue:

  1. Ensure you have created an Iceberg catalog in Athena. This is a crucial step that connects Athena to your Iceberg tables. You can create an Iceberg catalog using a SQL statement similar to this:

    CREATE CATALOG iceberg_catalog WITH ( catalog_type = 'iceberg', warehouse_dir = 's3://your-bucket-name/', catalog_impl = 'org.apache.iceberg.aws.glue.GlueCatalog' );

    Make sure to replace 'your-bucket-name' with the actual S3 bucket where your Iceberg data is stored.

  2. After creating the catalog, switch to the specific database where your Iceberg tables are located.

  3. Verify that you're using Athena engine version 3 or later, as earlier versions may not fully support Iceberg tables.

  4. Check if the table properties in Glue include the necessary Iceberg-specific properties. You can use the DESCRIBE FORMATTED command in Athena to view the table properties.

  5. Ensure that the S3 location in your table definition matches the actual location of your Iceberg data files.

  6. If you're still encountering issues, you might need to recreate the table in Athena using a CREATE TABLE statement that explicitly defines it as an Iceberg table.

  7. Double-check that your IAM roles have the necessary permissions not just for S3, but also for Glue and Athena operations related to Iceberg tables.

If these steps don't resolve the issue, you may need to investigate further by looking at the Iceberg metadata files in S3 to ensure they're correctly formatted and accessible. Additionally, consider checking if there are any version compatibility issues between your Iceberg data, Glue, and Athena.

Remember that Iceberg support in Athena is relatively new, so ensure you're using the latest available versions and check for any known issues or limitations in the AWS documentation.
Sources
Querying Apache Iceberg Tables in Amazon Athena | AWS re:Post
Troubleshoot Athena Apache Iceberg table errors | AWS re:Post

answered a year ago
  • Setting Up Your Iceberg Environment

    1. Creating an Iceberg Catalog: First, you'll need to create an Iceberg catalog in Athena. This catalog serves as the connection point between Athena and your Iceberg tables.

    CREATE CATALOG iceberg_catalog WITH ( catalog_type = 'iceberg', warehouse_dir = 's3://<------------>/', catalog_impl = 'org.apache.iceberg.aws.glue.GlueCatalog' );

    error: line 1:8: mismatched input 'CATALOG'. Expecting: 'MATERIALIZED', 'MULTI', 'OR', 'PROTECTED', 'ROLE', 'SCHEMA', 'TABLE', 'VIEW'

  • Geez, even the AWS AI fails: there is no CREATE CATALOG command in Athena as far as I can tell.
