Issue with AWS Glue Iceberg REST catalog when evolving schema with StructType (field ID parsing error)

1

Hi, I’m running into an issue when using PyIceberg with AWS Glue Iceberg REST catalog during schema evolution.

Environment:

  • PyIceberg: latest (with pyarrow)
  • Catalog: AWS Glue REST (setup shared below)
  • Python: 3.12.x
  • OS: macOS

My AWS Glue REST API catalog setup:

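from pyiceberg.catalog import load_catalog

# catalog_name, account_id, data_lake_s3_bucket and aws_region are defined elsewhere.
catalog = load_catalog(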
    catalog_name,
    **{
        "type": "rest",
        "warehouse": f"{account_id}:s3tablescatalog/{data_lake_s3_bucket}",
        "uri": f"https://glue.{aws_region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": aws_region,
    },
)

Problem

When I try to add a new column of type StructType to an existing Iceberg table using update_schema(), Glue returns an error:

pyiceberg.exceptions.BadRequestError: InvalidInputException: Cannot parse to an integer value: id: 5.0

Full traceback:

Traceback (most recent call last):
  File "/Users/mukul/Documents/extra/iceberg/iceberg.py", line 318, in <module>
    with table.update_schema() as updater:
         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/table/update/__init__.py", line 76, in __exit__
    self.commit()
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/table/update/__init__.py", line 72, in commit
    self._transaction._apply(*self._commit())
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 295, in _apply
    self.commit_transaction()
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 936, in commit_transaction
    self._table._do_commit(  # pylint: disable=W0212
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1458, in _do_commit
    response = self.catalog.commit_table(self, requirements, updates)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/tenacity/__init__.py", line 338, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/tenacity/__init__.py", line 477, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/tenacity/__init__.py", line 378, in iter
    result = action(retry_state=retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.10_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.10_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/tenacity/__init__.py", line 480, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/__init__.py", line 722, in commit_table
    _handle_non_200_response(
  File "/Users/mukul/Documents/extra/iceberg/venv/lib/python3.12/site-packages/pyiceberg/catalog/rest/response.py", line 111, in _handle_non_200_response
    raise exception(response) from exc
pyiceberg.exceptions.BadRequestError: InvalidInputException: Cannot parse to an integer value: id: 5.0

Steps to reproduce

  1. Create a table:
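from pyiceberg.schema import Schema
from pyiceberg.types import IntegerType, NestedField, StringType

schema = Schema(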
    NestedField(1, "id", StringType(), required=True),
    NestedField(2, "name", StringType(), required=False),
    NestedField(3, "roll_number", IntegerType(), required=True),
)

table = catalog.create_table(
    identifier="mydb.test_table",
    schema=schema,
)
  2. Attempt to evolve schema:
from pyiceberg.types import IntegerType, NestedField, StringType, StructType

table = catalog.load_table("mydb.test_table")
with table.update_schema() as updater:
    updater.add_column(
        path="address",
        field_type=StructType(
            NestedField(4, "street", StringType(), required=False),
            NestedField(5, "city", StringType(), required=False),
            NestedField(6, "state", StringType(), required=False),
            NestedField(7, "zip", IntegerType(), required=False),
        ),
        required=False,
    )
  3. Observe error above.

Additional observations/information

  • Creating the table with the StructType initially works fine.
  • The error only occurs during schema evolution via update_schema().
  • Adding a simple integer column works, but adding or modifying struct fields fails.
  • It looks like Glue is rejecting a float value for a field ID (5.0) instead of integer.
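One way to narrow down where the float comes from is to scan a schema or request payload for ID keys that hold floats. The following is a minimal sketch, assuming the payload is already available as parsed JSON; `find_float_ids` is a hypothetical helper (not part of PyIceberg), and the sample payload is illustrative only, not captured from a real request.

```python
import json
from typing import Any, Iterator

# Keys the Iceberg JSON schema serialization uses for field IDs
# (struct fields, list elements, map keys and map values).
ID_KEYS = {"id", "element-id", "key-id", "value-id"}


def find_float_ids(node: Any, path: str = "$") -> Iterator[tuple[str, float]]:
    """Yield (json_path, value) for every ID key whose value is a float."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in ID_KEYS and isinstance(value, float):
                yield child, value
            # Recurse so nested structs, lists and maps are covered too.
            yield from find_float_ids(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_float_ids(item, f"{path}[{index}]")


# Illustrative payload only: the second field carries a float ID.
payload = json.loads("""
{"type": "struct", "fields": [
  {"id": 4, "name": "street", "type": "string", "required": false},
  {"id": 5.0, "name": "city", "type": "string", "required": false}
]}
""")

bad = list(find_float_ids(payload))
print(bad)
```

If the same check run against the client-side schema (for example the parsed output of `table.schema().model_dump_json()`, assuming a Pydantic v2 based PyIceberg release) reports nothing, the conversion to a float is presumably happening on the service side.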

Question

  • Is this a known limitation or bug in AWS Glue’s Iceberg REST support?
    • I remember this functionality working up till last Thursday (18th September 2025)
  • Is there a workaround to allow adding StructType fields via schema evolution?
  • Should this be raised as a bug with the Glue team?
asked 2 months ago, 123 views
3 Answers
0

Based on the error message and your description, it appears you're encountering an issue with AWS Glue's Iceberg REST catalog when evolving a schema with a StructType field. The specific error "Cannot parse to an integer value: id: 5.0" suggests there's a problem with how field IDs are being handled during schema evolution.

Looking at the AWS Glue Iceberg implementation, field IDs are critical components in Iceberg table schemas. The IcebergStructField requires an integer ID that uniquely identifies each field within the schema, which is used for schema evolution and field tracking. Similarly, the IcebergSchema contains fields that are defined as arrays of IcebergStructField objects.

The error indicates that somewhere in the process, a field ID that should be an integer (in your case, likely field ID 5) is being interpreted or transmitted as a floating-point number (5.0), causing the parsing error.

This does appear to be a bug in how AWS Glue's Iceberg REST catalog is handling schema evolution with struct types. Since you mentioned this was working previously (until September 18th), it's possible that a recent change or update to the service introduced this regression.

As for workarounds:

  1. You could try creating a new table with the complete schema instead of evolving an existing one
  2. If possible, add the struct fields one at a time rather than all at once
  3. Try using a different client library temporarily if available

I would recommend raising this as a bug with the AWS Glue team, as it appears to be an issue with how the Glue Iceberg REST catalog is handling field IDs during schema evolution specifically for struct types. Include the error message, your code, and the fact that this functionality was working previously to help them diagnose the issue.
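A bug report is easier to act on when it names exact versions rather than "latest". The following is a minimal sketch for collecting them; the package list is an assumption about what is relevant here, and `collect_versions` is a hypothetical helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("pyiceberg", "pyarrow", "pydantic")) -> dict[str, str]:
    """Return the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Keep the entry so the report shows what was checked.
            report[name] = "not installed"
    return report


report = collect_versions()
for name, value in report.items():
    print(f"{name}: {value}")
```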
Sources
IcebergStructField - AWS Glue
IcebergSchema - AWS Glue
UpdateIcebergTableInput - AWS Glue

answered 2 months ago
AWS
SUPPORT ENGINEER
reviewed 2 months ago
    • "You could try creating a new table with the complete schema instead of evolving an existing one": Not possible. Our requirements involve constant schema evolution, and it is not feasible to recreate tables and migrate data every time.

    • "If possible, add the struct fields one at a time rather than all at once": Does not work.

    • "Try using a different client library temporarily if available": While using a different catalog is a completely valid option, we have already evaluated the alternatives for our use case and AWS Glue is the best fit, so we would like to stick with it.

0

Hello,

As a workaround, did you try running an ALTER TABLE command in a Spark environment to check whether it works?

https://iceberg.apache.org/docs/latest/spark-ddl/

Also, since you mentioned it was working fine until last week, it would be worth checking with the AWS Support team. Include details such as the working behavior before Thursday, the error message, the PyIceberg version in use, and the REST catalog configuration. They have the resources to investigate further and provide a root cause analysis (RCA) and a solution.
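For the Spark route suggested above, the statement could be built roughly as follows. This is a sketch only: the catalog name `glue_catalog` is an assumption, `build_add_struct_column_sql` is a hypothetical helper, and whether this path avoids the Glue error has not been verified in this thread. The `struct<name: type, ...>` syntax follows the Iceberg Spark DDL documentation linked above.

```python
def build_add_struct_column_sql(table: str, column: str, fields: dict[str, str]) -> str:
    """Build an Iceberg Spark DDL statement that adds a struct column."""
    struct_body = ", ".join(f"{name}: {sql_type}" for name, sql_type in fields.items())
    return f"ALTER TABLE {table} ADD COLUMN {column} struct<{struct_body}>"


sql = build_add_struct_column_sql(
    "glue_catalog.mydb.test_table",  # catalog name is an assumption
    "address",
    {"street": "string", "city": "string", "state": "string", "zip": "int"},
)
print(sql)

# With a SparkSession already configured for the Iceberg catalog:
# spark.sql(sql)
```

Building the statement separately from executing it makes it easy to log the exact DDL and include it in a support case.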

AWS
SUPPORT ENGINEER
answered 2 months ago
0

We are experiencing the same issue while evolving the schema of a table that contains an existing map column.

Failed to add column: Malformed request: Cannot parse to an integer value: element-id: 3.0
org.apache.iceberg.exceptions.BadRequestException: Malformed request: Cannot parse to an integer value: element-id: 3.0
answered 21 days ago
