
How to create iceberg schema with partition?

    time timestamp,
    foo string,
)
PARTITIONED BY (foo string)
;

![Enter image description here](/media/postImages/original/IMWY4ibEdjTX6NgV8v-_R2kQ)
asked 3 months ago, 133 views
2 Answers

There are several ways to create an Iceberg table with partitioning, depending on which tool you're using:

Using Amazon Athena SQL:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')

Note that when using the PARTITIONED BY clause in Athena, the columns used for partitioning must be specified in the column declarations first, and the column type should not be included in the PARTITIONED BY clause.
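
The rule above can be sketched as a small Python helper that renders the Athena DDL from a column mapping and a list of partition columns. The function name `build_iceberg_ddl` and its parameters are hypothetical, chosen only for illustration; the helper just assembles a string and does not call Athena.

```python
def build_iceberg_ddl(table, columns, partition_by, location):
    """Render an Athena CREATE TABLE statement for an Iceberg table.

    columns: dict mapping column name -> Athena type, in declaration order.
    partition_by: column names only; the types belong in the column list.
    """
    # Every partition column must already be declared as a regular column.
    missing = [name for name in partition_by if name not in columns]
    if missing:
        raise ValueError(f"partition columns not declared: {missing}")

    column_lines = ",\n".join(
        f"    {name} {dtype}" for name, dtype in columns.items()
    )
    partition_clause = ", ".join(partition_by)  # names only, no types
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"PARTITIONED BY ({partition_clause})\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


# Hypothetical usage with the table from the question.
ddl = build_iceberg_ddl(
    "iceberg_table",
    {"time": "timestamp", "foo": "string"},
    ["foo"],
    "s3://your-bucket/your-path/",
)
print(ddl)
```

Passing a partition column that is not in `columns` raises a `ValueError`, which mirrors the requirement that partition columns be declared first.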

You can also use hidden partitioning with transforms in Athena:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (day(time), foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
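
To make the `day(time)` transform concrete: per the Iceberg specification, the day transform maps a timestamp to the number of days since 1970-01-01, and rows that share that value land in the same partition. The following is a minimal pure-Python sketch of that mapping, not Iceberg's own code; `day_transform` is a hypothetical name, and the sketch assumes naive timestamps interpreted as UTC.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Return days since the Unix epoch for a naive (UTC-assumed) timestamp."""
    return (ts.date() - EPOCH).days


# Three sample rows: two on the same calendar day, one on the next day.
rows = [
    datetime(2024, 1, 15, 8, 30),
    datetime(2024, 1, 15, 23, 59),
    datetime(2024, 1, 16, 0, 0),
]

# Group rows by their derived partition value.
partitions = {}
for ts in rows:
    partitions.setdefault(day_transform(ts), []).append(ts)

for day_value, members in sorted(partitions.items()):
    print(day_value, len(members))
```

The partitioning is "hidden" because the table stores only the `time` column; the partition value is derived from it, so queries filter on `time` directly rather than on a separate date column.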

Using PyIceberg:

from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType, TimestampType

# Load the AWS Glue catalog (the catalog name "glue" is a placeholder)
glue_catalog = load_catalog("glue", **{"type": "glue"})

# Define schema
schema = Schema(
    NestedField(field_id=1, name="time", field_type=TimestampType(), required=True),
    NestedField(field_id=2, name="foo", field_type=StringType(), required=True),
)

# Define partition spec
partition_spec = PartitionSpec(
    PartitionField(
        source_id=2,  # Refers to "foo" (field_id=2 in the schema)
        field_id=1000,
        transform=IdentityTransform(),
        name="foo",
    )
)

# Create table
glue_catalog.create_table(
    identifier="database_name.table_name",
    schema=schema,
    location="s3://your-bucket/your-path/",
    partition_spec=partition_spec,
)

Using Apache Spark (PySpark):

from pyspark.sql.functions import col

(
    dataFrame.writeTo("catalog_name.database_name.table_name")
    .tableProperty("format-version", "2")
    .partitionedBy(col("foo"))
    .createOrReplace()
)

The error in your example is that you're including the column type "string" in the PARTITIONED BY clause, which is not allowed. The correct syntax should exclude the data type in the partition specification.
Sources
Create Iceberg tables - Amazon Athena
Getting started with Iceberg tables in Amazon Athena SQL - AWS Prescriptive Guidance
Working with Iceberg tables by using PyIceberg - AWS Prescriptive Guidance
Working with Iceberg tables by using Apache Spark - AWS Prescriptive Guidance

answered 3 months ago

The PyIceberg implementation is quite unfortunate. Having to define the schema first in order to sort and/or partition a table is a suboptimal solution. I know it's good practice to define the schema first, but this is not feasible for many source systems when dealing with hundreds or thousands of tables.

answered 2 months ago
