How to create an Iceberg schema with a partition?

    time timestamp,
    foo string,
)
PARTITIONED BY (foo string)
;

![Enter image description here](/media/postImages/original/IMWY4ibEdjTX6NgV8v-_R2kQ)
asked 3 months ago · 143 views
2 Answers

There are several ways to create an Iceberg table with partitioning, depending on which tool you're using:

Using Amazon Athena SQL:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')

Note that when using the PARTITIONED BY clause in Athena, the columns used for partitioning must be specified in the column declarations first, and the column type should not be included in the PARTITIONED BY clause.
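If you generate this DDL for many tables, the rule above can be enforced in code. The following is only a minimal sketch: the helper name build_athena_iceberg_ddl and its parameters are hypothetical, not part of any AWS SDK. It declares types once in the column list and emits bare column names in PARTITIONED BY.

```python
def build_athena_iceberg_ddl(table, columns, partition_by, location):
    """Render an Athena CREATE TABLE statement for an Iceberg table.

    columns: dict mapping column name -> Athena type (insertion order is kept).
    partition_by: column names only; their types come from `columns`.
    """
    # Every partition column must already be declared in the column list.
    missing = [name for name in partition_by if name not in columns]
    if missing:
        raise ValueError(f"partition columns not declared: {missing}")

    # No trailing comma after the last column.
    column_lines = ",\n".join(
        f"    {name} {dtype}" for name, dtype in columns.items()
    )
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"PARTITIONED BY ({', '.join(partition_by)})\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


ddl = build_athena_iceberg_ddl(
    "iceberg_table",
    {"time": "timestamp", "foo": "string"},
    ["foo"],
    "s3://your-bucket/your-path/",
)
print(ddl)
```

The helper only builds a string; you would still submit it through the Athena console or whichever client you already use.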

You can also use hidden partitioning with transforms in Athena:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (day(time), foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
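To see what the day(time) transform groups together, here is a small pure-Python illustration. It is a sketch of the derived value as the Iceberg spec describes it (whole days since 1970-01-01), not Athena's or Iceberg's actual implementation. The function name day_partition_value and the sample rows are made up for the example, and it assumes naive timestamps.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def day_partition_value(ts):
    """Whole days between the Unix epoch and a naive timestamp (floored)."""
    return (ts - EPOCH).days


# Hypothetical sample rows: (time, foo)
rows = [
    (datetime(2024, 1, 1, 8, 30), "a"),
    (datetime(2024, 1, 1, 23, 59), "a"),
    (datetime(2024, 1, 2, 0, 0), "a"),
]

# Group rows by the derived (day, foo) pair, the way the table would
# lay them out in partitions.
partitions = {}
for ts, foo in rows:
    partitions.setdefault((day_partition_value(ts), foo), []).append(ts)

for key, members in partitions.items():
    print(key, len(members))
```

The first two rows land in the same partition and the third in the next day's partition, even though the table has no separate date column. That is the point of hidden partitioning: you filter on time and the engine derives the partition value.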

Using PyIceberg:

from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType, TimestampType

# Assumption: an AWS Glue catalog; the catalog name "glue" is illustrative
glue_catalog = load_catalog("glue", **{"type": "glue"})

# Define schema
schema = Schema(
    NestedField(field_id=1, name="time", field_type=TimestampType(), required=True),
    NestedField(field_id=2, name="foo", field_type=StringType(), required=True),
)

# Define partition spec
partition_spec = PartitionSpec(
    PartitionField(
        source_id=2,  # field_id of "foo" in the schema above
        field_id=1000,  # ID of the partition field itself
        transform=IdentityTransform(),
        name="foo",
    )
)

# Create table
glue_catalog.create_table(
    identifier="database_name.table_name",
    schema=schema,
    location="s3://your-bucket/your-path/",
    partition_spec=partition_spec,
)

Using Apache Spark:

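# PySpark (DataFrameWriterV2); assumes dataFrame is an existing DataFrame
from pyspark.sql.functions import col

(
    dataFrame.writeTo("catalog_name.database_name.table_name")
    .tableProperty("format-version", "2")
    .partitionedBy(col("foo"))
    .createOrReplace()
)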

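The error in your example is the column type in the PARTITIONED BY clause: PARTITIONED BY (foo string) is not allowed, because foo is already declared, with its type, in the column list. Write PARTITIONED BY (foo) instead. The trailing comma after the last column (foo string,) would likely also be rejected as a syntax error, so drop it too.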
Sources
Create Iceberg tables - Amazon Athena
Getting started with Iceberg tables in Amazon Athena SQL - AWS Prescriptive Guidance
Working with Iceberg tables by using PyIceberg - AWS Prescriptive Guidance
Working with Iceberg tables by using Apache Spark - AWS Prescriptive Guidance

answered 3 months ago

The PyIceberg implementation is quite unfortunate. Having to define the schema first in order to sort and/or partition a table is a suboptimal solution. I know it's good practice to define the schema first, but this is not feasible for many source systems when dealing with hundreds or thousands of tables.

answered 3 months ago
