
How to create iceberg schema with partition?

    time timestamp,
    foo string,
)
PARTITIONED BY (foo string)
;

![Enter image description here](/media/postImages/original/IMWY4ibEdjTX6NgV8v-_R2kQ)
Asked 3 months ago, viewed 123 times
2 Answers

There are several ways to create an Iceberg table with partitioning, depending on which tool you're using:

Using Amazon Athena SQL:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')

Note that when using the PARTITIONED BY clause in Athena, the columns used for partitioning must be specified in the column declarations first, and the column type should not be included in the PARTITIONED BY clause.
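
The rule above can be checked mechanically before a statement is sent to Athena. The following is a minimal sketch, not part of any AWS SDK: the helper name build_athena_iceberg_ddl and its parameters are hypothetical, and it only assembles the DDL string. It rejects partition columns that are not declared, and it emits the PARTITIONED BY clause with names only.

```python
def build_athena_iceberg_ddl(table, columns, partition_by, location):
    """Return an Athena CREATE TABLE statement for an Iceberg table.

    columns: dict mapping column name -> type (insertion order is kept).
    partition_by: list of column names, each already present in `columns`.
    """
    # Partition columns must appear in the column declarations first.
    missing = [name for name in partition_by if name not in columns]
    if missing:
        raise ValueError(f"partition columns not declared: {missing}")

    column_lines = ",\n".join(
        f"    {name} {dtype}" for name, dtype in columns.items()
    )
    # Names only: the column type is not repeated in PARTITIONED BY.
    partition_clause = ", ".join(partition_by)
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"PARTITIONED BY ({partition_clause})\n"
        f"LOCATION '{location}'\n"
        f"TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


ddl = build_athena_iceberg_ddl(
    "iceberg_table",
    {"time": "timestamp", "foo": "string"},
    ["foo"],
    "s3://your-bucket/your-path/",
)
print(ddl)
```

Passing a partition column that is missing from the column dict raises ValueError instead of producing a statement that Athena would reject.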

You can also use hidden partitioning with transforms in Athena:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (day(time), foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
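
To make the day(time) transform less abstract: as I understand the Iceberg table spec, the day transform maps each timestamp to the number of whole days since 1970-01-01, so all rows from the same calendar day share a partition value without a separate date column. The sketch below only illustrates that arithmetic in plain Python. The function name day_partition_value is hypothetical, and treating naive timestamps as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_partition_value(ts):
    """Return the number of whole days between 1970-01-01 (UTC) and `ts`."""
    if ts.tzinfo is None:
        # Assumption for this sketch: naive timestamps are treated as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    # timedelta.days floors toward negative infinity, so timestamps
    # before the epoch land on negative day numbers.
    return (ts - EPOCH).days


morning = datetime(2024, 1, 1, 8, 30)
evening = datetime(2024, 1, 1, 22, 15)
next_day = datetime(2024, 1, 2, 0, 0)

# Rows from the same calendar day map to the same partition value.
print(day_partition_value(morning) == day_partition_value(evening))
print(day_partition_value(next_day) - day_partition_value(morning))
```

Queries that filter on time can then be pruned to the matching day partitions, which is why the transform is called hidden partitioning.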

Using PyIceberg:

from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType, TimestampType

# Load the AWS Glue catalog (assumption: catalog name "glue", default AWS credentials)
glue_catalog = load_catalog("glue", **{"type": "glue"})

# Define schema
schema = Schema(
    NestedField(field_id=1, name="time", field_type=TimestampType(), required=True),
    NestedField(field_id=2, name="foo", field_type=StringType(), required=True),
)

# Define partition spec
partition_spec = PartitionSpec(
    PartitionField(
        source_id=2,  # field_id of "foo" in the schema above
        field_id=1000,  # partition field IDs start at 1000
        transform=IdentityTransform(),
        name="foo",
    )
)

# Create table
glue_catalog.create_table(
    identifier="database_name.table_name",
    schema=schema,
    location="s3://your-bucket/your-path/",
    partition_spec=partition_spec,
)

Using Apache Spark:

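// Scala: partitionedBy takes Column expressions, not plain strings
import org.apache.spark.sql.functions.col

dataFrame.writeTo("catalog_name.database_name.table_name")
    .tableProperty("format-version", "2")
    .partitionedBy(col("foo"))
    .createOrReplace()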

The error in your example is that you're including the column type "string" in the PARTITIONED BY clause, which is not allowed. The correct syntax should exclude the data type in the partition specification.
Sources
Create Iceberg tables - Amazon Athena
Getting started with Iceberg tables in Amazon Athena SQL - AWS Prescriptive Guidance
Working with Iceberg tables by using PyIceberg - AWS Prescriptive Guidance
Working with Iceberg tables by using Apache Spark - AWS Prescriptive Guidance

Answered 3 months ago

The PyIceberg implementation is quite unfortunate. Having to define the schema first in order to sort or partition a table is a suboptimal solution. I know it's good practice to define the schema first, but that is not feasible for many source systems when you are dealing with hundreds or thousands of tables.

Answered 2 months ago
