
How to create iceberg schema with partition?

    time timestamp,
    foo string,
)
PARTITIONED BY (foo string)
;

![Enter image description here](/media/postImages/original/IMWY4ibEdjTX6NgV8v-_R2kQ)
Asked 3 months ago, 143 views
2 answers

There are several ways to create an Iceberg table with partitioning, depending on which tool you're using:

Using Amazon Athena SQL:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')

Note that when using the PARTITIONED BY clause in Athena, the columns used for partitioning must be specified in the column declarations first, and the column type should not be included in the PARTITIONED BY clause.

You can also use hidden partitioning with transforms in Athena:

CREATE TABLE iceberg_table (
    time timestamp,
    foo string
)
PARTITIONED BY (day(time), foo)
LOCATION 's3://your-bucket/your-path/'
TBLPROPERTIES ('table_type' = 'ICEBERG')
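If you generate statements like the two above from code, a small helper can enforce the rule that every partition entry refers to a declared column and carries no type. The following is only a minimal sketch in Python: `build_iceberg_ddl` and its parameters are hypothetical names (not part of Athena or any AWS SDK), and it only renders the statement text without submitting it anywhere.

```python
from typing import Mapping, Sequence


def build_iceberg_ddl(
    table: str,
    columns: Mapping[str, str],
    partition_by: Sequence[str],
    location: str,
) -> str:
    """Render an Athena CREATE TABLE statement for an Iceberg table.

    `columns` maps column name -> type. `partition_by` holds column names or
    transform expressions such as "day(time)", never data types.
    (Hypothetical helper for illustration only.)
    """
    if not partition_by:
        raise ValueError("this sketch expects at least one partition entry")
    for entry in partition_by:
        # The source column is the last argument of a transform,
        # e.g. "day(time)" -> "time", "bucket(16, foo)" -> "foo".
        source = entry.rstrip(")").split("(")[-1].split(",")[-1].strip()
        if source not in columns:
            # An entry such as "foo string" ends up here.
            raise ValueError(f"partition column {source!r} is not declared")
    column_lines = ",\n".join(
        f"    {name} {dtype}" for name, dtype in columns.items()
    )
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"PARTITIONED BY ({', '.join(partition_by)})\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


ddl = build_iceberg_ddl(
    table="iceberg_table",
    columns={"time": "timestamp", "foo": "string"},
    partition_by=["day(time)", "foo"],
    location="s3://your-bucket/your-path/",
)
print(ddl)
```

Passing `"foo string"` as a partition entry raises a `ValueError` in this sketch, which mirrors the mistake in the question. The helper does no quoting or escaping, so it should only be fed trusted names.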

Using PyIceberg:

from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType, TimestampType

# Load the AWS Glue catalog (the catalog name "glue" is a placeholder)
glue_catalog = load_catalog("glue", **{"type": "glue"})

# Define schema
schema = Schema(
    NestedField(field_id=1, name="time", field_type=TimestampType(), required=True),
    NestedField(field_id=2, name="foo", field_type=StringType(), required=True),
)

# Define partition spec
partition_spec = PartitionSpec(
    PartitionField(
        source_id=2,  # Refers to "foo"
        field_id=1000,
        transform=IdentityTransform(),
        name="foo",
    )
)

# Create table
glue_catalog.create_table(
    identifier="database_name.table_name",
    schema=schema,
    location="s3://your-bucket/your-path/",
    partition_spec=partition_spec,
)

Using Apache Spark:

from pyspark.sql.functions import col

(dataFrame.writeTo("catalog_name.database_name.table_name")
    .tableProperty("format-version", "2")
    .partitionedBy(col("foo"))  # PySpark; parentheses let the chain span lines
    .createOrReplace())

The error in your example is that you're including the column type "string" in the PARTITIONED BY clause, which is not allowed. The correct syntax should exclude the data type in the partition specification.
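To make the fix concrete, here is a rough Python sketch that strips the data types from a Hive-style PARTITIONED BY clause. `strip_partition_types` is a hypothetical helper written for this answer, not an Athena or Iceberg API, and it only handles plain `name type` entries.

```python
import re


def strip_partition_types(ddl: str) -> str:
    """Rewrite `PARTITIONED BY (foo string)` as `PARTITIONED BY (foo)`.

    Only plain `name type` entries are changed. Clauses that contain
    transform expressions such as `day(time)` are left untouched.
    (Hypothetical helper for illustration only.)
    """

    def fix_clause(match: re.Match) -> str:
        entries = [entry.strip() for entry in match.group(1).split(",")]
        # Keep the column name (first token) and drop the trailing type.
        names = [entry.split()[0] for entry in entries if entry]
        return f"PARTITIONED BY ({', '.join(names)})"

    # [^()]* restricts the match to clauses without nested parentheses,
    # so a clause containing day(time) never matches and is never rewritten.
    return re.sub(
        r"PARTITIONED\s+BY\s*\(([^()]*)\)",
        fix_clause,
        ddl,
        flags=re.IGNORECASE,
    )


broken_clause = "PARTITIONED BY (foo string)"
print(strip_partition_types(broken_clause))  # prints: PARTITIONED BY (foo)
```

The column itself must still appear, with its type, in the column list of the CREATE TABLE statement.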
Sources
Create Iceberg tables - Amazon Athena
Getting started with Iceberg tables in Amazon Athena SQL - AWS Prescriptive Guidance
Working with Iceberg tables by using PyIceberg - AWS Prescriptive Guidance
Working with Iceberg tables by using Apache Spark - AWS Prescriptive Guidance

Answered 3 months ago

The PyIceberg implementation is quite unfortunate. Having to define the schema first in order to sort or partition a table is a suboptimal solution. I know it's good practice to define the schema first, but that isn't feasible for many source systems when you're dealing with hundreds or thousands of tables.

Answered 3 months ago
