
Why are there random characters in my S3 path for Kinesis Data Firehose to Apache Iceberg in S3?


I am getting an unexpected folder on S3 between my data/ and my first partition, root_account_uuid=x. The path appears to be unique each time Firehose syncs its buffered data to S3. I am using a Lambda to control which events are written, as well as providing the table_name so it can be picked up by Firehose using JQ syntax. The Lambda also adds an object with the three partition keys I configured the table with when I created it in Athena. I can provide details in a thread. I have not been able to find anything about this extra part of the file path in my web searches or in the AWS documentation for Firehose -> Iceberg.
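Roughly, the Lambda looks like this sketch (field names such as event_type, root_account_uuid, year, and month are simplified placeholders, not my exact schema):

```python
import base64
import json

# Sketch of the transformation Lambda described above: it decodes each
# Firehose record, drops events that should not be written, and attaches
# partition-key metadata for Firehose to pick up via JQ expressions.
# All field names here are placeholders.

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Drop events we don't want delivered to the table.
        if payload.get("event_type") not in ("create", "update"):
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(payload).encode()).decode(),
            # Keys the delivery stream references with JQ syntax,
            # e.g. .root_account_uuid, .year, .month
            "metadata": {
                "partitionKeys": {
                    "root_account_uuid": payload["root_account_uuid"],
                    "year": payload["year"],
                    "month": payload["month"],
                }
            },
        })
    return {"records": output}
```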

1 Answer

Currently, Amazon Kinesis Data Firehose does not natively support direct streaming to Apache Iceberg table formats in S3. The behavior you're describing is not a standard feature of Kinesis Data Firehose when writing to Iceberg tables.

Kinesis Data Firehose can output data to Amazon S3 in formats like Parquet, but it does not directly support writing to the Iceberg table format. The random characters you're seeing in the S3 path are most likely part of Firehose's standard object-naming pattern when writing to S3, not anything specific to Iceberg.

If you're trying to create an Iceberg table structure in S3, you would typically need to set up a separate process to convert the data from the Kinesis Data Firehose output format into the Iceberg table format. This might involve using AWS Glue or another ETL process to transform the data after it's been written to S3 by Kinesis Data Firehose.

The unexpected folder you're seeing between 'data/' and your first partition is likely part of Kinesis Data Firehose's default prefix structure for S3 objects. Firehose uses a combination of delivery stream name, date, and time to create unique prefixes for each batch of data it writes to S3.
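To illustrate, the sketch below decomposes an object key shaped like Firehose's documented default naming (stream name, stream version, timestamp, random suffix); the example key, the regex, and the helper name are assumptions for illustration, not one of your real objects:

```python
import re

# Firehose's default object naming is roughly:
#   <prefix>/<StreamName>-<Version>-<YYYY-MM-dd-HH-mm-ss>-<RandomSuffix>
# where the random suffix guarantees a unique key per delivery batch.
# This regex is an illustrative approximation of that layout.
FIREHOSE_KEY_RE = re.compile(
    r"^(?P<prefix>.*/)?"
    r"(?P<stream>[\w.-]+?)-(?P<version>\d+)-"
    r"(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})-"
    r"(?P<rand>[0-9a-f-]{36})$"
)

def parse_firehose_key(key: str) -> dict:
    """Split a default-named Firehose object key into its components."""
    m = FIREHOSE_KEY_RE.match(key)
    if not m:
        raise ValueError(f"not a default-named Firehose key: {key}")
    return m.groupdict()
```

Here `parse_firehose_key("data/2024/05/01/12/my-stream-1-2024-05-01-12-30-00-<uuid>")` would separate the date-based prefix from the stream name, timestamp, and random suffix.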

If you're trying to create a specific folder structure for use with Iceberg tables, you might need to implement a custom solution. This could involve using AWS Lambda to process the data after it's written by Firehose, or setting up a separate ETL job to reorganize the data into the desired Iceberg table structure.
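As a sketch of that reorganization step (the function and partition names below are hypothetical, not a Firehose or Iceberg API), an S3-triggered Lambda or a Glue job could compute a Hive-style destination key for each delivered object and copy it there:

```python
# Hypothetical helper for the "reorganize after delivery" approach:
# given a key written by Firehose and the partition values extracted
# from the object's contents, compute the Hive/Iceberg-style key the
# object should be copied to. Partition names are placeholders.

def iceberg_partition_key(src_key: str, partitions: dict) -> str:
    """Map a Firehose-written object key to a partitioned destination key.

    `partitions` maps partition column name -> value, in partition order.
    """
    filename = src_key.rsplit("/", 1)[-1]
    partition_path = "/".join(f"{name}={value}" for name, value in partitions.items())
    return f"data/{partition_path}/{filename}"
```

A Lambda subscribed to s3:ObjectCreated events (or a scheduled Glue job) would then use this to copy each object into the partition layout the table expects.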

For a more tailored solution to your specific use case with Iceberg tables, you might want to consider consulting with AWS support or exploring alternative architectures that are better suited for working with Iceberg table formats in S3.
Sources
Does Kinesis Firehose destination support Glue table format of type Iceberg | AWS re:Post
Amazon Kinesis Data Firehose Features - Streaming Data Pipeline - Amazon Web Services

answered a year ago
AWS
SUPPORT ENGINEER
reviewed a year ago
