Partition Projection with Date in S3 Bucket Hive Format


Say I have a s3 bucket with json files in the following format:s3://bucket_name/logs/region={region}/year={year}/month={month}/day={day}/hour={hour}/

I would like to use partition projection on region (as an enum) and year/month/day/hour (as a date).

If my s3 bucket was organized like:s3://bucket_name/logs/region={region}/yyyy/MM/dd/HH I believe I could create a table with partitions like this:

 "projection.enabled" = "true",
 "projection.region.type" = "enum",
 "projection.region.values" = "east,west,north,south",
 "projection.datehour.type" = "date",
 "projection.datehour.format" = "yyyy/MM/dd/HH",
 "projection.datehour.range" = "2021/01/01/00,NOW",
 "projection.datehour.interval" = "1",
 "projection.datehour.interval.unit" = "HOURS",
 "storage.location.template" = "s3://bucket_name/logs/region={region}/${datehour}/"

However, could I use the date type with the s3 format which uses the hive partitions?

AKA could you share an example of a Create Table query that enables partition projection on an s3 bucket that has the format:s3://bucket_name/logs/region={region}/year={year}/month={month}/day={day}/hour={hour}/? (where region is an enum, and the rest of the partitions are a date)


asked 10 months ago712 views
1 Answer

Hello, you can create the table with the hive style partitions for the data coming from Amazon Kinesis Data Firehose. Please use a similar query like below in your Create Table query to create the table for the hive style partitioned source data:

'projection.enabled' = 'true',
'projection.region.type' = 'enum',
'projection.region.values' = 'east,west,north,south',
'projection.datehour.format'= '\'year=\'yyyy/\'month=\'MM/\'day=\'dd/\'hour=\'HH',
'storage.location.template'='s3://bucket_name/logs/region={region}/${datehour}/' )

Once the table is created successfully, try to query the data inside the table using a simple SELECT query to confirm whether you can retrieve the data from the table or not.

If you encounter any issues while creating the table, I would request you to raise a support case with the AWS Premium Support Team for more specific troubleshooting.

answered 9 months ago
  • 'projection.datehour.range'='year=2023/month=08/day=01/hour=00,NOW',

    I believe the datehour range needs to be like this?

    I couldn't get it to work like: 'projection.datehour.range'='2021/01/01/00,NOW',

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions