Partition Project leading pad leading zeros on dates

0

My create table SQL looks like the below (table structure greatly simplified). I'm using partition projection.

I am trying to figure out how to handle date strings. For easier SQL querying purposes, I want the dates to have leading zeros, eg "2022/09/06". However the S3 buckets unfortunately are using non padded zero, eg, "2022/9/6".

Is there a way to alter the below query to accomplish this? If I simply change the line 'projection.dt.format'='yyyy/M/d', to 'projection.dt.format'='yyyy/MM/dd', Then the table partitions don't pick up a date like "2022/9/6".

 CREATE EXTERNAL TABLE IF NOT EXISTS TableName (
  userkey string,
  id string,
  createdon string,
  siteid int,
 --snipped---
)
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'ignore.malformed.json'='true') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://company-log-split/{{NetworkIdHandler.value}}/company-log'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.dt.format'='yyyy/M/d', 
  'projection.dt.interval'='1', 
  'projection.dt.interval.unit'='DAYS', 
  'projection.dt.range'='NOW - 3 MONTHS,NOW + 5 YEARS', 
  'projection.dt.type'='date',
  'storage.location.template'='s3://company-log-split/{{NetworkIdHandler.value}}/company-log/${dt}'
) 
asked 2 years ago253 views
2 Answers
0
Accepted Answer

Hello,

As you have correctly pointed out, Athena query can only pick up the underlying date partitions and data when the date partition format in 'storage.location.template' matches the actual path in the S3 bucket. Because the '${dt}' path format in 'storage.location.template' depends on the date format defined in 'projection.dt.format', it is not possible to define or map the zero padded format to non zero padded format with Athena Partition Projection.

A possible workaround is using conventional Hive partitions instead of Partition Projection, for example, after created the table without Partition Projection, you can use ALTER TABLE ADD PARTITION query to add and map date partitions to the s3 path manually or programmatically:

ALTER TABLE my_table_name ADD PARTITION (dt = '2022/09/06') LOCATION 's3://mystorage/path/to/2022/9/6/';

To add partitions in batch programmatically, you can construct the Athena queries using a custom python script and execute them via start_query_execution API. Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.start_query_execution

AWS
Ethan_H
answered 2 years ago
0

Thanks very much for the answer. I was afraid of that. We switched from the old "ADD PARTITION" method to "PARTITION PROJECTION" so as to make things easier, but in the case of leading zeros, it didn't. :(

answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions