Athena partition projection BUG

0

Spend couple of hours before understand that there is a bug in AWS Athena.

Here is how to recreate it.

CREATE EXTERNAL TABLE test (
  a string)
PARTITIONED BY ( 
  month int
)
STORED AS PARQUET
LOCATION
  's3://my_bucket/data_lake/test'
TBLPROPERTIES (

  'projection.enabled'='true', 

  
  'projection.month.type'='integer', 
  'projection.month.digits'='2', 
  'projection.month.range'='01,12'
 )
  
  -------------------------------
  
 insert into  test 
 values ('some data',1)

This insert statement returns an error

GENERIC_INTERNAL_ERROR: No partition found with values [1] (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null).

All the case is in this line 'projection.month.digits'='2', despite the fact that month is an integer it does not allow to insert integer into this field.

Meanwhile it is stated in the documentation

Optional. A positive integer that specifies the number of digits to include in the partition value's **final** representation for column columnName

But the same time is affects not only final representation but does not allows to insert anything into the field.

profile picture
Smotrov
asked 3 months ago295 views
2 Answers
0

What have you configured for storage.location.template in your TBLPROPERTIES?

EXPERT
Leo K
answered 3 months ago
  • I didn't set any. Because it is stated in the documentation

    A custom template enables Athena to properly map partition values to custom Amazon S3 file locations that do not follow a typical .../column=value/... pattern.

    and my case does supposed to follow typical.

    Anyway this does not changes anything.

    CREATE EXTERNAL TABLE test (
      a string)
    PARTITIONED BY ( 
      month int
    )
    STORED AS PARQUET
    LOCATION
      's3://my_bucket/data_lake/test'
    TBLPROPERTIES (
    
      'projection.enabled'='true', 
    
      
      'projection.month.type'='integer', 
      'projection.month.digits'='2', 
      'projection.month.range'='01,12',
      'storage.location.template' =  's3://my_bucket/data_lake/test/month=${month}/'
     )
    
0

When using partition projection it generates all possible values for the partitioning column. In this case it generates values 01, 02, 03,..10,11,12. Input integer 1 doesn't match with '01' hence the error.

To have leading zeros in the value it has to be defined as string in table definition - this is irrespective of whether you are using partition projection or not.

Here if you are looking for two digits for 'month' partitioning column values, then define it as string -

CREATE EXTERNAL TABLE test (
  a string)
PARTITIONED BY ( 
  month string
)
STORED AS PARQUET
LOCATION
  's3://my_bucket/data_lake/test'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.month.type'='integer', 
  'projection.month.digits'='2', 
  'projection.month.range'='01,12'
 )

INSERT INTO test VALUES ('some data','01')

This will create s3 partition like - month=01

Though it is defined as string in the table, it will not take values other than these - [01,02,03,04....10,11,12] and throw an error if any other integer or a string value is passed.

Moreover partition projections are mainly used when querying data that is highly partitioned[1].

[1] https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html#partition-projection-using

AWS
answered 3 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions