Athena date partition projection

0

I just can't understand what I'm doing wrong.

I have a table.

CREATE EXTERNAL TABLE test (
  originalrequest string,
  requeststarted string
)
PARTITIONED BY ( 
  req_start_partition string
)
STORED AS PARQUET
LOCATION
  's3://my_bucket/my_dir'
TBLPROPERTIES (

  'projection.enabled'='true',
  'projection.req_start_partition.type'='date',
  'projection.req_start_partition.range'='NOW-20DAYS,NOW+2DAYS',
  'projection.req_start_partition.interval' = '1',
  'projection.req_start_partition.interval.unit'='HOURS',
  'projection.req_start_partition.format' = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
 )
 

I also have a data with ISO8601 timestamp which I'd love to insert into this table. The data should be partitioned by rounded hour. But whatever I do, I have the same error

  
 insert into  test 
 values( 
  'some data',
  '2024-07-19T17:27:23.348Z',
  '2024-07-19T17:27:23.348Z'
 )
  ------------------------------------

GENERIC_INTERNAL_ERROR: No partition found with values [2024-07-19T17:27:23.348Z] (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null). 
profile picture
Smotrov
asked 3 months ago370 views
2 Answers
1
Accepted Answer

Self answer. This works well

CREATE EXTERNAL TABLE test (
  originalrequest string,
  requeststarted string
)
PARTITIONED BY ( 
  req_start_partition TIMESTAMP 
)
STORED AS PARQUET
LOCATION
 's3://my_bucket/my_dir'
TBLPROPERTIES (

  'projection.enabled'='true',
  'projection.req_start_partition.type'='date',
  'projection.req_start_partition.range'='NOW-20DAYS,NOW+2DAYS',
  'projection.req_start_partition.interval' = '1',
  'projection.req_start_partition.interval.unit'='HOURS',
  'projection.req_start_partition.format' = 'yyyy-MM-dd HH:\'00:00\''
 )
  
  -------------------------------
  

  
 insert into  test 
 values( 
  'some data',
  '2024-07-19T17:27:23.348Z',
   date_trunc('hour', from_iso8601_timestamp('2024-07-19T17:27:23.348Z'))
 )
  
profile picture
Smotrov
answered 3 months ago
profile picture
EXPERT
reviewed 3 months ago
0

As per the definition of the table properties - TBLPROPERTIES (

'projection.enabled'='true', 'projection.req_start_partition.type'='date', 'projection.req_start_partition.range'='NOW-20DAYS,NOW+2DAYS', 'projection.req_start_partition.interval' = '1', 'projection.req_start_partition.interval.unit'='HOURS', 'projection.req_start_partition.format' = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" )

These properties indicate that the req_start_partition column is partitioned by date, and the partitions are created at an interval of 1 hour.

When you insert data into this table, Athena needs to determine which partition the data should be stored in based on the timestamp value. However, since the partitions are created at an hourly interval, Athena expects the partition value to be rounded to the nearest hour. By rounding the timestamp to the nearest hour using the date_trunc function, you could ensure that the partition value matches the hourly partition interval specified in the table properties. Could you try with inserting the data as below and see if that works? date_trunc('hour', cast('2024-07-19T17:27:23.348Z' as timestamp))

AWS
answered 3 months ago
  • Thank you for prompt reply! I did few adjustments.

    Changed type of the partitioning column to timestamp

    CREATE EXTERNAL TABLE test (
      originalrequest string,
      requeststarted string
    )
    PARTITIONED BY ( 
      req_start_partition timestamp
    )
    STORED AS PARQUET
    LOCATION
     's3://my_bucket/my_dir'
    TBLPROPERTIES (
    
      'projection.enabled'='true',
      'projection.req_start_partition.type'='date',
      'projection.req_start_partition.range'='NOW-20DAYS,NOW+2DAYS',
      'projection.req_start_partition.interval' = '1',
      'projection.req_start_partition.interval.unit'='HOURS',
      'projection.req_start_partition.format' = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
     )
    

    Slightly changed the string

     insert into  test 
     values( 
      'some data',
      '2024-07-19T17:27:23.348Z',
      date_trunc('hour', cast('2024-07-19 17:27:23.348' as timestamp))
     )
    

    But at the end of the day same error but with rounded value :-)

    GENERIC_INTERNAL_ERROR: No partition found with values [2024-07-19 17:00:00] (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null).
    

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions