How do I troubleshoot partition projection issues in Amazon Athena?

6 minute read
0

I want to troubleshoot partition projection issues in Amazon Athena.

Short description

Issues with partition projection might be related to matching the storage.location.template with the Amazon Simple Storage Service (Amazon S3) directory structure. If the column is defined as a partition column, then it requires a location in the storage.location.template and a projection configuration for the column. When you create a storage.location.template, the partition values are projected into the template to pull the Amazon S3 paths for data. If the template doesn't match your directory or your projection configurations are missing, then issues appear.

Troubleshoot your issues with partition projection in Athena based on the following scenarios:

  • Your Athena table has set up partition projection, but when you query the table there are no results.
  • Your Athena table has set up partition projection, but your queries fail with the error INVALID_TABLE_PROPERTY: The 'storage.location.template' table property must contain templates for all partition columns!
  • Your Athena table has set up partition projection, but your queries fail with the error INVALID_TABLE_PROPERTY: Table (table-name) is configured for partition projection, but the following partition columns are missing projection configuration: (partition-column-name)
  • Your Athena table has Hive style partitioning, but you want to use partition projection to query your table instead of loading the partitions.

Resolution

Your Athena table has set up partition projection, but there are no results when you query the table

To troubleshoot this scenario, take the following actions:

Set up partition projection and check your table parameters. Make sure that projection.enabled is true. For each partition column in the table, set the projection.columnName.type to enum, integer, date, or injected:

  • If the projection type is date, then set the column type to String. Use String as the date type for partition keys. For more information, see Data types in Amazon Athena.
  • If the projection type is date, then use projection.columnName.range and projection.columnName.format. Your projection.columnName.format must correspond to how the data is configured on the Amazon S3 path. Also, use projection.columnName.interval and projection.columnName.range as required. For more information, see Date type.
  • If the projection type is enum, then set projection.columnName.values to a list of all potential values for that partition column.
  • If the projection type is integer, then set projection.columnName.range to a range of potential values for that column. Also, use projection.columnName.interval and projection.columnName.digits as required. For more information, see Integer type.
  • For columns with injected type, make sure that your queries have a filter expression such as WHERE <injected_col> for each column. Also, use the String column type in the partition column definition for injected columns.

Check that your storage.location.template has a placeholder for every partition column and matches the Amazon S3 directory structure of your bucket. For example, AWS CloudTrail logs can use partition projection.

Note: By default, your AWS Region and account ID are not projected. For more information, see Creating the table for CloudTrail logs in Athena using partition projection.

To project the Region and account ID with CloudTrail logs, use the following template. Use the enum type and make sure that you enumerate all your Regions and account ID.

Note: Replace example-bucket with your required Amazon S3 bucket. Your region, timestamp, and account fields are projected:

'storage.location.template'='s3://<example-bucket>/AWSLogs/${account}/CloudTrail/${region}/${timestamp}'

To query your Athena table to locate data in a specific Amazon S3 path, run the following commands:

Note: Replace example-YYYY/MM/DD with your desired timestamp, example-account-ID with your required account ID, and example-region with the required Region.

SELECT * FROM cloudtrail_logs_pp WHERE timestamp='YYYY/MM/DD' AND account=example-account-ID AND region=example-region

Example partition:

s3://bucket/AWSLogs/example-account-ID/CloudTrail/example-region/YYYY/MM/DD

Your Athena table has set up partition projection, but your queries fail with the error INVALID_TABLE_PROPERTY: The 'storage.location.template' table property must contain templates for all partition columns!

If you receive this error, then the storage.location.template is missing a partition column placeholder. To resolve, make sure that each column in the PARTITIONED BY clause has a placeholder.

Your Athena table has set up partition projection, but your queries fail with the error INVALID_TABLE_PROPERTY: Table (table-name) is configured for partition projection, but the following partition columns are missing projection configuration: (partition-column-name)

If you receive this error, then the column defined by the PARTITIONED BY clause is missing the required table parameters for projection such as projection.columnName.format. To resolve this, check your table parameters as follows:

  • If the projection type is date, then use projection.columnName.range and projection.columnName.format. Your projection.columnName.format must correspond to how the data is configured on the Amazon S3 path. Also, use projection.columnName.interval and projection.columnName.range as required. For more information, see Date type.
  • If the projection type is enum, then set projection.columnName.values to a list of all potential values for that partition column.
  • If the projection type is integer, then set projection.columnName.range to a range of potential values for that column. Also, use projection.columnName.interval and projection.columnName.digits as required. For more information, see Integer type.
  • For columns with injected type, make sure that your queries have a filter expression for each injected column. Also, use the String column type in the PARTITIONED BY clause for injected columns.

Your Athena table has Hive style partitioning, but you want to use partition projection to query your table instead of loading the partitions.

If your Athena table has Hive style partitioning, but you want to use partition projection, then check the storage template and projection formats. The storage template and projection formats must include the partition_col_name= portion of the Amazon S3 path. Run the following command:

Note: Replace example-table-name with the name of your required table.

CREATE EXTERNAL TABLE <example-table-name> (
col1 int, 
col2 int
)
PARTITIONED BY
partition_col_name STRING
)
ROW FORMAT DELIMITED
LOCATION
's3://bucket/prefix/'
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.partition_col_name.type'='<enum,date,etc>',
  ... ##Omitting type-based projection parameters
  'storage.location.template'='s3://bucket/prefix/partition_col_name={partition_col_name}/')

If you have multiple Hive style partitions and you want to map them to a single projected partition column, then alter the column format. To query your Amazon S3 data to have a single partition column called date, run the following command:

Note: Replace example-table-name with the name of your required table.

CREATE EXTERNAL TABLE <example-table-name> (col1 int, col2 int)
PARTITIONED BY (
`date` STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION "s3://bucket/prefix/"
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.date.type'='date',
  'projection.date.format'='\'year=\'yyyy/\'month=\'MM/\'day=\'dd',
  'projection.date.range'='year=2023/month=01/day=01,year=2023/month=12/day=31',
  'storage.location.template' = 's3://bucket/prefix/${date}/')

Example Amazon S3 path:

s3://bucket/prefix/year=/month=/day=/
AWS OFFICIAL
AWS OFFICIALUpdated 2 months ago