Query in Athena partitioned data

0

Hello,
I am trying to query an s3 bucket which has data stored in folders organized by year, month, date and hour (s3://mybucket/2017/06/01/01) and has all the files in gz format. (I have replaced my bucket name and column name with dummy names)

I created table using the following query
CREATE EXTERNAL TABLE IF NOT EXISTS testdb.test_tables (
col1 string,
col2 string,
.....
) PARTITIONED BY (
year int,
month int,
date int,
hour int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://mybucket/'
TBLPROPERTIES ('has_encrypted_data'='false')

This query runs successfully.

After that I am running,
MSCK REPAIR TABLE test_tables

It returns values saying
Partitions not in metastore: test_tables:2017/05/14/00 test_tables:2017/05/14/01 test_tables:2017/05/14/02 test_tables:2017/05/14/03 test_tables:2017/05/14/04 test_tables:2017/05/14/05 test_tables:2017/05/14/06 test_tables:2017/05/14/07 test_tables:2017/05/14/08

This doesnt seem right.

And then when I run a basic query
show partitions test_tables.

It returns only "Query Successful" with nothing else.

Also when I run
select * from test_tables limit 10;

It returns nothing

asked 7 years ago5175 views
2 Answers
0

It is happening because the partitions are not created properly.
Partitioning can be done in two ways - Dynamic Partitioning and Static Partitioning.
For dynamic partitioning, your folder structure should be of the form:
s3://mybucket/year=2017/month=06/day=01/hour=01
In this case, when you'll run "MSCK REPAIR TABLE test_tables" query after creating table, the partitions will be identified automatically.
Since your folder structure is :
s3://mybucket/2017/06/01/01
You need to add partitions manually after creating table and before executing repair table query. This can be done using the following query:
ALTER TABLE test_tables ADD PARTITION (year='2017',month='06',day='01',hour='01') location 's3://mybucket/2017/06/01/01'

For more details refer -
http://docs.aws.amazon.com/athena/latest/ug/partitions.html

karu07
answered 7 years ago
0

Correct. The MSCK repair table only works if your prefixes on S3 are in a key=value format. Else you need to manually add partitions. Also, if you are in US-East-1 you can also use Glue to automatically recognize schemas/partitions. See http://docs.aws.amazon.com/athena/latest/ug/glue-faq.html

AWS
answered 7 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions