Query in Athena partitioned data

0

Hello,
I am trying to query an s3 bucket which has data stored in folders organized by year, month, date and hour (s3://mybucket/2017/06/01/01) and has all the files in gz format. (I have replaced my bucket name and column name with dummy names)

I created table using the following query
CREATE EXTERNAL TABLE IF NOT EXISTS testdb.test_tables (
col1 string,
col2 string,
.....
) PARTITIONED BY (
year int,
month int,
date int,
hour int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://mybucket/'
TBLPROPERTIES ('has_encrypted_data'='false')

This query runs successfully.

After that I am running,
MSCK REPAIR TABLE test_tables

It returns values saying
Partitions not in metastore: test_tables:2017/05/14/00 test_tables:2017/05/14/01 test_tables:2017/05/14/02 test_tables:2017/05/14/03 test_tables:2017/05/14/04 test_tables:2017/05/14/05 test_tables:2017/05/14/06 test_tables:2017/05/14/07 test_tables:2017/05/14/08

This doesnt seem right.

And then when I run a basic query
show partitions test_tables.

It returns only "Query Successful" with nothing else.

Also when I run
select * from test_tables limit 10;

It returns nothing

gefragt vor 7 Jahren5236 Aufrufe
2 Antworten
0

It is happening because the partitions are not created properly.
Partitioning can be done in two ways - Dynamic Partitioning and Static Partitioning.
For dynamic partitioning, your folder structure should be of the form:
s3://mybucket/year=2017/month=06/day=01/hour=01
In this case, when you'll run "MSCK REPAIR TABLE test_tables" query after creating table, the partitions will be identified automatically.
Since your folder structure is :
s3://mybucket/2017/06/01/01
You need to add partitions manually after creating table and before executing repair table query. This can be done using the following query:
ALTER TABLE test_tables ADD PARTITION (year='2017',month='06',day='01',hour='01') location 's3://mybucket/2017/06/01/01'

For more details refer -
http://docs.aws.amazon.com/athena/latest/ug/partitions.html

karu07
beantwortet vor 7 Jahren
0

Correct. The MSCK repair table only works if your prefixes on S3 are in a key=value format. Else you need to manually add partitions. Also, if you are in US-East-1 you can also use Glue to automatically recognize schemas/partitions. See http://docs.aws.amazon.com/athena/latest/ug/glue-faq.html

AWS
beantwortet vor 7 Jahren

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen