Query in Athena partitioned data

0

Hello,
I am trying to query an s3 bucket which has data stored in folders organized by year, month, date and hour (s3://mybucket/2017/06/01/01) and has all the files in gz format. (I have replaced my bucket name and column name with dummy names)

I created table using the following query
CREATE EXTERNAL TABLE IF NOT EXISTS testdb.test_tables (
col1 string,
col2 string,
.....
) PARTITIONED BY (
year int,
month int,
date int,
hour int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://mybucket/'
TBLPROPERTIES ('has_encrypted_data'='false')

This query runs successfully.

After that I am running,
MSCK REPAIR TABLE test_tables

It returns values saying
Partitions not in metastore: test_tables:2017/05/14/00 test_tables:2017/05/14/01 test_tables:2017/05/14/02 test_tables:2017/05/14/03 test_tables:2017/05/14/04 test_tables:2017/05/14/05 test_tables:2017/05/14/06 test_tables:2017/05/14/07 test_tables:2017/05/14/08

This doesnt seem right.

And then when I run a basic query
show partitions test_tables.

It returns only "Query Successful" with nothing else.

Also when I run
select * from test_tables limit 10;

It returns nothing

질문됨 7년 전5236회 조회
2개 답변
0

It is happening because the partitions are not created properly.
Partitioning can be done in two ways - Dynamic Partitioning and Static Partitioning.
For dynamic partitioning, your folder structure should be of the form:
s3://mybucket/year=2017/month=06/day=01/hour=01
In this case, when you'll run "MSCK REPAIR TABLE test_tables" query after creating table, the partitions will be identified automatically.
Since your folder structure is :
s3://mybucket/2017/06/01/01
You need to add partitions manually after creating table and before executing repair table query. This can be done using the following query:
ALTER TABLE test_tables ADD PARTITION (year='2017',month='06',day='01',hour='01') location 's3://mybucket/2017/06/01/01'

For more details refer -
http://docs.aws.amazon.com/athena/latest/ug/partitions.html

karu07
답변함 7년 전
0

Correct. The MSCK repair table only works if your prefixes on S3 are in a key=value format. Else you need to manually add partitions. Also, if you are in US-East-1 you can also use Glue to automatically recognize schemas/partitions. See http://docs.aws.amazon.com/athena/latest/ug/glue-faq.html

AWS
답변함 7년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠