How do I combine multiple CSVs into one?


I have multiple CSVs about a single patient, and I would like to know how to combine them, because together the columns in the CSVs make up all the information for one patient. The CSVs are stored in an S3 bucket, in different folders. I have tried using a join, but because we have many patients the job is taking forever. TIA

CYN
asked 7 months ago · 377 views
1 Answer

Hello,

You can create an Athena table whose LOCATION is the S3 prefix that contains the CSV files. Something like this (see CREATE TABLE in the Athena documentation):

CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'

Once the table is created, use CTAS (CREATE TABLE AS SELECT) to write all the CSV data to a single output location as one table, like below (see the Athena CTAS documentation):

CREATE TABLE ctas_csv_unpartitioned 
WITH (
     format = 'TEXTFILE',
     field_delimiter = ',',
     external_location = 's3://xxxxxxxxxxxx/ctas_csv_unpartitioned/') 
AS SELECT key1, name1, comment1
FROM test_table;
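
If you want to run the CTAS step from a script rather than the console, here is a minimal Python sketch. It is only an illustration: build_ctas and submit_query are hypothetical helper names, the bucket, database and results location in the usage note are placeholders, and it assumes boto3 is installed with credentials that can run Athena queries. It writes the output with format = 'TEXTFILE' and field_delimiter = ',', which is how Athena CTAS produces comma-delimited files.

```python
def build_ctas(source_table, target_table, columns, output_location):
    """Return an Athena CTAS statement that writes comma-delimited text files."""
    column_list = ", ".join(columns)
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "     format = 'TEXTFILE',\n"
        "     field_delimiter = ',',\n"
        f"     external_location = '{output_location}')\n"
        f"AS SELECT {column_list}\n"
        f"FROM {source_table}"
    )


def submit_query(sql, database, results_location):
    """Start the query in Athena and return its execution id (hypothetical helper)."""
    import boto3  # imported here so build_ctas can be used without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": results_location},
    )
    return response["QueryExecutionId"]
```

For example, build_ctas("test_table", "ctas_csv_unpartitioned", ["key1", "name1", "comment1"], "s3://your-bucket/ctas_csv_unpartitioned/") returns the statement text, which you can then pass to submit_query together with your own database name and query results location.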
AWS
SUPPORT ENGINEER
answered 7 months ago
