How do I combine multiple CSVs into one?

0

I have multiple CSVs about a single patient, and I would like to know how to combine them, because the columns across the CSVs together make up all the information for one patient. The CSVs are stored in an S3 bucket, in different folders. I have tried using a join, but because we have many patients the job is taking forever. TIA

CYN
asked 7 months ago, 377 views
1 Answer
3

Hello,

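You can create an Athena table whose LOCATION is the S3 prefix that contains all of the CSV files. Athena reads every file under that prefix, including files in subfolders. Something like this (see CREATE TABLE in the Athena documentation):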

CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'
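
As an illustration only, the template above could be filled in for plain comma-separated files with a header row. The sketch below is Python that builds such a statement as a string. The helper name build_create_table is hypothetical, and the choice of OpenCSVSerde, the all-string column types, and the skip.header.line.count property are assumptions about the data, not something stated in this thread. The column names are borrowed from the SELECT list used later in this answer.

```python
def build_create_table(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for CSV files under `location`.

    Assumptions (not from the original answer): the files are comma-separated,
    have one header row, and every column can be read as a string.
    """
    if not location.startswith("s3://") or not location.endswith("/"):
        raise ValueError("location must be an s3:// prefix ending in '/'")
    # OpenCSVSerde reads every field as text, so declare all columns as string.
    column_lines = ",\n".join(f"  `{name}` string" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


# Placeholder names taken from this answer; replace them with your own.
ddl = build_create_table(
    "test_table",
    ["key1", "name1", "comment1"],
    "s3://bucketname/folder/",
)
print(ddl)
```

The printed statement can be pasted into the Athena query editor. If the files use a different delimiter or have no header row, the SerDe settings would need to change accordingly.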

Once the table is created, use CTAS (CREATE TABLE AS SELECT) to create another table that consolidates all the CSV data into a single output location, as shown below (see the Athena CTAS documentation):

CREATE TABLE ctas_csv_unpartitioned
WITH (
     -- TEXTFILE with a comma delimiter writes comma-separated text files
     format = 'TEXTFILE',
     field_delimiter = ',',
     external_location = 's3://xxxxxxxxxxxx/ctas_csv_unpartitioned/')
AS SELECT key1, name1, comment1
FROM test_table;
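
If you would rather run the statement from a script than from the console, a minimal Python sketch could look like the following. It assumes a boto3 Athena client is passed in; the helper names submit_query and wait_for_query, the polling interval, and the poll limit are hypothetical choices for illustration. The client calls start_query_execution and get_query_execution are the boto3 Athena operations used here.

```python
import time

# Athena reports one of these states once a query has finished.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def submit_query(client, sql, database, output_location):
    """Start an Athena query and return its execution id."""
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query result metadata to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


def wait_for_query(client, query_id, poll_seconds=2.0, max_polls=300):
    """Poll until the query reaches a terminal state and return that state."""
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With boto3 installed and credentials configured, you would create the client with boto3.client("athena"), pass the CTAS statement as sql, and check that wait_for_query returns "SUCCEEDED". The database name and the query results location are yours to supply.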
AWS
SUPPORT ENGINEER
answered 7 months ago
