The idea of a columnar-storage database like Redshift for OLAP is that you can have tables with many columns and still efficiently query just a small subset of those columns across many rows. So I'd say there's no need to reformat as you're suggesting. That said, I don't have broad enough experience in this area to say it's NEVER a good idea; you may have special requirements.
The CSV file columns should match the table columns. From your example, the second file looks like a pivot of the first file. Since the layout is different, you can create a second staging table for the second file and, after loading the data with the COPY command, use PIVOT/UNPIVOT to transform it into the first layout.
If the two files are independent, you can leave them as two separate tables. When running queries that join the two tables, you can use PIVOT/UNPIVOT to transform at query execution time.
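As a rough sketch of that approach (the table names, S3 path, IAM role, and the attribute names in the PIVOT IN list are all placeholders, since the actual file layout wasn't shown):

```sql
-- Staging table whose columns mirror the second (pivoted) CSV layout.
CREATE TABLE stg_file2 (
    record_id  INT,
    attr_name  VARCHAR(64),
    attr_value VARCHAR(256)
);

-- Load the second file as-is.
COPY stg_file2
FROM 's3://my-bucket/file2.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
CSV IGNOREHEADER 1;

-- Reshape to the first file's wide layout with Redshift's PIVOT clause.
SELECT *
FROM (SELECT record_id, attr_name, attr_value FROM stg_file2)
PIVOT (MAX(attr_value) FOR attr_name IN ('color', 'size', 'weight'));
```

The same SELECT can be wrapped in a CREATE TABLE AS if you want to materialize the transformed layout rather than pivot at query time.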
Having data as key-value pairs gives you the flexibility of a varying schema, but when running queries you often need to transform it into a tabular column format so it is easy to join with other tables. If your originating system provides data as key-value pairs, use a staging schema that matches the source file format; this gives you fast performance for write queries, i.e., data ingestion. Then apply your transformations in an intermediate or target schema, which gives you fast join performance for your read queries.
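A minimal sketch of that staging-then-transform pattern (all table, column, and attribute names here are hypothetical; the conditional aggregation is a portable alternative to the PIVOT clause):

```sql
-- Key-value staging table matching the source feed, so ingestion is a
-- straight COPY with no reshaping.
CREATE TABLE stg_events_kv (
    event_id   BIGINT,
    attr_name  VARCHAR(64),
    attr_value VARCHAR(256)
);

-- Transform step: materialize a tabular target table that is cheap to
-- join against. Each CASE expression lifts one key into its own column.
CREATE TABLE events AS
SELECT event_id,
       MAX(CASE WHEN attr_name = 'user'   THEN attr_value END) AS user_name,
       MAX(CASE WHEN attr_name = 'status' THEN attr_value END) AS status
FROM stg_events_kv
GROUP BY event_id;
```

Scheduling this transform after each load keeps writes fast on the staging side while read queries only ever touch the tabular target.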
I would experiment both ways. That'd be a fun exercise regardless! Also, would readability be a factor in deciding how you store the data?