Trying to surface daily CSVs from S3 in Redshift via AWS Glue Studio, but databases aren't showing up


I am trying to use AWS Glue Studio to build a simple ETL workflow. I have a bunch of CSV files in different directories in S3. I want those CSVs to be accessible via a database and have chosen Redshift for the job. The directories will be updated every day with new CSV files. The file structure is:

YYYY-MM-DD (e.g. 2023-03-07)
|---- groupName1
|     |---- groupName1.csv
|---- groupName2
|     |---- groupName2.csv
...
|---- groupNameN
|     |---- groupNameN.csv

We will be keeping historical data, so every day I will have a new date-based directory.
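To make the layout concrete, here is a minimal Python sketch that splits an S3 object key into its date and group parts. It assumes keys look exactly like `YYYY-MM-DD/<group>/<group>.csv` with no bucket-level prefix in front; the names `DailyCsv` and `parse_key` are hypothetical and only for illustration.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class DailyCsv(NamedTuple):
    """Date and group extracted from one object key (hypothetical helper type)."""
    day: date
    group: str


# Assumed layout: YYYY-MM-DD/<group>/<group>.csv
_KEY_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})/([^/]+)/([^/]+)\.csv$")


def parse_key(key: str) -> Optional[DailyCsv]:
    """Return the date and group for a key, or None if it does not fit the layout."""
    match = _KEY_RE.match(key)
    if match is None:
        return None
    year, month, day, folder, stem = match.groups()
    # The file is expected to carry the same name as its directory.
    if folder != stem:
        return None
    try:
        parsed_day = date(int(year), int(month), int(day))
    except ValueError:
        # Digits were present but do not form a real calendar date.
        return None
    return DailyCsv(parsed_day, folder)
```

A helper like this could be used to decide which table a newly arrived file belongs to, since the group name is the only part of the key that stays stable from day to day.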

I've read that AWS Glue can automatically copy data on a schedule, but I can't see my Redshift databases or tables (screenshot below). I'm using my AWS admin account and I do have the AWSGlueConsoleFullAccess permission (screenshot below).

[Screenshot: Glue Studio with no Redshift databases or tables listed]

[Screenshot: AWSGlueConsoleFullAccess permission]

1 Answer

The databases and tables listed there come from the Glue Data Catalog, not from Redshift.
The intended workflow is to have a crawler map the Redshift tables to Catalog tables; once it has run, they will be listed there for you to use.
Sorry for the inconvenience. The team is aware that this is something to improve.
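
The crawler setup described in this answer could be sketched with boto3 roughly as follows. This is only an illustration under stated assumptions: a Glue connection to the Redshift cluster already exists, and the connection name, IAM role ARN, Catalog database name, and include path in the usage note are placeholders, not values from this thread. The helper names are hypothetical; `create_crawler` and `start_crawler` are the Glue client calls being used.

```python
from typing import Any, Dict, Optional


def redshift_crawler_params(
    name: str,
    role_arn: str,
    catalog_database: str,
    connection_name: str,
    include_path: str,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_crawler call (hypothetical helper)."""
    params: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        # Glue Data Catalog database that will receive the table definitions.
        "DatabaseName": catalog_database,
        "Targets": {
            "JdbcTargets": [
                {
                    # Existing Glue connection that points at the Redshift cluster.
                    "ConnectionName": connection_name,
                    # JDBC include path: database/schema/table, % is a wildcard.
                    "Path": include_path,
                }
            ]
        },
    }
    if schedule:
        # Glue cron expression, e.g. "cron(0 6 * * ? *)" for 06:00 UTC daily.
        params["Schedule"] = schedule
    return params


def create_and_start_crawler(params: Dict[str, Any]) -> None:
    """Create the crawler and run it once. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the parameter builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**params)
    glue.start_crawler(Name=params["Name"])
```

For example, `redshift_crawler_params("redshift-tables", "arn:aws:iam::123456789012:role/ExampleGlueRole", "redshift_catalog", "example-redshift-connection", "dev/public/%")` would describe a crawler over every table in the `public` schema of a database named `dev`. After the crawler has run, the resulting Catalog tables should be the ones that appear in Glue Studio.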

AWS
EXPERT
answered a year ago
  • So if I have hundreds of new .csv files every day in new directories in S3, what is a recommended approach to scalably load that data into Redshift tables? Also, what is the best way of creating those hundreds of Redshift tables to begin with?
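
The thread does not answer this follow-up. As one possible direction only, Redshift's `COPY` command can load a CSV straight from S3, so a small Python sketch could generate one statement per group and day. It assumes one Redshift table per group, that the tables already exist, that each CSV has a single header row, and that an IAM role with read access to the bucket is attached to the cluster. The function name, bucket, and role ARN are hypothetical.

```python
import re
from datetime import date

# Conservative check so that only plain identifiers are interpolated into SQL.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def copy_statement(table: str, bucket: str, day: date, group: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for one group's CSV on one day."""
    if not _IDENTIFIER_RE.match(table):
        raise ValueError(f"unsupported table name: {table!r}")
    if "'" in bucket or "'" in group or "'" in iam_role_arn:
        raise ValueError("single quotes are not allowed in the S3 path or role ARN")
    # Assumed layout from the question: YYYY-MM-DD/<group>/<group>.csv
    s3_uri = f"s3://{bucket}/{day.isoformat()}/{group}/{group}.csv"
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"  # assumes a single header row
    )
```

The generated statements would still need to be run against the cluster on a schedule, and creating the tables in the first place is outside what this sketch covers.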
