Extracting a SQL Server table to the data catalog in a Glue job: fine. Two tables: madness?

I used the Glue job editor to create a simple job that reads from a SQL Server database as its source, filters by a column (a SQL query), and writes the output to an S3 bucket so I can query it with Athena. It works perfectly.
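Roughly, the job body looks like this; the host, credentials, table name, filter column, and bucket below are placeholders, not the real values:

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read one table from SQL Server over JDBC (placeholder host/db/credentials).
orders = spark.read.jdbc(
    url="jdbc:sqlserver://my-host:1433;databaseName=MyDb",
    table="dbo.Orders",
    properties={"user": "glue_user", "password": "***"},
)

# Filter by a column, then write Parquet to S3 so Athena can query it.
orders.filter(orders.status == "ACTIVE") \
      .write.mode("overwrite") \
      .parquet("s3://my-bucket/glue-output/")

job.commit()
```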

Now I wanted the same job to do the same thing for several other tables, so I edited the code, duplicating the block that starts at the "job = Job(glueContext)" line. But no matter how I do it, the two tables come out loaded weirdly: one should have 3 records and the other 2, yet both end up with around 20 records each, with blank values in most of the rows.

What am I doing wrong? How else can I achieve this? I thought of running crawlers first to infer the schemas and add them to the data catalog, but even a single simple crawler just spins and spins and then fails with "Internal Service Exception". Not sure how else to approach this. Thanks for any insights.

asked a year ago · 498 views
1 Answer
Accepted Answer

I discovered something that is probably obvious to everyone but wasn't to me: Athena queries all the files in a folder as if they belong to the same table. My duplicated blocks were writing both tables' files into the same S3 location, so Athena read them all against one schema, which is where the extra rows with blank values came from. Each table has to go in its own separate folder. Duh.
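In code terms, the fix is just to give each table its own S3 prefix. A sketch, where the table names and paths are placeholders and `spark` is the session from the job's setup block:

```python
# One (table, S3 folder) pair per extract; names and paths are placeholders.
tables = {
    "dbo.Orders":    "s3://my-bucket/glue-output/orders/",
    "dbo.Customers": "s3://my-bucket/glue-output/customers/",
}

for table, s3_path in tables.items():
    df = spark.read.jdbc(
        url="jdbc:sqlserver://my-host:1433;databaseName=MyDb",
        table=table,
        properties={"user": "glue_user", "password": "***"},
    )
    # Each table gets its own folder, so each Athena table's LOCATION
    # only ever sees files with that one table's schema.
    df.write.mode("overwrite").parquet(s3_path)
```

Athena reads every object under a table's LOCATION as rows of that table, which is why pointing two extracts at the same folder mixed the files together.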

answered a year ago
