Extracting one SQL Server table to the Data Catalog in a Glue job works fine; two tables, madness?


I used the Glue job editor to create a simple job that reads from a SQL Server database, filters by a column (SQL query), and writes the result to an S3 bucket so I can query it with Athena. It works perfectly.

Now I want the same job to do the same for several other tables, so I edited the code and duplicated the block that starts at the "job = Job(glueContext)" line. No matter how I do it, the two tables are created and loaded strangely: one should have 3 records and the other 2, but both end up with around 20 records each, most of them with blank values.

What am I doing wrong, and how else can I achieve this? I thought of using crawlers to get the schema and add it to the Data Catalog first, but when I create even one simple crawler, it just spins and spins and then fails with "Internal Service Exception". I'm not sure what else to try. Thanks for any insights.

Asked 1 year ago, 444 views
1 Answer

Accepted Answer

I discovered something that is probably obvious to everyone but wasn't to me: Athena treats all the files in a table's folder (S3 prefix) as part of the same table, so each table has to be written to its own folder. Duh.
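A minimal sketch of the fix, assuming hypothetical bucket, prefix, and table names: give every table its own sibling prefix so no two tables share a folder and none is nested inside another. In the Glue script, each path would then be passed as the S3 sink's `connection_options={"path": ...}` in `glueContext.write_dynamic_frame.from_options`, and each Athena table's `LOCATION` would point at its own prefix. The helper `table_output_paths` is illustrative, not part of any AWS library.

```python
from typing import Dict, Iterable


def table_output_paths(bucket: str, base_prefix: str, tables: Iterable[str]) -> Dict[str, str]:
    """Map each table name to its own S3 "folder" (key prefix).

    Hypothetical helper: Athena reads every file under a table's LOCATION,
    so each table gets a separate sibling prefix under base_prefix.
    """
    base = base_prefix.strip("/")
    paths: Dict[str, str] = {}
    for table in tables:
        if not table or "/" in table:
            # Reject empty names and nested names like "a/b", which would
            # put one table's folder inside another's.
            raise ValueError(f"invalid table name: {table!r}")
        if table in paths:
            raise ValueError(f"duplicate table name: {table!r}")
        # The trailing slash keeps "orders/" from being a prefix of "orders_archive/".
        key = f"{base}/{table}/" if base else f"{table}/"
        paths[table] = f"s3://{bucket}/{key}"
    return paths


if __name__ == "__main__":
    for name, path in table_output_paths("my-bucket", "glue-output", ["customers", "orders"]).items():
        print(name, "->", path)
```

With the example arguments above, `customers` goes to `s3://my-bucket/glue-output/customers/` and `orders` to `s3://my-bucket/glue-output/orders/`, so an Athena table defined on one prefix never picks up the other table's files.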

Answered 1 year ago
