Extracting one SQL Server table to the Data Catalog in a Glue job works fine; two tables, madness?


I used the Glue job editor to create a simple job that reads from a SQL Server database, filters on a column (SQL Query), and writes the result to an S3 bucket so I can query it with Athena. It works perfectly.

Now I want the same job to do the same for a number of other tables, so I edited the code, duplicating the block that starts at the "job = Job(glueContext)" line. No matter how I do it, the two tables are created and loaded weirdly: there should be 3 records in one and 2 in the other, but both end up with around 20 records each, with blank values in most of the rows.

What am I doing wrong? How else can I achieve this? I thought of using crawlers to get the schema and add it to the Data Catalog first, but I created one simple crawler and it just spins and spins, then fails with "Internal Service Exception". I'm not sure how else to do this. Thanks for any insights.

1 Answer
Accepted Answer

I discovered something that is probably obvious to everyone but wasn't to me: Athena queries all the files in a folder as if they were part of the same table, so each table has to go in its own folder. Duh.
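For anyone hitting the same thing, here is a minimal sketch of how the duplicated blocks could be replaced with a loop that writes each table to its own S3 folder. It is not the code the Glue editor generated for me. The helper names (`table_output_path`, `export_tables`), the bucket, the prefix, the connection name, and the Parquet output format are all assumptions for illustration, and the SQL Server connection options may need adjusting to match what your generated script uses.

```python
def table_output_path(bucket: str, base_prefix: str, table_name: str) -> str:
    """Build a dedicated S3 folder for one table (hypothetical naming scheme)."""
    # "dbo.Orders" -> "dbo_orders", so every table gets a distinct folder name.
    folder = table_name.lower().replace(".", "_")
    return f"s3://{bucket}/{base_prefix.strip('/')}/{folder}/"


def export_tables(glue_context, connection_name, bucket, base_prefix, table_names):
    """Read each SQL Server table and write it to its own S3 folder.

    glue_context is the GlueContext that the generated job script already creates.
    Returns a dict mapping table name -> S3 path that was written.
    """
    written = {}
    for table_name in table_names:
        # Read one table through the Glue connection (options are assumptions;
        # copy the ones from your generated script if they differ).
        frame = glue_context.create_dynamic_frame.from_options(
            connection_type="sqlserver",
            connection_options={
                "useConnectionProperties": "true",
                "connectionName": connection_name,
                "dbtable": table_name,
            },
            transformation_ctx=f"read_{table_name}",
        )

        # One folder per table: Athena treats every file under a table's
        # location as data for that table.
        path = table_output_path(bucket, base_prefix, table_name)
        glue_context.write_dynamic_frame.from_options(
            frame=frame,
            connection_type="s3",
            format="parquet",
            connection_options={"path": path},
            transformation_ctx=f"write_{table_name}",
        )
        written[table_name] = path
    return written
```

Inside the job script this would be called once, after `job.init(...)` and before `job.commit()`, for example `export_tables(glueContext, "my-sqlserver-connection", "my-bucket", "exports", ["dbo.Orders", "dbo.Customers"])`. Each Athena table then points at its own folder, such as `s3://my-bucket/exports/dbo_orders/`.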

answered a year ago
