Can a single Glue streaming job handle multiple Kinesis sources?


In a Glue streaming job, is it possible to read from multiple Kinesis sources in my Spark script?

Something like:

streams = ["streamA", "streamB"]

def process_stream(stream_name):
    # dataframe: the streaming DataFrame created for stream_name (creation not shown)
    glueContext.forEachBatch(
        frame=dataframe,
        batch_function=process_batch_with_stream_name,
        options={
            "windowSize": "60 seconds",
            "checkpointLocation": args["TempDir"] + f"/job_{JOB_NAME}/",
        },
    )

for stream in streams:
    process_stream(stream)
YK
Asked 5 months ago, 622 views
3 Answers
Accepted Answer

Hello,

To read from multiple Kinesis sources, you can create a DataFrame for each stream and combine them with union() before passing the result to forEachBatch. If you want to process each stream separately within the same job, you would have to call forEachBatch on separate threads and coordinate them, which is complex to implement and therefore not recommended.
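A minimal sketch of the union approach, assuming both streams carry JSON records with the same schema. The helper names (kinesis_options, read_unioned_streams, run) and the option values are illustrative assumptions and not part of the original answer; create_data_frame.from_options and forEachBatch are the GlueContext calls used for streaming sources.

```python
from functools import reduce


def kinesis_options(stream_arn):
    """Connection options for one Kinesis stream (values are assumptions)."""
    return {
        "streamARN": stream_arn,
        "startingPosition": "TRIM_HORIZON",
        "inferSchema": "true",
        "classification": "json",
    }


def read_unioned_streams(glue_context, stream_arns):
    """Create one streaming DataFrame per stream and union them into one."""
    frames = [
        glue_context.create_data_frame.from_options(
            connection_type="kinesis",
            connection_options=kinesis_options(arn),
        )
        for arn in stream_arns
    ]
    # union() matches columns by position, so the streams must share a schema.
    return reduce(lambda left, right: left.union(right), frames)


def run(glue_context, stream_arns, batch_function, checkpoint_location):
    """Process the combined stream with a single forEachBatch call."""
    glue_context.forEachBatch(
        frame=read_unioned_streams(glue_context, stream_arns),
        batch_function=batch_function,
        options={
            "windowSize": "60 seconds",
            "checkpointLocation": checkpoint_location,
        },
    )
```

In a job script this would be called roughly as run(glueContext, [arn_a, arn_b], process_batch, args["TempDir"] + "/checkpoint/"), where arn_a, arn_b, and process_batch are placeholders for your own stream ARNs and batch function. Because there is only one streaming query, a single checkpoint location is enough.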

You can also refer to the following documentation for more guidance on Streaming ETL jobs in AWS Glue: https://docs.aws.amazon.com/glue/latest/dg/add-job-streaming.html

If you need specific guidance for your use-case, please open a support case with AWS using the following link: https://console.aws.amazon.com/support/home#/case/create

AWS
Support Engineer
Answered 5 months ago

Thanks! I ended up using a separate thread for each stream. Why is it not recommended?

YK
Answered 5 months ago
  • The threads can interfere with each other (e.g., competing for driver memory), and in general the job becomes much harder to monitor and operate (e.g., what happens if one of them fails: do you restart the whole job?)
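To illustrate the failure-handling question, here is one possible sketch of the thread-per-stream approach that fails the whole job as soon as any stream fails. It is only an assumption about how this could be wired up: start_stream is a hypothetical callable that builds the DataFrame for one stream and calls glueContext.forEachBatch for it, and each stream would need its own checkpoint location because every streaming query keeps separate checkpoint state.

```python
import queue
import threading


def run_streams_in_threads(stream_names, start_stream):
    """Run start_stream(name) on one daemon thread per stream.

    Returns normally once every thread has finished; raises as soon as any
    thread reports a failure so the whole job can be failed and restarted.
    """
    results = queue.Queue()

    def worker(name):
        try:
            start_stream(name)  # hypothetical: wraps forEachBatch for one stream
            results.put((name, None))
        except Exception as exc:  # hand the failure over to the main thread
            results.put((name, exc))

    for name in stream_names:
        # Daemon threads do not keep the process alive after the main thread exits.
        threading.Thread(target=worker, args=(name,), daemon=True).start()

    for _ in stream_names:
        name, error = results.get()  # blocks until some thread finishes
        if error is not None:
            raise RuntimeError(f"stream {name!r} failed") from error
```

Even with this in place, the threads still share the driver's memory and logs, which is part of why a single unioned stream is easier to operate.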


Yes, you just need to create a DataFrame for each stream and union() them before passing the result to forEachBatch.
Note that this assumes your batch function can process data coming from any of the streams.
If you mean processing them separately in the same job, that requires calling forEachBatch on separate threads and coordinating them; it is much more complex to operate and is not recommended.

AWS
Expert
Answered 5 months ago
Expert
Reviewed 5 months ago
