Is it possible to dynamically change the up-front capacity of an EMR cluster using Data Pipeline?


Main problem

I understand that there is no need to add Auto Scaling to an EMR cluster launched by Data Pipeline. Instead, we can specify the capacity up front and it will be used for the duration of the job. But what if I am transforming some data on a weekly basis and the required capacity changes every week, so I can't be sure how many nodes the cluster needs for good performance?

Possible solution?

At the moment, I can predict the amount of data that EMR will process from the number of events tracked over a period of time in OpenSearch (from which EMR extracts the data). E.g., if one EMR node can handle 1,000 events and the actual number of events is 10,000, then create 10 nodes.
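The sizing rule described above can be sketched as a small function. This is a minimal sketch; the per-node throughput of 1,000 events and the clamp bounds are assumptions taken from the example, not measured values:

```python
import math

def nodes_needed(event_count, events_per_node=1000, min_nodes=1, max_nodes=50):
    """One node per batch of `events_per_node` events, rounded up and clamped.

    `min_nodes`/`max_nodes` are hypothetical guardrails so an empty or huge
    week still yields a sane cluster size.
    """
    return max(min_nodes, min(max_nodes, math.ceil(event_count / events_per_node)))

# e.g. nodes_needed(10_000) -> 10, matching the example above
```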

I thought of creating an EventBridge cron rule that executes a Lambda function ~10 minutes before the Data Pipeline run to calculate the number of nodes, then stores the value in a service like SSM Parameter Store. When the Data Pipeline starts, I can retrieve the value and pass it as a parameter to the task.
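The Lambda side of that idea could look roughly like the sketch below. The parameter name, the per-node throughput, and the way the event count reaches the handler are all placeholders; in practice the handler would query OpenSearch for the event count over the relevant window:

```python
import math

EVENTS_PER_NODE = 1000  # assumed throughput per node
PARAMETER_NAME = "/emr/weekly-job/core-node-count"  # hypothetical parameter name

def compute_node_count(event_count, events_per_node=EVENTS_PER_NODE):
    """One core node per batch of events, never fewer than one."""
    return max(1, math.ceil(event_count / events_per_node))

def handler(event, context):
    """Triggered by an EventBridge cron rule shortly before the pipeline run."""
    import boto3  # provided by the Lambda runtime

    # Placeholder: in practice, query OpenSearch here for the event count
    # in the period of interest instead of reading it from the trigger event.
    event_count = event.get("eventCount", 0)

    node_count = compute_node_count(event_count)
    boto3.client("ssm").put_parameter(
        Name=PARAMETER_NAME,
        Value=str(node_count),
        Type="String",
        Overwrite=True,
    )
    return {"nodeCount": node_count}
```

The pipeline-side activity would then read `PARAMETER_NAME` back (e.g. with `get_parameter`) and pass the value into the EmrCluster object's instance-count field.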

This may sound a little complicated, so I would like to know if there's an easier way to achieve this. Thanks in advance!

Asked 2 years ago · 141 views
No answers
