Is it possible to dynamically change the up-front capacity of an EMR cluster using Data Pipeline?


Main problem

I understand that there is no need to add Auto Scaling to an EMR cluster launched by Data Pipeline; instead, we can specify the capacity up-front and it will be used for the duration of the job. But what if I run a transformation on some data on a weekly basis, the capacity needed to do this changes every week, and I can't be sure how many nodes the cluster requires for good performance?

Possible solution?

At the moment, I can predict the amount of data that EMR will process from the number of events tracked over a period of time in OpenSearch (the source EMR extracts data from). E.g., if 1 EMR node can handle 1,000 events and the actual number of events is 10,000, then create 10 nodes.
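That heuristic can be sketched as a small sizing function; the 1,000-events-per-node ratio is the assumption from the example above, and rounding up with a floor of one node keeps the cluster from ever being empty:

```python
import math

# Assumption from the example: one EMR core node per 1,000 events.
EVENTS_PER_NODE = 1_000

def required_nodes(event_count: int, events_per_node: int = EVENTS_PER_NODE) -> int:
    """Round up to whole nodes, and never request fewer than one."""
    return max(1, math.ceil(event_count / events_per_node))
```

So `required_nodes(10_000)` gives 10, and a week with 10,001 events would round up to 11 nodes rather than under-provisioning.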

I thought about creating an EventBridge cron rule that executes a Lambda function ~10 minutes before the Data Pipeline run to calculate the number of nodes and store the value in a service like SSM Parameter Store. Then, when the Data Pipeline starts, I can retrieve the value and pass it as a parameter to the task.
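A minimal sketch of that Lambda, assuming the parameter name and the `fetch_event_count()` OpenSearch query are placeholders you would fill in (the only real AWS call is `ssm.put_parameter`, which exists in boto3 with these arguments):

```python
import math

EVENTS_PER_NODE = 1_000                              # assumption from the example
PARAMETER_NAME = "/emr/weekly-job/core-node-count"   # hypothetical SSM parameter name

def fetch_event_count() -> int:
    """Placeholder: run a _count query against the OpenSearch events index
    for the relevant time window and return the hit count."""
    raise NotImplementedError

def put_parameter_args(nodes: int) -> dict:
    """Build the kwargs for ssm.put_parameter; Overwrite=True lets the
    weekly run replace last week's value."""
    return {
        "Name": PARAMETER_NAME,
        "Value": str(nodes),
        "Type": "String",
        "Overwrite": True,
    }

def lambda_handler(event, context):
    # boto3 is imported lazily here so the sizing logic above can be
    # unit-tested without AWS credentials or the boto3 package.
    import boto3

    count = fetch_event_count()
    nodes = max(1, math.ceil(count / EVENTS_PER_NODE))
    boto3.client("ssm").put_parameter(**put_parameter_args(nodes))
    return {"eventCount": count, "coreNodes": nodes}
```

The pipeline side would then read the parameter back (e.g. `ssm.get_parameter(Name=...)`) and feed it into the EmrCluster object's instance-count field.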

This may sound a little complicated, so I would like to know if there is an easier way to achieve this. Thanks in advance!

No answers
