Spark upgrade on EMR: best practices

Hello team, good morning. My customer is using Spark 2.4 on EMR for their batch workloads. They are planning to migrate to Spark 3.3 and are looking for guidance and best practices for this migration. Can you please share any reference docs, blog posts, etc. related to this topic? Thanks in advance.

asked a year ago, 327 views
1 Answer
You can use the Spark section of this EMR best practices guide. Feel free to share here, or create a specReq, if the customer has any specific questions. Here are a few basic things to keep in mind:

  • Handle data skew, so that a few oversized partitions do not dominate a stage's run time.
  • Make sure no disk spill is happening (the Spark UI reports spill per stage).
  • Choose a partition size that does not create too many small tasks.
  • Use the right data format for source and target (preferably Parquet).
  • Watch for excessive shuffle. This can be confirmed from the Spark UI.
  • Tune driver and executor size (memory, cores) based on the workload.
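
As a rough illustration of the partition-size, skew, and executor-sizing points above, here is a minimal Python sketch that derives a shuffle partition count from an estimated shuffle volume and prints matching `--conf` flags. The 128 MiB target partition size, the 50 GiB shuffle volume, and the 16g / 4-core executor shape are assumptions chosen for the example, not recommendations from the guide, and the helper names `shuffle_partitions` and `suggested_conf` are hypothetical. The configuration keys themselves are standard Spark settings.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def shuffle_partitions(shuffle_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Partition count so each shuffle partition holds roughly the target size.

    The 128 MiB default is an assumed starting point; adjust it per workload.
    """
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


def suggested_conf(shuffle_bytes: int, executor_memory_gb: int, executor_cores: int) -> dict:
    """Build a small set of Spark settings to start tuning from (illustrative only)."""
    return {
        # Sized from the estimated shuffle volume to avoid too many tiny tasks.
        "spark.sql.shuffle.partitions": str(shuffle_partitions(shuffle_bytes)),
        # Adaptive Query Execution, including its skew-join handling (Spark 3.x).
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
        # Executor shape; the values passed in should come from the workload.
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.cores": str(executor_cores),
    }


# Example inputs are assumptions: 50 GiB of shuffle data, 16 GB / 4-core executors.
conf = suggested_conf(shuffle_bytes=50 * GIB, executor_memory_gb=16, executor_cores=4)
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The printed flags can be appended to a `spark-submit` command. Treat the numbers as a first guess and re-check spill and shuffle metrics in the Spark UI after each change.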
AWS
Kashif
answered a year ago
