Spark upgrade on EMR: best practices

Hello team, good morning. My customer is using Spark 2.4 on EMR for their batch workloads. They are planning a migration to Spark 3.3 and are looking for guidance and best practices for it. Could you please share any reference docs, blog posts, or similar resources on this topic? Thanks in advance.

asked a year ago, 327 views
1 Answer

You can use the Spark section of this EMR best practices guide. Feel free to post here or create a specReq if the customer has any specific questions. Here are a few basic things to keep in mind:

  • Handle data skew.
  • Make sure no disk spill is happening.
  • Choose a partition size that avoids creating too many tasks.
  • Use the right data format for source and target (preferably Parquet).
  • Watch for excessive shuffle; you can confirm it in the Spark UI.
  • Tune driver and executor sizes (memory, cores) based on the workload.
AWS
Kashif
answered a year ago
