Spark upgrade on EMR best practices

Hello team, good morning. My customer is running Spark 2.4 on EMR for their batch workloads. They are planning a migration to Spark 3.3 and are looking for guidance and best practices for it. Can you please share any reference docs, blog posts, or similar material on this topic? Thanks in advance.

Asked a year ago · 327 views
1 Answer
You can refer to the Spark section of the EMR best practices guide. Feel free to post here, or create a specReq, if the customer has any specific questions. Here are a few basic things to keep in mind:

  • Handle data skew.
  • Make sure no disk spill is happening.
  • Choose a partition size that does not create too many tasks.
  • Use the right data format for source and target (preferably Parquet).
  • Watch for excessive shuffle; you can confirm it in the Spark UI.
  • Tune driver and executor size (memory, cores) based on the workload.
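
As a rough illustration of the partition-size, skew, and executor-sizing points above, here is a minimal Python sketch that derives a starting set of Spark settings. The helper names (`shuffle_partitions`, `spark_conf_sketch`), the 128 MiB target partition size, the 200 GiB shuffle volume, and the executor values are all assumptions for illustration, not recommendations from the guide. The `spark.sql.adaptive.*` keys are Spark 3.x adaptive query execution settings, so they apply to the Spark 3.3 target rather than to 2.4. Treat the output as a starting point to validate against the Spark UI.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def shuffle_partitions(shuffle_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Rough partition count so each shuffle partition lands near the target size.

    The 128 MiB default is an assumed rule of thumb, not a fixed requirement.
    """
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


def spark_conf_sketch(shuffle_bytes: int, executor_memory: str = "8g", executor_cores: int = 4) -> dict:
    """Build a hypothetical starting configuration for a Spark 3.x batch job."""
    return {
        # Adaptive query execution can split skewed join partitions and
        # coalesce small shuffle partitions at runtime (Spark 3.x).
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        # Initial shuffle partition count derived from the estimated shuffle volume.
        "spark.sql.shuffle.partitions": str(shuffle_partitions(shuffle_bytes)),
        # Executor sizing is workload dependent; these values are placeholders.
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
    }


# Assumed example: a job that shuffles about 200 GiB.
conf = spark_conf_sketch(shuffle_bytes=200 * GIB)
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The printed `--conf key=value` pairs can be passed to `spark-submit`. If the Spark UI still shows disk spill or a few long-running tasks after this, revisit the partition count and executor memory for that specific stage.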
AWS
Kashif
answered a year ago
