How do I create an Amazon EMR Serverless application to run a Spark job?

I want to create an Amazon EMR Serverless application to run a Spark job.

Resolution

To create an EMR Serverless application to run a Spark job, complete the following steps:

  1. Open the Amazon EMR console.
  2. In the navigation pane, choose EMR Serverless.
  3. Create a new EMR Studio, or select an existing Studio:
    If you don't have a Studio, then choose Get started, and then choose Create and launch EMR Studio.
    If you have a Studio, then select the Studio, and then choose Manage applications.
  4. On the application page, choose Create application.
  5. Enter the name of your application. For Type, choose Spark. For Release version, choose the Amazon EMR version that you want to use. Then, choose Create and start application.
  6. After the application Status changes to Started, choose the name of the application.
  7. Choose Submit batch job run.
  8. In the job settings, enter the name of your job and the Amazon Simple Storage Service (Amazon S3) location of your script. Then, select the runtime role.
    Note: If you don't have a runtime role, then choose Create a new role, and then choose Create role. For more information, see Job runtime roles for Amazon EMR Serverless.
  9. (Optional) To run a Spark word count job as a sample job, enter s3://example-region.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py as the script location and s3://example-bucket/example-output as the script arguments. Replace example-region with your AWS Region and example-bucket with the name of your S3 bucket.
  10. Choose Submit job run.
  11. On the Batch job runs tab, confirm that your Spark job runs.
  12. After the Run status changes to Success, you can check your job results. If you ran a Spark word count job, then check your Amazon S3 path for your job results.
  13. To view the Spark UI, select the job run name. Then, take the following actions:
    Choose View application UIs.
    Choose Spark UI (running jobs) or Spark History Server (Completed jobs).
    Note: In the Spark UI, you can retrieve corresponding driver and runtime logs in the Executors tab. When you submit a job run, you can choose how EMR Serverless stores and serves application logs.
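
The console steps above can also be approximated in code. The following is a minimal sketch, not an official procedure, that assumes the AWS SDK for Python (Boto3) is installed and that credentials are already configured. The function names, the application name example-application, and the job name example-job are hypothetical and chosen only for illustration. The script location, script arguments, runtime role ARN, release label, and optional log location are passed in as parameters so that you supply your own values.

```python
import time


def build_application_request(name, release_label):
    """Parameters for CreateApplication: a Spark application on one EMR release."""
    return {"name": name, "type": "SPARK", "releaseLabel": release_label}


def build_job_run_request(application_id, role_arn, job_name, script_uri,
                          script_args, log_uri=None):
    """Parameters for StartJobRun: script location, arguments, and runtime role."""
    request = {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "name": job_name,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_uri,
                "entryPointArguments": list(script_args),
            }
        },
    }
    if log_uri:
        # Optional: store application logs in an S3 location that you choose.
        request["configurationOverrides"] = {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        }
    return request


def wait_for(fetch_state, done_states, delay_seconds=10, max_polls=60):
    """Poll fetch_state() until it returns a state in done_states."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in done_states:
            return state
        time.sleep(delay_seconds)
    raise TimeoutError("state did not reach: " + ", ".join(sorted(done_states)))


def run_spark_job(region, release_label, role_arn, script_uri, script_args,
                  log_uri=None):
    """Create and start an application, submit one job, return its final state."""
    import boto3  # imported here so the helpers above work without Boto3

    client = boto3.client("emr-serverless", region_name=region)

    # Hypothetical application name, used only for this sketch.
    app_id = client.create_application(
        **build_application_request("example-application", release_label)
    )["applicationId"]

    def app_state():
        return client.get_application(applicationId=app_id)["application"]["state"]

    wait_for(app_state, {"CREATED"})
    client.start_application(applicationId=app_id)
    wait_for(app_state, {"STARTED"})

    # Hypothetical job name, used only for this sketch.
    run_id = client.start_job_run(
        **build_job_run_request(app_id, role_arn, "example-job",
                                script_uri, script_args, log_uri)
    )["jobRunId"]

    def job_state():
        return client.get_job_run(applicationId=app_id, jobRunId=run_id)["jobRun"]["state"]

    return wait_for(job_state, {"SUCCESS", "FAILED", "CANCELLED"})
```

To mirror the optional word count sample, you might call run_spark_job with the sample script location from step 9 as script_uri and ["s3://example-bucket/example-output"] as script_args. The polling interval and the maximum number of polls are arbitrary defaults in this sketch, so adjust them for longer jobs.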

Related information

How do I use alternative storage options for EMR Serverless?

AWS OFFICIAL (updated a month ago)