I want to create an Amazon EMR Serverless application to run a Spark job.
Resolution
To create an EMR Serverless application to run a Spark job, complete the following steps:
- Open the Amazon EMR console.
- In the navigation pane, choose EMR Serverless.
- Create a new EMR Studio, or select an existing Studio:
If you don't have a Studio, then choose Get started, and then choose Create and launch EMR Studio.
If you have a Studio, then select the Studio, and then choose Manage applications.
- On the application page, choose Create application.
- Enter the name of your application. For Type, choose Spark. For Release version, choose the Amazon EMR version that you want to use. Then, choose Create and start application.
- After the application Status changes to Started, choose the name of the application.
- Choose Submit batch job run.
- In the job settings, enter the name of your job and the Amazon Simple Storage Service (Amazon S3) location of your script. Then, select the runtime role.
Note: If you don't have a runtime role, then choose Create a new role, and then choose Create role. For more information, see Job runtime roles for Amazon EMR Serverless.
- (Optional) To run a Spark word count job as a sample job, enter s3://example-region.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py for the script location and s3://example-bucket/example-output for the script arguments. Replace example-region with your AWS Region and example-bucket with the name of your S3 bucket.
- Choose Submit job run.
- On the Batch job runs tab, confirm that your Spark job runs.
- After the Run status changes to Success, you can check your job results. If you ran a Spark word count job, then check your Amazon S3 path for your job results.
- To view the Spark UI, select the job run name. Then, take the following actions:
Choose View application UIs.
Choose Spark UI (running jobs) or Spark History Server (Completed jobs).
Note: In the Spark UI, you can retrieve the corresponding driver and executor logs on the Executors tab. When you submit a job run, you can choose how EMR Serverless stores and serves application logs.
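The console steps above can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (Boto3) and its emr-serverless client. The helper names (build_job_driver, wait_for_state, run_spark_job), the polling interval, and the timeout are illustrative assumptions and aren't part of the console procedure. The client is passed in as a parameter, so the code itself doesn't import Boto3.

```python
import time

# Job run states after which the run no longer changes.
TERMINAL_JOB_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def build_job_driver(script_location, script_arguments):
    """Return the jobDriver structure for a Spark batch job run."""
    return {
        "sparkSubmit": {
            "entryPoint": script_location,
            "entryPointArguments": list(script_arguments),
        }
    }


def wait_for_state(fetch_state, targets, poll_seconds=10, max_polls=60):
    """Call fetch_state() until it returns a state in targets, then return it.

    The polling interval and the maximum number of polls are assumptions.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in targets:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"State did not reach any of {sorted(targets)}")


def run_spark_job(client, app_name, release_label, role_arn, job_name,
                  script_location, script_arguments, poll_seconds=10):
    """Create and start a Spark application, submit one job, and wait for it.

    client is expected to be a Boto3 "emr-serverless" client.
    Returns (application_id, job_run_id, final_job_state).
    """
    # Equivalent of "Create and start application" with Type set to Spark.
    app_id = client.create_application(
        name=app_name, releaseLabel=release_label, type="SPARK"
    )["applicationId"]
    client.start_application(applicationId=app_id)

    # Wait until the application Status changes to Started.
    wait_for_state(
        lambda: client.get_application(applicationId=app_id)["application"]["state"],
        {"STARTED"},
        poll_seconds,
    )

    # Equivalent of "Submit batch job run" with a script location,
    # script arguments, and a runtime role.
    job_run_id = client.start_job_run(
        applicationId=app_id,
        executionRoleArn=role_arn,
        name=job_name,
        jobDriver=build_job_driver(script_location, script_arguments),
    )["jobRunId"]

    # Wait until the Run status reaches a terminal state such as Success.
    final_state = wait_for_state(
        lambda: client.get_job_run(
            applicationId=app_id, jobRunId=job_run_id
        )["jobRun"]["state"],
        TERMINAL_JOB_STATES,
        poll_seconds,
    )
    return app_id, job_run_id, final_state
```

To try the sketch, create the client with boto3.client("emr-serverless") and call run_spark_job with your own values, for example the word count script location and output path from the optional step. The application name, job name, release label, and runtime role ARN that you pass are your own choices. The sketch leaves the application running, so stop or delete the application when you no longer need it.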
Related information
How do I use alternative storage options for EMR Serverless?