I want to link my Amazon EMR notebook to a Git-based repository.
Resolution
Note: Amazon EMR notebooks are available as Amazon EMR Studio Workspaces in the new Amazon EMR console.
To create a new Amazon EMR notebook in the old console and associate the notebook with a Git-based repository, complete the following steps:
- Create a private subnet in an Amazon Virtual Private Cloud (Amazon VPC).
- Create a NAT gateway, and then update your route table to point to the NAT gateway.
- Launch an Amazon EMR cluster in the private subnet. In the Software configuration section, make sure that you select a configuration that includes Apache Spark, Apache Hadoop, and Apache Livy.
- While you wait for the cluster to reach the WAITING state, add the Git-based repository.
- For Git credentials, choose Create a new secret. Make sure that the username is the alias of the Git account.
- Create a custom security group that's named ElasticMapReduceEditors-Editor with the following outbound rules:
For rule 1, set Type to Custom TCP rule, Protocol to TCP, Port Range to 18888, and Destination to ElasticMapReduceEditors-Livy.
For rule 2, set Type to HTTPS, Protocol to TCP, Port Range to 443, and Destination to 0.0.0.0/0.
- Add an inbound rule to the ElasticMapReduceEditors-Livy security group with the following settings:
Type: Custom TCP rule
Protocol: TCP
Port Range: 18888
Source: Enter the name of your custom security group.
- Modify the EMR_Notebooks_DefaultRole Amazon EMR notebooks service role to allow the secretsmanager:GetSecretValue action.
- Create an Amazon EMR notebook with the following security group settings:
In the Security groups section, select Choose security groups.
For Security groups for master instance, choose ElasticMapReduceEditors-Livy.
For Security groups for notebook instance, choose your custom security group.
- Confirm that the Git-based repository status changes to Linked. After the status is Linked, you can use the Git repository in your notebook.
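
The security group rules and the service role permission from the preceding steps can also be expressed as API request payloads. The following Python code is a minimal sketch and not an official procedure. It assumes that you use boto3, that both security groups already exist, and that you know their group IDs. The helper names, the inline policy name AllowGitSecretRead, and the choice to limit the policy to a single secret ARN are illustrative assumptions that don't come from the steps above.

```python
import json

# Port that the notebook editor uses to reach Livy, as listed in the steps above.
LIVY_PORT = 18888


def editor_egress_rules(livy_group_id):
    """Outbound rules for the custom editor security group (hypothetical helper)."""
    return [
        {
            # Rule 1: TCP 18888 to the ElasticMapReduceEditors-Livy security group
            "IpProtocol": "tcp",
            "FromPort": LIVY_PORT,
            "ToPort": LIVY_PORT,
            "UserIdGroupPairs": [{"GroupId": livy_group_id}],
        },
        {
            # Rule 2: HTTPS (TCP 443) to 0.0.0.0/0 so the editor can reach the Git host
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        },
    ]


def livy_ingress_rules(editor_group_id):
    """Inbound rule for ElasticMapReduceEditors-Livy (hypothetical helper)."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": LIVY_PORT,
            "ToPort": LIVY_PORT,
            # The source is the custom editor security group.
            "UserIdGroupPairs": [{"GroupId": editor_group_id}],
        }
    ]


def secret_read_policy(secret_arn):
    """IAM policy that allows secretsmanager:GetSecretValue.

    Assumption: the policy is scoped to one secret ARN. The steps above
    only name the action and don't specify a resource.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }


def apply_settings(editor_group_id, livy_group_id, secret_arn,
                   role_name="EMR_Notebooks_DefaultRole"):
    """Send the payloads to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2")
    iam = boto3.client("iam")
    ec2.authorize_security_group_egress(
        GroupId=editor_group_id,
        IpPermissions=editor_egress_rules(livy_group_id),
    )
    ec2.authorize_security_group_ingress(
        GroupId=livy_group_id,
        IpPermissions=livy_ingress_rules(editor_group_id),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="AllowGitSecretRead",  # hypothetical policy name
        PolicyDocument=json.dumps(secret_read_policy(secret_arn)),
    )
```

To review the payloads before you change anything, print the output of the helper functions, for example print(json.dumps(secret_read_policy(secret_arn), indent=2)) with your own secret ARN. Only apply_settings calls AWS.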