How do I link an Amazon EMR notebook to a Git repository?


I want to link my Amazon EMR notebook with a Git repository.


Associating Git repositories with Amazon EMR notebooks allows you to save your notebooks in a version-controlled environment. You can associate up to three repositories with a notebook.

To create a new EMR notebook and then associate it with an existing Git repository, do the following:

1.    Create a private subnet in a virtual private cloud (VPC).

2.    Create a NAT gateway.

3.    Update the route table to point to the NAT gateway.

4.    Launch an Amazon EMR cluster in the private subnet. In the Software configuration section, be sure that you select a configuration that includes Apache Spark, Apache Hadoop, and Apache Livy.

5.    While you wait for the EMR cluster to reach the WAITING state, add the Git repository. For Git credentials, choose Create a new secret. Be sure that the Username is the alias of the Git account and not the email address. For more information, see Working with aliases.

6.    Create a security group with the following outbound rules:
Rule 1
Type: Custom TCP rule
Protocol: TCP
Port Range: 18888
Destination: ElasticMapReduceEditors-Livy

Rule 2
Protocol: TCP
Port Range: 443

These rules allow the notebook to route traffic to the internet through the cluster. For more information, see Custom EC2 security group for EMR notebooks when associating notebooks with Git repositories.

7.    Add an inbound rule to the ElasticMapReduceEditors-Livy security group:
Type: Custom TCP rule
Protocol: TCP
Port Range: 18888
Source: Enter the name of the security group that you created in step 6.

8.    Modify the service role for EMR notebooks (EMR_Notebooks_DefaultRole) to allow the secretsmanager:GetSecretValue action.

9.    Create an EMR notebook with the following security group settings:
In the Security groups section, select Choose security groups.
For Security groups for master instance, choose ElasticMapReduceEditors-Livy.
For Security groups for notebook instance, choose the security group that you created in step 6.

The Git repository status changes to Linked. You can now use the Git repository in the notebook.
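
The security group rules from steps 6 and 7 and the permission from step 8 can also be written down as plain data, which may help when you review or script the setup. The following is a minimal sketch, not an official AWS example. The rule dictionaries follow the IpPermissions shape that the boto3 EC2 calls authorize_security_group_egress and authorize_security_group_ingress accept, and the policy is a standard IAM policy document. The function names, the security group IDs, and the secret ARN are hypothetical placeholders. The 0.0.0.0/0 destination for Rule 2 is an assumption, because the steps above don't state a destination for that rule.

```python
import json

LIVY_PORT = 18888   # port used by the rules in steps 6 and 7
HTTPS_PORT = 443    # port used by Rule 2 in step 6


def tcp_rule_to_group(port, group_id):
    """Build one TCP rule whose peer is another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def tcp_rule_to_cidr(port, cidr):
    """Build one TCP rule whose peer is an IPv4 CIDR range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def notebook_egress_rules(livy_group_id, internet_cidr="0.0.0.0/0"):
    """Outbound rules for the notebook security group (step 6).

    internet_cidr is an assumption: the article gives no destination
    for Rule 2.
    """
    return [
        tcp_rule_to_group(LIVY_PORT, livy_group_id),
        tcp_rule_to_cidr(HTTPS_PORT, internet_cidr),
    ]


def livy_ingress_rules(notebook_group_id):
    """Inbound rule for ElasticMapReduceEditors-Livy (step 7)."""
    return [tcp_rule_to_group(LIVY_PORT, notebook_group_id)]


def secret_read_policy(secret_arn):
    """IAM policy document that allows reading one secret (step 8)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(notebook_egress_rules("sg-livy-placeholder"), indent=2))
    print(json.dumps(livy_ingress_rules("sg-notebook-placeholder"), indent=2))
    print(json.dumps(secret_read_policy("arn:aws:secretsmanager:placeholder"), indent=2))
```

Limiting Resource to the ARN of the Git credentials secret, rather than using a wildcard, keeps the permission that you add to EMR_Notebooks_DefaultRole as narrow as possible.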

Related information

Associating Git-based repositories with EMR notebooks

EMR notebooks

AWS OFFICIAL, updated 2 years ago