How do I link my Amazon EMR notebook to a Git-based repository?


I want to link my Amazon EMR notebook to a Git-based repository.

Resolution

Note: Amazon EMR notebooks are available as Amazon EMR Studio Workspaces in the new Amazon EMR console.

To create a new Amazon EMR notebook in the old console and associate the notebook with a Git-based repository, complete the following steps:

  1. Create a private subnet in an Amazon Virtual Private Cloud (Amazon VPC).
  2. Create a NAT gateway, and then update your route table to point to the NAT gateway.
  3. Launch an Amazon EMR cluster in the private subnet. In the Software configuration section, make sure that you select a configuration that includes Apache Spark, Apache Hadoop, and Apache Livy.
  4. While you wait for the cluster to reach the WAITING state, add the Git-based repository.
  5. For Git credentials, choose Create a new secret. Make sure that the username is the alias of the Git account.
  6. Create a custom security group that's named ElasticMapReduceEditors-Editor with the following outbound rules:
    For rule 1, set Type to Custom TCP rule, Protocol to TCP, Port Range to 18888, and Destination to ElasticMapReduceEditors-Livy.
    For rule 2, set Type to HTTPS, Protocol to TCP, Port Range to 443, and Destination to 0.0.0.0/0.
  7. Add an inbound rule to the ElasticMapReduceEditors-Livy security group with the following settings:
    Type: Custom TCP rule
    Protocol: TCP
    Port Range: 18888
    Source: Enter the name of your custom security group.
  8. Modify the Amazon EMR notebooks service role, EMR_Notebooks_DefaultRole, to allow the secretsmanager:GetSecretValue action.
  9. Create an Amazon EMR notebook with the following security group settings:
    In the Security groups section, select Choose security groups.
    For Security groups for master instance, choose ElasticMapReduceEditors-Livy.
    For Security groups for notebook instance, choose your custom security group.
  10. Confirm that the Git-based repository status changes to Linked. When the status changes to Linked, you can use Git repositories in your notebook.
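
If you prefer to script steps 6 through 8 instead of using the console, the following is a minimal Python sketch of the same settings, not an official procedure. It assumes boto3 is installed and configured with credentials. The function names, the inline policy name AllowGitSecretRead, and the choice to scope the policy to a single secret ARN are illustrative assumptions that don't come from this article. The security group IDs and the secret ARN are values that you supply.

```python
import json

LIVY_PORT = 18888   # port used by the rules in steps 6 and 7
HTTPS_PORT = 443    # port used by rule 2 in step 6


def editor_egress_rules(livy_sg_id):
    """Outbound rules for the custom editor security group (step 6)."""
    return [
        {   # Rule 1: TCP 18888 to the ElasticMapReduceEditors-Livy group
            "IpProtocol": "tcp",
            "FromPort": LIVY_PORT,
            "ToPort": LIVY_PORT,
            "UserIdGroupPairs": [{"GroupId": livy_sg_id}],
        },
        {   # Rule 2: HTTPS (TCP 443) to 0.0.0.0/0
            "IpProtocol": "tcp",
            "FromPort": HTTPS_PORT,
            "ToPort": HTTPS_PORT,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        },
    ]


def livy_ingress_rules(editor_sg_id):
    """Inbound rule for ElasticMapReduceEditors-Livy (step 7)."""
    return [
        {   # TCP 18888 from the custom editor security group
            "IpProtocol": "tcp",
            "FromPort": LIVY_PORT,
            "ToPort": LIVY_PORT,
            "UserIdGroupPairs": [{"GroupId": editor_sg_id}],
        }
    ]


def secret_read_policy(secret_arn):
    """IAM policy JSON that allows secretsmanager:GetSecretValue (step 8).

    Assumption: access is limited to one secret ARN. The article only
    says that the action must be allowed.
    """
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": secret_arn,
        }],
    })


def apply_settings(editor_sg_id, livy_sg_id, secret_arn):
    """Apply steps 6 through 8 with boto3 (makes real AWS API calls)."""
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2")
    iam = boto3.client("iam")
    ec2.authorize_security_group_egress(
        GroupId=editor_sg_id, IpPermissions=editor_egress_rules(livy_sg_id))
    ec2.authorize_security_group_ingress(
        GroupId=livy_sg_id, IpPermissions=livy_ingress_rules(editor_sg_id))
    iam.put_role_policy(
        RoleName="EMR_Notebooks_DefaultRole",
        PolicyName="AllowGitSecretRead",  # hypothetical policy name
        PolicyDocument=secret_read_policy(secret_arn))
```

The helper functions only build the request parameters, so you can print and review the rules and the policy document before you call apply_settings with your own security group IDs and secret ARN.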

AWS OFFICIAL - Updated 3 months ago
2 Comments

It is NOT necessary to use an EMR Cluster in a private subnet in order to use git from EMR Studio. It may however be (probably is?) necessary for the EMR Studio itself to be created in a private subnet (and those two subnets would need to be in the same VPC and be able to talk to each other).

replied 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS EXPERT
replied 10 months ago