EMR Notebook Box API Integration Issue


Hi experts, I am trying to use box.api in an EMR Notebook (SparkR kernel), with an HTTP proxy on the EMR host routing traffic to the internet. On the EMR server itself, setting the http_proxy environment variable lets me connect to box.api.com. From the EMR Notebook, however, I cannot reach the internet, because the http_proxy environment variable does not seem to be set for notebook users. Any hints on how to set a global environment variable (the HTTP proxy) on the EMR server for all EMR Notebook users?

The EMR notebooks seem to be running in a Docker container. I also tried setting a global environment variable in /etc/profile.d, but no luck.

asked a year ago, 250 views
1 Answer

Hi, thanks for writing to re:Post.

I understand that you want help connecting your notebook to the internet.

  • To enable internet access from the notebook, check whether a security group named 'ElasticMapReduceEditors-Editor' exists. If it does not, create a security group with that name and the following ingress and egress rules [1]:

Inbound: None

Outbound: Allow the notebook to route traffic to the internet via the cluster, as the following example demonstrates:

Type             Protocol  Port Range  Destination
Custom TCP rule  TCP       18888       SG-
HTTPS            TCP       443         0.0.0.0/0
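As a rough illustration, the two outbound rules above could be expressed with boto3 roughly as follows. This is only a sketch under assumptions: the function names and the security group IDs are hypothetical placeholders (the "SG-" destination in the table is left to the caller to supply), and the client is expected to be a standard boto3 EC2 client.

```python
def build_editor_egress_rules(cluster_sg_id):
    """Return the two outbound rules from the table as EC2 IpPermissions dicts.

    cluster_sg_id is a placeholder for the "SG-" destination in the table;
    the caller must supply the real security group ID.
    """
    return [
        {
            # Custom TCP rule: port 18888 to the destination security group
            "IpProtocol": "tcp",
            "FromPort": 18888,
            "ToPort": 18888,
            "UserIdGroupPairs": [{"GroupId": cluster_sg_id}],
        },
        {
            # HTTPS: port 443 to any IPv4 address
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        },
    ]


def apply_editor_egress_rules(ec2_client, editor_sg_id, cluster_sg_id):
    """Add the outbound rules to the editor security group (hypothetical helper)."""
    rules = build_editor_egress_rules(cluster_sg_id)
    ec2_client.authorize_security_group_egress(
        GroupId=editor_sg_id,
        IpPermissions=rules,
    )
    return rules


# Example usage (needs AWS credentials; both IDs are placeholders):
#   import boto3
#   apply_editor_egress_rules(boto3.client("ec2"), "sg-EDITOR", "sg-CLUSTER")
```

Keeping the rule construction separate from the API call makes it possible to inspect the rules before applying them.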

This should allow your notebook to connect to the internet.

I hope you find this information helpful.

Thank you and have a good rest of your day!

[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-security-groups.html

AWS
SUPPORT ENGINEER
answered a year ago
  • Hi Mudasser, thanks for the tips. I followed the steps to set up the security groups to allow inbound and outbound traffic, but it didn't work. In our environment the Internet Router is not enabled, and internet traffic is routed through an HTTP proxy server. If I set the http_proxy environment variable on the EMR host, I can connect to the internet via the proxy without any issues. The EMR Notebook, however, cannot connect to the internet through http_proxy. There may be some notebook configuration for using the http_proxy environment variable that I have not been able to figure out.
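One way to narrow down the situation described in this comment is to check which proxy variables the notebook kernel process actually sees, and to set them for the current session only. The following is a minimal sketch, assuming a Python kernel is available on the notebook. The helper names and the proxy URL are hypothetical, and this is a per-session workaround, not the global setting the question asks for. In the SparkR kernel, the base R functions Sys.getenv and Sys.setenv play the same role.

```python
import os

# Proxy-related variable names commonly honored by HTTP clients
PROXY_VARS = (
    "http_proxy", "https_proxy", "no_proxy",
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
)


def visible_proxy_settings(environ=None):
    """Return the proxy variables present in the given environment mapping.

    Defaults to the environment of the current (kernel) process.
    """
    env = os.environ if environ is None else environ
    return {name: env[name] for name in PROXY_VARS if name in env}


def set_session_proxy(proxy_url, environ=None):
    """Set the HTTP and HTTPS proxy variables for this process only."""
    env = os.environ if environ is None else environ
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = proxy_url
    return visible_proxy_settings(env)


# Example usage inside a notebook cell (the URL is a placeholder):
#   print(visible_proxy_settings())
#   set_session_proxy("http://proxy.example.internal:3128")
```

If the first call returns an empty dictionary in the notebook but not in a shell on the EMR host, that supports the suspicion that the kernel does not inherit the host's environment.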
