make julia notebooks work in SageMaker

1

I want to run a Jupyter notebook in SageMaker with a Julia kernel. There is very little documentation about this. There is this:

https://d1.awsstatic.com/whitepapers/julia-on-sagemaker.pdf?did=wp_card&trk=wp_card

I followed all the instructions, and Julia shows up in the JupyterLab launcher. But when I start it, "Julia 1.7.1" appears as the kernel and then dies: it seems to try to connect, then gives up and shows "No Kernel" instead of "Julia 1.7.1" in the status line.

If I run the R kernel, all goes well. If I run the Julia kernel (which shows up in the list of available kernels!), I get the following error message:

Connection failed

A connection to the notebook server could not be established.

The notebook will continue trying to reconnect.

Check your network connection or notebook server configuration.

asked 2 years ago, 546 views
2 Answers
1

Sorry this is not a full working solution, but too much for a comment & hopefully will still be useful to you:

(I'm assuming you mean SageMaker Notebook Instances rather than SageMaker Studio, as in the whitepaper, and that you're installing the latest Julia version from conda, currently 1.7.1, rather than the v1.0.3 explicitly specified in the whitepaper.)

The first thing to point out is that you should be able to debug this via the notebook's logs: either click the "View logs" link on the notebook's detail page in the Amazon SageMaker console, or look in the Amazon CloudWatch console for log group /aws/sagemaker/NotebookInstances, stream {YOUR-NBI-NAME}/jupyter.log.
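As a rough sketch of the CloudWatch route from a terminal, the stream name can be built from the notebook instance name and passed to the AWS CLI. The helper names and the example instance name my-notebook are hypothetical, and this assumes the AWS CLI is installed with permission to read the log group:

```shell
#!/bin/bash
# Sketch only: helper names and the example instance name are hypothetical.

LOG_GROUP="/aws/sagemaker/NotebookInstances"

# Build the log stream name for a notebook instance: {NBI-NAME}/jupyter.log
nbi_log_stream() {
  printf '%s/jupyter.log' "$1"
}

# Print recent Jupyter log messages for a notebook instance.
# Assumes the AWS CLI is configured with access to CloudWatch Logs.
show_jupyter_log() {
  aws logs get-log-events \
    --log-group-name "$LOG_GROUP" \
    --log-stream-name "$(nbi_log_stream "$1")" \
    --limit 50 \
    --query 'events[].message' \
    --output text
}

# Usage (not run automatically): show_jupyter_log my-notebook
```

Kernel start-up failures such as the IJulia error quoted further down should appear in this stream shortly after you try to launch the kernel.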

Following through the whitepaper instructions myself (on a notebook-al2-v1 notebook instance), the errors I saw preventing the kernel from loading were like:

ArgumentError: Package IJulia not found in current path:
- Run `import Pkg; Pkg.add("IJulia")` to install the IJulia package.

Looking at ~/.local/share/jupyter/kernels/julia-1.7/kernel.json, I saw the created kernel was defined as follows:

{
  "display_name": "Julia 1.7.1",
  "argv": [
    "/home/ec2-user/anaconda3/envs/julia/bin/julia",
    "-i",
    "--color=yes",
    "--project=@.",
    "/home/ec2-user/anaconda3/envs/julia/share/julia/packages/IJulia/e8kqU/src/kernel.jl",
    "{connection_file}"
  ],
  "language": "julia",
  "env": {},
  "interrupt_mode": "signal"
}

If we run julia from within the julia conda environment, using IJulia works with no problem. However, if you source activate JupyterSystemEnv from the terminal, you can still run /home/ec2-user/anaconda3/envs/julia/bin/julia, but the interpreter will not think the IJulia package is installed. I think this is closer to what the above kernel definition is doing, since the NBI JupyterServer itself runs in this system conda env.
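One way to reproduce that difference from a terminal is a small check that can be run once in each environment. This is only a sketch: check_ijulia is a hypothetical helper, and the default binary path is the one from the kernel.json above.

```shell
#!/bin/bash
# Sketch only: check_ijulia is a hypothetical helper, not part of IJulia or SageMaker.

# Default path taken from the kernel.json above; set JULIA_BIN to test another binary.
JULIA_BIN="${JULIA_BIN:-/home/ec2-user/anaconda3/envs/julia/bin/julia}"

# Report whether the Julia binary can load IJulia in the current shell environment.
check_ijulia() {
  if "$JULIA_BIN" -e 'import IJulia' >/dev/null 2>&1; then
    echo "IJulia found"
  else
    echo "IJulia NOT found"
  fi
}

# Usage (not run automatically): call check_ijulia after `source activate julia`,
# then again after `source activate JupyterSystemEnv`, and compare the results.
```

If the two environments give different answers for the same binary, that matches the behaviour described above and suggests the kernel is being launched without the julia environment's settings.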

I tried a couple of hacky solutions:

  1. Simply start Julia in JupyterSystemEnv as above and Pkg.add("IJulia") to install it there
  2. Copy the setup you'll see in the R kernel, /home/ec2-user/.local/share/jupyter/kernels/ir - where kernel.json points to a run.sh script which first activates the target conda environment and then runs the interpreter

kernel.json (with a different display name & file path to visually confirm JupyterLab has picked the new one up):

{
  "display_name": "Julia Fix",
  "argv": [
    "/home/ec2-user/.local/share/jupyter/kernels/julia-fix/run.sh",
    "{connection_file}"
  ],
  "language": "julia",
  "env": {},
  "interrupt_mode": "signal"
}

run.sh (remember to chmod +x this file; otherwise Jupyter can't execute it and returns 500 server errors):

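#!/bin/bash

# Activate the conda environment where IJulia is installed, then start the kernel.
source activate julia
# "$1" is the connection file path that Jupyter passes in via kernel.json.
/home/ec2-user/anaconda3/envs/julia/bin/julia -i --project=@. /home/ec2-user/anaconda3/envs/julia/share/julia/packages/IJulia/e8kqU/src/kernel.jl "$1"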
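For convenience, the two files above could be written from a terminal with something like the following. This is a sketch under assumptions: install_julia_fix_kernel is a hypothetical helper, the use of exec in the wrapper is my addition, and the IJulia path (including e8kqU) is copied from my instance and will likely differ on yours.

```shell
#!/bin/bash
# Sketch only: install_julia_fix_kernel is a hypothetical helper that writes the
# kernel.json and run.sh described above. Paths are copied from this answer.

install_julia_fix_kernel() {
  local kernel_dir="${1:-$HOME/.local/share/jupyter/kernels/julia-fix}"
  local julia_env="/home/ec2-user/anaconda3/envs/julia"
  mkdir -p "$kernel_dir"

  # Wrapper script: activate the conda env, then start the IJulia kernel.
  # exec (an addition here) replaces the shell with the Julia process.
  cat > "$kernel_dir/run.sh" <<EOF
#!/bin/bash
source activate julia
exec $julia_env/bin/julia -i --project=@. $julia_env/share/julia/packages/IJulia/e8kqU/src/kernel.jl "\$1"
EOF
  # Jupyter must be able to execute the wrapper.
  chmod +x "$kernel_dir/run.sh"

  # Kernel spec pointing at the wrapper; {connection_file} is filled in by Jupyter.
  cat > "$kernel_dir/kernel.json" <<EOF
{
  "display_name": "Julia Fix",
  "argv": ["$kernel_dir/run.sh", "{connection_file}"],
  "language": "julia",
  "env": {},
  "interrupt_mode": "signal"
}
EOF
}

# Usage (not run automatically): install_julia_fix_kernel
```

After running it, refresh JupyterLab and look for "Julia Fix" in the launcher to confirm the new kernel spec was picked up.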

HOWEVER, unfortunately both approaches yield a ZMQStream Invalid Signature error, which looks to me like this open IJulia issue. I see speculation there that Julia isn't playing nice with conda, but am not deep enough with Julia to know if that's really the root cause or how best to mitigate it if so.

Maybe you could install Julia itself outside of conda (conda deactivate in terminal to check you're in base environment) and, if still getting the signature error, use a run.sh kernel script to ensure Jupyter also runs the command outside of conda rather than in the JupyterSystemEnv? Or perhaps there's some other cause for the signature issue that can be resolved while keeping conda environments set up...

You could also look into the SageMaker Studio Custom Image Samples, instead of Notebook Instances, since Studio kernels are isolated by full container images rather than conda? There is an example image there for Julia (v1.5), although it's not been updated in some time so of course could have some issues of its own.

AWS
EXPERT
Alex_T
answered 2 years ago
  • This is super helpful, Alex T. I spent all morning yesterday on the question we are discussing here, and it appears you also ran into trouble, although you got a lot further than I did. Thank you. I will try some of this again with your instructions.

  • Then I spent the afternoon trying to install this custom SageMaker docker image:

    https://github.com/aws-samples/sagemaker-studio-custom-image-samples/tree/main/examples/jupyter-docker-stacks-julia-image

    Even though I followed all the instructions, I was not able to upload the image into SageMaker

    docker push XXXXX.dkr.ecr.us-west-2.amazonaws.com/smstudio-custom:julia-datascience

    just kept "Retrying ..." -- so, all to say, this has been very frustrating.

  • I currently run Julia notebooks in CoCalc. They have great customer service at CoCalc, but I was hoping AWS could speed up processing times. I will keep trying and post a solution here if I find one. In the meantime, if anybody has gotten Julia/Jupyter notebooks to work on AWS SageMaker, please post instructions.

  • Sorry, only just saw your comments! That "Retrying..." error is odd. I've not seen it before, and it seems like you were just not able to push the container image to ECR, so it would be your local Docker setup and Amazon ECR permissions to investigate, rather than anything SageMaker-specific. It may be worth posting a separate question with any other debug information you have.

    If there's a chance it might be related to your local internet connectivity (pulling all the source layers and then pushing the image up to us-west-2 cloud), you could also try doing those docker steps from your SageMaker Notebook Instance terminal?
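If you do retry the push from a terminal, a missing or expired ECR login is one thing worth ruling out first. The following is only a sketch: the helper names are hypothetical, XXXXX stands for the AWS account ID as in the comment above, and it assumes the AWS CLI and Docker are installed and that your IAM identity is allowed to push to the repository.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical; XXXXX is a placeholder account ID.

# Extract the registry host (everything before the first "/") from an image URI.
ecr_registry() {
  printf '%s' "${1%%/*}"
}

# Extract the region from a host like <account>.dkr.ecr.<region>.amazonaws.com
ecr_region() {
  printf '%s' "$1" | cut -d. -f4
}

# Authenticate Docker to ECR, then push the image.
ecr_login_and_push() {
  local image="$1" registry region
  registry="$(ecr_registry "$image")"
  region="$(ecr_region "$registry")"
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$registry" \
    && docker push "$image"
}

# Usage (not run automatically):
# ecr_login_and_push XXXXX.dkr.ecr.us-west-2.amazonaws.com/smstudio-custom:julia-datascience
```

The target repository (smstudio-custom in the example) also has to exist in ECR in that region before the push.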

0

Note that Alex_T found a (hacky) solution to the ZMQStream problem here:

https://www.repost.aws/questions/QUM2y8-rjDS1Wp5LUkdLMhyA/uncaught-exception-in-zmq-stream-callback-trying-to-run-jupyter-notebook-with-julia-kernel-in-sage-maker

and got the Jupyter notebook with Julia kernel to work in SageMaker.

answered 2 years ago
