I want to troubleshoot an Amazon SageMaker AI notebook instance that can't open Jupyter.
Resolution
To troubleshoot a SageMaker AI notebook instance that can't open Jupyter, take the following actions:
- On the SageMaker AI console, confirm that the notebook instance status is InService. If the status is Pending, then the notebook instance isn't ready yet.
- Clear your browser cache. Or, use a different browser to access the Jupyter notebook.
- Access the Jupyter notebook with your browser extensions turned off. Extensions that change proxy configurations might prevent the Jupyter notebook from opening.
- Switch to a different network environment. If you tried to open Jupyter from your organization's network, then try your home network.
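- Check whether a firewall blocks access. Firewall, proxy, or antivirus software on your local machine might block the WebSocket connection that Jupyter uses.
- Check the browser's network logs for WebSocket connection errors. To view the logs, open your browser's developer tools.
- Check the Jupyter logs for errors. SageMaker AI publishes these logs to Amazon CloudWatch Logs in the /aws/sagemaker/NotebookInstances log group, in a log stream named notebook-instance-name/jupyter.log.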
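If you prefer a terminal to the console, you can run the status check and the log check from the list above with the AWS CLI. The following is a minimal sketch that assumes AWS CLI version 2 (for aws logs tail) and credentials for the account and Region that host the notebook instance. It assumes that the Jupyter log stream is named after the notebook instance in the /aws/sagemaker/NotebookInstances log group. The function names and the my-notebook instance name are hypothetical.

```bash
#!/bin/bash
# Sketch: check a notebook instance's status and read its Jupyter log.
# Assumes AWS CLI v2 with credentials for the notebook's account and Region.
# notebook_status and jupyter_log are hypothetical helper names.

# Print the instance status, for example InService, Pending, or Stopped.
notebook_status() {
  aws sagemaker describe-notebook-instance \
    --notebook-instance-name "$1" \
    --query NotebookInstanceStatus \
    --output text
}

# Print the last hour of the instance's Jupyter log from CloudWatch Logs.
# Assumes the stream name <notebook-instance-name>/jupyter.log.
jupyter_log() {
  aws logs tail /aws/sagemaker/NotebookInstances \
    --log-stream-names "$1/jupyter.log" \
    --since 1h
}

# Usage:
#   notebook_status my-notebook
#   jupyter_log my-notebook
```

If notebook_status prints Pending, wait until it prints InService before you open Jupyter.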
If you still can't open the Jupyter notebook, then restart the notebook instance. When you restart it, SageMaker AI replaces the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance, and the notebook instance moves to a new host. The new host might resolve HTTP 503 and 504 browser errors. It's also a best practice to restart your notebook instances regularly so that they run updated software.
Note: The /home/ec2-user/SageMaker file system is the only persistent storage on the notebook instance. When you restart the instance, you lose all other data.
To restart a SageMaker AI notebook instance, complete the following steps:
- Open the SageMaker AI console.
- In the navigation pane, choose Notebook instances.
- Choose the notebook instance that you want to restart.
- On the Actions dropdown list, choose Stop.
- After the notebook instance reaches the Stopped status, choose Start on the Actions dropdown list.
- Open the notebook instance URL.
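You can also script the restart steps above with the AWS CLI. This is a minimal sketch that assumes AWS CLI version 2 and credentials for the notebook's account and Region; restart_notebook is a hypothetical name. It uses the CLI waiters to block until each status change completes, and then prints a presigned URL that opens Jupyter.

```bash
#!/bin/bash
# Sketch: restart a notebook instance with the AWS CLI.
# Assumes AWS CLI v2 and credentials for the notebook's account and Region.
# restart_notebook is a hypothetical helper name.
restart_notebook() {
  local name="$1"
  # Stop the instance and wait until its status is Stopped.
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" || return
  # Start it again and wait until its status is InService.
  aws sagemaker start-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name" || return
  # Print a presigned URL that opens Jupyter on the restarted instance.
  aws sagemaker create-presigned-notebook-instance-url \
    --notebook-instance-name "$name" \
    --query AuthorizedUrl --output text
}

# Usage: restart_notebook my-notebook
```

Each step returns early if its CLI call fails, so the function doesn't try to start an instance that never stopped.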
Troubleshoot your overloaded notebook instance
Take the following actions to resolve an overloaded notebook instance.
Too many open sessions
If you have too many active sessions and notebooks, then notebooks take longer to load and might time out in the browser. To view your open sessions, check the Running tab on the Jupyter dashboard. Then, shut down the notebooks and terminal sessions that you don't need.
High CPU or memory utilization
To check your CPU or memory utilization, complete the following steps:
- Open the Jupyter dashboard, and then choose the Files tab.
- Choose New, and then choose Terminal.
- Check your memory utilization:
$ free -h
- Check your CPU utilization:
$ top
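To get a single memory figure and a non-interactive CPU snapshot from the same terminal, you can use a minimal sketch like the following. It assumes the procps-ng version of free, where the seventh field of the Mem: row is available memory (as on Amazon Linux 2), and it counts memory that isn't available as used. The helper names are hypothetical.

```bash
#!/bin/bash
# Sketch: summarize memory and CPU use in the notebook terminal.
# mem_used_pct and top_cpu_snapshot are hypothetical helper names.

# Read `free -m` output on stdin and print the percentage of memory in use,
# computed as (total - available) / total from the "Mem:" row.
# Assumes procps-ng column order: total used free shared buff/cache available.
mem_used_pct() {
  awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

# Print one snapshot of top: -b runs in batch mode (no interactive screen),
# -n 1 takes a single sample. Processes are sorted by CPU use by default.
top_cpu_snapshot() {
  top -b -n 1 | head -n 20
}

# Usage:
#   free -m | mem_used_pct
#   top_cpu_snapshot
```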
If your CPU or memory utilization is high and you can't free up more resources, then complete the following steps to switch to a larger notebook instance:
- Stop the notebook instance.
- Choose the Actions dropdown list, and then choose Update settings.
- Choose a new notebook instance type, and then choose Save.
Note: For a list of available instance types in each AWS Region, see Amazon SageMaker AI pricing.
- On the Actions dropdown list, choose Start.
- Open the notebook instance URL.
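The same resize can be scripted with the AWS CLI. This is a minimal sketch that assumes AWS CLI version 2 and that ml.t3.large, used only as an example, is offered in your Region; resize_notebook is a hypothetical name. After the update call, the instance reports the Updating status before it returns to Stopped, so the sketch waits for Stopped again before it starts the instance.

```bash
#!/bin/bash
# Sketch: move a notebook instance to a different instance type.
# Assumes AWS CLI v2; resize_notebook is a hypothetical helper name.
resize_notebook() {
  local name="$1" instance_type="$2"
  # Stop the instance and wait until its status is Stopped.
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" || return
  # Change the instance type.
  aws sagemaker update-notebook-instance \
    --notebook-instance-name "$name" --instance-type "$instance_type" || return
  # Wait for the update (Updating status) to finish and return to Stopped.
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" || return
  # Start the instance on the new type and wait until it's InService.
  aws sagemaker start-notebook-instance --notebook-instance-name "$name" || return
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name"
}

# Usage: resize_notebook my-notebook ml.t3.large
```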
High disk utilization
To check your disk utilization, complete the following steps:
- Open the Jupyter dashboard, and then choose the Files tab.
- Choose New, and then choose Terminal.
- In the terminal, check your disk utilization:
$ df -h
- Check the Use% value for the file system that's mounted on /home/ec2-user/SageMaker.
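The disk check in these steps can be scripted, together with a list of the largest files and directories to help you decide what to remove. The following minimal sketch assumes GNU coreutils (df --output and sort -h), which Amazon Linux provides; the helper names are hypothetical.

```bash
#!/bin/bash
# Sketch: check disk use for a directory's file system and find large items.
# Assumes GNU coreutils; disk_used_pct and largest_items are hypothetical names.

# Print the Use% of the file system that holds a directory, as a bare number.
disk_used_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
  echo
}

# List the N largest files and directories under a directory (default 10),
# largest first. Permission errors from du are discarded.
largest_items() {
  du -ah "$1" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

# Usage:
#   disk_used_pct /home/ec2-user/SageMaker
#   largest_items /home/ec2-user/SageMaker 20
```

The first line that largest_items prints is the directory's own total; the lines after it show where that space goes.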
If the disk utilization is high, then remove temporary files from the /home/ec2-user/SageMaker directory.
Or, complete the following steps to increase the Amazon Elastic Block Store (Amazon EBS) volume size:
- Stop the notebook instance.
- In the Actions dropdown list, choose Update settings.
- Enter a new volume size, and then choose Save.
Note: The default Amazon EBS volume size is 5 GB. You can increase the volume size up to 16 TB.
- On the Actions dropdown list, choose Start.
- Open the notebook instance URL.
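To change the volume size from a terminal instead, the following minimal sketch checks the requested size against the 5 GB to 16 TB (16,384 GB) range in the note above and then calls update-notebook-instance. It assumes AWS CLI version 2 and that the notebook instance is already stopped; set_volume_size is a hypothetical name. After the status returns to Stopped, start the instance and open its URL as in the last two steps.

```bash
#!/bin/bash
# Sketch: request a larger EBS volume for a stopped notebook instance.
# Assumes AWS CLI v2; set_volume_size is a hypothetical helper name.
set_volume_size() {
  local name="$1" size_gb="$2"
  # Accept only a whole number of GB.
  case "$size_gb" in
    ''|*[!0-9]*) echo "size must be a whole number of GB" >&2; return 2 ;;
  esac
  # Stay within the 5 GB to 16,384 GB range.
  if [ "$size_gb" -lt 5 ] || [ "$size_gb" -gt 16384 ]; then
    echo "size must be between 5 and 16384 GB" >&2
    return 2
  fi
  aws sagemaker update-notebook-instance \
    --notebook-instance-name "$name" --volume-size-in-gb "$size_gb"
}

# Usage: set_volume_size my-notebook 50
```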
You receive an error after you attach the lifecycle script
The following scenarios can cause lifecycle script issues.
Lifecycle script takes longer than 5 minutes to run
A lifecycle configuration script can run for up to 5 minutes. If the script runs longer, then it fails and the notebook instance doesn't start. If the script installs packages when the notebook instance starts, then the installation might take longer than 5 minutes. To resolve this issue, use the nohup command to run the long-running commands in the background so that the script itself finishes within the time limit.
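Example script:
```bash
#!/bin/bash
set -e
# Run the package installation in the background so that the lifecycle
# script exits within the 5-minute limit. nohup keeps the installation
# running after the script exits.
nohup pip install xgboost &
```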
Note: The background installation continues after the lifecycle script exits, and it stops when the libraries finish installing. To check whether the installation is still running, run the ps command.
Lifecycle script fails with a 127 error code
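This error occurs when the script contains Windows line endings (CRLF). For example, your editor might add CRLF line endings when you edit the script in Windows, or you might create the script in Windows and then copy it to a Unix environment. Windows marks the end of a line with a carriage return followed by a line feed (CRLF), but Unix uses only a line feed (LF). The shell then reads the extra carriage return as part of the command or interpreter name, can't find it, and returns exit code 127 (command not found). To resolve this issue, set your text editor to save files in Unix format (LF).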
For example, in Notepad++ the line-ending format appears in the bottom-right corner of the window, and by default it's set to Windows (CR LF). To use Unix line endings for new files, change the format in the Notepad++ settings. To convert an existing file, choose Edit, then EOL Conversion, and then Unix (LF).
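You can also check and fix a script's line endings from a terminal before you paste it into a lifecycle configuration. This minimal sketch assumes GNU sed, as on Linux; the helper names are hypothetical. If the dos2unix utility is installed, then running dos2unix on the file performs the same conversion in place.

```bash
#!/bin/bash
# Sketch: detect and remove Windows (CRLF) line endings in a script file.
# Assumes GNU sed; has_crlf and to_unix are hypothetical helper names.

# Succeed if any line in the file contains a carriage return.
has_crlf() {
  local cr
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Remove the carriage return at the end of every line, editing in place.
to_unix() {
  sed -i 's/\r$//' "$1"
}

# Usage:
#   has_crlf on-start.sh && to_unix on-start.sh
```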