How do I back up my SageMaker AI notebook instance data?

I want to back up my Amazon SageMaker AI notebook instance data.

Short description

SageMaker AI notebook instances use Amazon Elastic Block Store (Amazon EBS) volumes to store data. The EBS volume is mounted at the /home/ec2-user/SageMaker directory. When a notebook instance is in service, or is stopped and available to start, you can create a backup and migrate the backup to a new notebook instance.

Important: If you delete the notebook instance before you create a backup, then the EBS volume that's attached to the notebook is also deleted.
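
Before you delete anything, it can help to confirm the instance's state from the AWS CLI. The following is a minimal sketch, not part of the official procedure. It assumes that "in service or available to start" corresponds to the InService and Stopped statuses, and the function names and the instance name my-notebook are hypothetical.

```shell
#!/bin/sh
# Print the current status of a notebook instance.
# $1 is the notebook instance name (a placeholder such as my-notebook).
notebook_status() {
    aws sagemaker describe-notebook-instance \
        --notebook-instance-name "$1" \
        --query 'NotebookInstanceStatus' \
        --output text
}

# Succeed only if the instance is in a state where a backup can be taken.
# Assumption: InService and Stopped are the two usable states.
can_back_up() (
    status=$(notebook_status "$1") || exit 1
    case "$status" in
        InService|Stopped)
            echo "$1 is $status: back up before you delete it" ;;
        *)
            echo "$1 is $status: wait for InService or Stopped"
            exit 1 ;;
    esac
)
```

For example, run can_back_up my-notebook and continue only if the command succeeds.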

Resolution

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.

To create a backup of your notebook instance, use the notebook terminal or a lifecycle configuration script.

Use the notebook terminal to create a backup

Complete the following steps:

  1. Create an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Create a folder in the S3 bucket for your backup.
  3. Open the SageMaker AI console.
  4. In the navigation pane, choose Notebook instances, and then select your notebook instance.
  5. Under Actions, choose Open Jupyter.
  6. To open your notebook instance terminal, choose New, and then choose Terminal.
  7. To copy the files from your notebook instance to your S3 bucket folder, run the cp AWS CLI command:
    aws s3 cp --recursive /home/ec2-user/SageMaker/ s3://aws-s3-bucket/folder-name/
    Note: Replace aws-s3-bucket and folder-name with your bucket name and folder name.
  8. (Optional) To copy files from your S3 bucket to a new notebook instance, run the cp AWS CLI command:
    aws s3 cp --recursive s3://aws-s3-bucket/folder-name/ /home/ec2-user/SageMaker/
    Note: Replace aws-s3-bucket and folder-name with your bucket name and folder name.
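
If you prefer to run the whole procedure from the terminal, the steps above can be sketched as a few shell functions. This is an illustrative sketch only: the function names are hypothetical, BUCKET and FOLDER are the placeholder values from the steps, and creating the folder with put-object and a trailing slash is one possible way to do step 2 from the CLI.

```shell
#!/bin/sh
# Placeholder values from the steps above; replace them with your own.
BUCKET="aws-s3-bucket"
FOLDER="folder-name"
SRC="/home/ec2-user/SageMaker/"

# Steps 1-2: create the bucket, then a zero-byte object whose key ends
# in "/" so that the console shows it as a folder.
create_backup_location() {
    aws s3 mb "s3://$BUCKET" &&
    aws s3api put-object --bucket "$BUCKET" --key "$FOLDER/"
}

# Step 7: copy the notebook volume contents to the S3 folder.
backup_notebook() {
    aws s3 cp --recursive "$SRC" "s3://$BUCKET/$FOLDER/"
}

# Step 8 (optional): copy the backup onto a new notebook instance.
restore_notebook() {
    aws s3 cp --recursive "s3://$BUCKET/$FOLDER/" "$SRC"
}
```

Run create_backup_location once, backup_notebook on the source instance, and restore_notebook on the new instance.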

Use a lifecycle configuration script to create a backup

Complete the following steps:

  1. Open the SageMaker AI console.
  2. In the navigation pane, choose Lifecycle configurations.
  3. Choose Create configuration. For Name, enter a name for the configuration, such as ebs-backup.
  4. Under Scripts, choose the Start notebook tab, and then enter your script. For an example script, see amazon-sagemaker-notebook-instance-lifecycle-config-samples on the GitHub website.
    Note: Keep the Create notebook tab empty.
  5. Choose Create configuration.
  6. Navigate to your notebook instance.
    Note: You can attach a lifecycle configuration to an existing notebook instance only when the instance is in the Stopped state.
  7. Choose Edit.
  8. Choose Additional configuration.
  9. For Lifecycle configuration, select your configuration.
  10. Choose Update notebook instance.
  11. Under Tags, choose Edit.
  12. Add a tag that identifies your S3 bucket. Use the tag key that your lifecycle script reads, and set the value to your bucket name, for example sagemaker-ebs-backup-region-account_id.
    Note: The execution role that's attached to the notebook instance must have permissions to sync files to your S3 bucket.
  13. Choose Save.
  14. To create a backup, start the notebook instance.
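
The console steps above have rough AWS CLI equivalents. The following sketch is one possible translation, not an official script. It assumes that the instance is already in the Stopped state and that on-start.sh is a local copy of your lifecycle script. The function name, the instance name my-notebook, and the tag key ebs-backup-bucket are assumptions; use the tag key that your script reads.

```shell
#!/bin/sh
# Placeholder values (assumptions); replace them with your own.
NOTEBOOK="my-notebook"
CONFIG="ebs-backup"
SCRIPT="on-start.sh"
TAG_KEY="ebs-backup-bucket"
BUCKET="sagemaker-ebs-backup-region-account_id"

attach_backup_config() (
    # Steps 3-5: create the configuration. The script content must be
    # base64-encoded, and tr removes any line wrapping from base64.
    aws sagemaker create-notebook-instance-lifecycle-config \
        --notebook-instance-lifecycle-config-name "$CONFIG" \
        --on-start "Content=$(base64 < "$SCRIPT" | tr -d '\n')" &&
    # Steps 6-10: attach the configuration to the stopped instance.
    aws sagemaker update-notebook-instance \
        --notebook-instance-name "$NOTEBOOK" \
        --lifecycle-config-name "$CONFIG" &&
    # The update runs asynchronously, so wait until it settles.
    aws sagemaker wait notebook-instance-stopped \
        --notebook-instance-name "$NOTEBOOK" &&
    # Steps 11-13: tag the instance with the backup bucket.
    arn=$(aws sagemaker describe-notebook-instance \
        --notebook-instance-name "$NOTEBOOK" \
        --query 'NotebookInstanceArn' --output text) &&
    aws sagemaker add-tags --resource-arn "$arn" \
        --tags "Key=$TAG_KEY,Value=$BUCKET" &&
    # Step 14: start the instance, which runs the lifecycle script.
    aws sagemaker start-notebook-instance \
        --notebook-instance-name "$NOTEBOOK"
)
```

Each command is chained with &&, so the function stops at the first failure instead of starting an instance that has no backup configuration.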

The snapshot mirrors the contents of /home/ec2-user/SageMaker/, and you can find it in s3://ebs-backup-bucket/source-instance-name_snapshot-timestamp/. The backup is complete when the file /home/ec2-user/SageMaker/BACKUP_COMPLETE exists on the notebook instance.
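
To illustrate the behavior described above, the following is a simplified sketch of what a backup script might do. It is not the sample from the GitHub repository. The function name, its arguments, and the timestamp format are assumptions made for this example.

```shell
#!/bin/sh
# snapshot_volume BUCKET INSTANCE_NAME [SRC_DIR]
# Syncs SRC_DIR to s3://BUCKET/INSTANCE_NAME_TIMESTAMP/ and then writes
# a BACKUP_COMPLETE marker file. Prints the S3 destination on success.
snapshot_volume() (
    bucket=$1
    instance=$2
    src=${3:-/home/ec2-user/SageMaker}
    # Timestamp format is an assumption for this sketch.
    stamp=$(date -u +%Y%m%d-%H%M%S)
    dest="s3://$bucket/${instance}_${stamp}/"

    # Remove a marker left over from an earlier run.
    rm -f "$src/BACKUP_COMPLETE"
    # Write the marker only if the sync succeeds.
    aws s3 sync "$src/" "$dest" --exclude "BACKUP_COMPLETE" || exit 1
    touch "$src/BACKUP_COMPLETE"
    echo "$dest"
)
```

Lifecycle configuration scripts are limited to five minutes of run time, so a real script would typically start a long copy like this in the background, for example with nohup.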

Note: The backup process time depends on the total size of the data in the volume.
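
Because the duration depends on the data size, it can be useful to check the size and then poll for the completion marker from the notebook terminal. The following is a minimal sketch. The function name, the default timeout, and the polling interval are arbitrary assumptions.

```shell
#!/bin/sh
# wait_for_backup [SRC_DIR] [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Reports the size of SRC_DIR, then waits until BACKUP_COMPLETE appears.
wait_for_backup() (
    src=${1:-/home/ec2-user/SageMaker}
    timeout=${2:-3600}
    interval=${3:-30}

    # du prints "<size><TAB><path>"; keep only the size column.
    echo "Data to copy: $(du -sh "$src" | cut -f1)"

    waited=0
    while [ ! -f "$src/BACKUP_COMPLETE" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${waited}s"
            exit 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Backup complete"
)
```

Called with no arguments, wait_for_backup checks /home/ec2-user/SageMaker every 30 seconds for up to one hour.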

AWS OFFICIAL
Updated 17 days ago