Setting Up a Local Development Environment for SageMaker


Hello everyone,

I'm currently working on a project where I have a set of Python scripts that train a variety of models (including sklearn, xgboost, and catboost) and save the most accurate model. I also have inference scripts that use this model for batch transformations.
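The "train several libraries, keep the most accurate model" step can be sketched framework-agnostically; all names and the dummy scores below are hypothetical, just to illustrate the selection logic:

```python
import pickle

def select_best_model(candidates: dict) -> tuple:
    """Pick the candidate with the highest validation accuracy.

    `candidates` maps a model name to a (fitted_model, accuracy) pair.
    """
    best_name = max(candidates, key=lambda name: candidates[name][1])
    return best_name, candidates[best_name][0]

def save_model(model, path: str) -> None:
    """Persist the winning model; inside a SageMaker training container,
    the conventional output directory is /opt/ml/model."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

# Dummy stand-ins for fitted sklearn/xgboost/catboost models:
scores = {"sklearn_rf": ("rf", 0.91), "xgboost": ("xgb", 0.94), "catboost": ("cb", 0.93)}
name, model = select_best_model(scores)  # xgboost wins here
```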

I'm not interested in using the full suite of SageMaker Studio features, as I want to set up the development environment locally. However, I do want to leverage SageMaker when it comes to running the code on AWS resources (for model training and inference).

I'm also planning to use GitHub Actions to semi-automate this process. My current plan is to build my own environment using a Docker container. The image built can then be deployed to SageMaker via ECR.

I'm wondering if anyone has come across any resources that could help me achieve this? I'm particularly interested in best practices for setting up a local development environment that can easily transition to SageMaker for training and inference.

Any advice or pointers would be greatly appreciated! Thanks in advance!

2 Answers

There are quite a few ways to go about this, so I'll try to steer you in the right direction.

For training, taking a look at the SageMaker Python SDK would be a good start. It allows you to write code locally but run the training job remotely on SageMaker. Note that this will create a Model in SageMaker's model registry; if you don't want that, using a bare VM might be the better choice.
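As a hedged sketch of that "write locally, train remotely" pattern with the SageMaker Python SDK (v2), using its prebuilt scikit-learn estimator — the role ARN, S3 path, and `train.py` entry point are hypothetical, and the framework version should be checked against what's currently available:

```python
def make_estimator():
    # Requires `pip install sagemaker` and AWS credentials; imported lazily
    # so the sketch can be read and loaded without them.
    from sagemaker.sklearn import SKLearn

    return SKLearn(
        entry_point="train.py",          # your local training script
        framework_version="1.2-1",       # sklearn container version (verify availability)
        instance_type="ml.m5.xlarge",
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role ARN
    )

# Calling fit() launches the training job on SageMaker's managed infrastructure:
# estimator = make_estimator()
# estimator.fit({"train": "s3://my-bucket/train"})  # hypothetical S3 path
```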

If you're set on using your own custom Docker container, you can still use SageMaker for deployment (or an alternative such as ECS). The documentation page on deploying models would be a helpful start, particularly the Steps for model deployment and Bring your own model sections.
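For the bring-your-own-container route, the flow is roughly: `docker build`, push the image to ECR, then point SageMaker at the resulting image URI. A minimal sketch, assuming the SageMaker Python SDK; the account ID, repository name, S3 paths, and role ARN are all hypothetical:

```python
def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Compose the ECR image URI you pass to SageMaker after `docker push`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

def run_batch_transform(image_uri: str, model_data: str, role: str,
                        input_s3: str, output_s3: str):
    # Requires `pip install sagemaker` and AWS credentials.
    from sagemaker.model import Model

    model = Model(image_uri=image_uri, model_data=model_data, role=role)
    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=output_s3,
    )
    transformer.transform(data=input_s3, content_type="text/csv")

# Hypothetical account/repo, just to show the URI shape:
uri = ecr_image_uri("123456789012", "us-east-1", "my-models")
```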

JamesM
answered a month ago

Hi,

Have you tried the (very) new local mode of SageMaker Studio announced in December 2023: https://aws.amazon.com/about-aws/whats-new/2023/12/sagemaker-studio-local-mode-docker/

Studio users can now run SageMaker processing, training, inference and batch transform jobs locally on their Studio IDE instance. Users can also build and test SageMaker-compatible Docker images locally in Studio IDEs.

Data scientists can iteratively develop ML models and debug code changes quickly without leaving their IDE or waiting for remote compute resources. Users can run small-scale jobs locally to test implementations and inspect outputs before running full jobs in the cloud. This optimizes workflows by providing instant feedback on code changes and catching issues early without waiting for cloud resources.

It seems to match what you want to achieve very well.

Reference documentation is here: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-local-mode.html
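Local mode is also available outside Studio, directly from the SageMaker Python SDK: passing `instance_type="local"` runs the training container on your own machine via Docker. A hedged sketch (entry point, framework version, and role ARN are hypothetical):

```python
def make_local_estimator():
    # Local mode needs Docker running plus `pip install "sagemaker[local]"`.
    from sagemaker.sklearn import SKLearn

    return SKLearn(
        entry_point="train.py",
        framework_version="1.2-1",
        instance_type="local",   # run the training container locally instead of on AWS
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role ARN
    )

# est = make_local_estimator()
# est.fit({"train": "file://./data"})  # local files instead of S3 in local mode
```

The same script then moves to the cloud by swapping `instance_type="local"` for a real instance type and the `file://` path for an S3 URI.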

Best,

Didier

answered a month ago
