
SageMaker SKLearn image with up-to-date Python and scikit-learn


Has anyone created a Docker training image with newer versions of Python, scikit-learn, pandas, or XGBoost?

I’m looking for a training image that includes up-to-date versions of these tools. The existing images ship older versions, and I’d rather not work with four-year-old releases.

I’ve considered:

  • Extending existing containers: But I’m not sure if I can simply update Python and the libraries without compatibility issues.
  • Forking and editing the repository: However, I don’t feel confident enough to do this without potentially breaking something.

Does anyone have a pre-built image for this, or can guide me on how to easily create one myself?

SKLearn Container: https://github.com/aws/sagemaker-scikit-learn-container
Extending an Image: https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html

asked 19 days ago, 35 views
1 Answer

Amazon SageMaker provides pre-built Docker images for various machine learning frameworks, including scikit-learn. However, these images may not always include the most up-to-date versions of Python, scikit-learn, pandas, or XGBoost.

You have a few options to work with more recent versions:

  1. Customizing pre-built containers: You can extend the existing SageMaker scikit-learn containers to include newer versions of the libraries you need. This approach allows you to add necessary libraries and functionalities while maintaining compatibility with SageMaker. However, as you mentioned, you need to be cautious about potential compatibility issues when updating Python and other libraries.

  2. Creating a custom container: You can create your own Docker container with the specific versions of Python, scikit-learn, pandas, and XGBoost that you need. This gives you full control over the environment but requires more effort to ensure compatibility with SageMaker.

  3. Using SageMaker Processing: For data preprocessing tasks, you can use SageMaker Processing with a custom Docker image that includes the latest versions of the libraries you need.

If you decide to extend the existing containers, start from the Dockerfile in the SageMaker scikit-learn container repository: update the base image to a newer Python version, then install the desired versions of scikit-learn, pandas, and XGBoost with pip or conda.

Here's a general approach to extend the container:

  1. Start with the existing Dockerfile from the SageMaker scikit-learn container repository.
  2. Update the base image to a newer Python version.
  3. Install the desired versions of scikit-learn, pandas, and XGBoost.
  4. Add any other necessary dependencies.
  5. Build and test the container locally.
  6. Push the container to Amazon ECR for use with SageMaker.
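
Once the image is in Amazon ECR (step 6), a training job can point at it through the SageMaker Python SDK. The sketch below is one way that might look, not a definitive setup: the function names, the repository name, the instance type, and the `n_estimators` hyperparameter are illustrative assumptions, and the URI format shown applies to standard commercial AWS regions.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the URI SageMaker uses to pull an image from a private ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def launch_training(image_uri: str, role_arn: str, train_s3_uri: str, output_s3_uri: str):
    """Start a training job on a custom image (needs AWS credentials and the sagemaker SDK)."""
    # Imported lazily so the URI helper above works without the SDK installed.
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.large",            # assumption: pick what fits your data
        output_path=output_s3_uri,
        hyperparameters={"n_estimators": 200},  # hypothetical hyperparameter
    )
    # Each key becomes a channel mounted at /opt/ml/input/data/<key> in the container.
    estimator.fit({"train": train_s3_uri})
    return estimator
```

Using the generic `Estimator` rather than the `SKLearn` estimator keeps the job independent of the framework versions the SDK knows about, since the image itself decides which Python and library versions run.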

Keep in mind that while this approach can work, it requires careful testing to ensure all components are compatible and function correctly within the SageMaker environment. If you're not confident in making these changes, you might want to consult with someone who has experience in Docker and SageMaker container customization.
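
Part of that testing can be automated with a small script run inside the container after the build. The following is a minimal sketch using only the standard library; the minimum versions in `MINIMUMS` are placeholders, and the version parsing is deliberately simplified (it only looks at the first three numeric components), so a dedicated version-parsing library would be more robust for pre-releases.

```python
import re
from importlib import metadata

# Hypothetical minimum versions; replace with what your project needs.
MINIMUMS = {"scikit-learn": "1.4", "pandas": "2.0", "xgboost": "2.0"}


def parse_version(text: str) -> tuple:
    """Turn '1.5.2' or '2.0' into a 3-tuple of integers, padding with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_versions(minimums: dict) -> list:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems
```

Calling `check_versions(MINIMUMS)` inside the built image and failing the build when the list is non-empty catches a silently downgraded dependency before the image reaches SageMaker.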

Unfortunately, there isn't a widely available pre-built image that includes the latest versions of all these tools specifically for SageMaker. The best approach would be to create a custom container tailored to your specific needs while ensuring compatibility with SageMaker's requirements.
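
For a fully custom container, the main SageMaker requirement is that the image provides an executable called `train`, which SageMaker runs when the job starts, and that it follows the `/opt/ml` directory layout. Below is a rough sketch of such a script. The CSV layout (no header, label in the first column), the `train` channel name, the `n_estimators` hyperparameter, and the choice of `RandomForestClassifier` are all assumptions for illustration, not anything prescribed by SageMaker.

```python
import json
from pathlib import Path


def load_hyperparameters(prefix: str = "/opt/ml") -> dict:
    """Read the hyperparameters SageMaker writes for the training job."""
    path = Path(prefix) / "input" / "config" / "hyperparameters.json"
    if not path.exists():
        return {}
    # SageMaker serializes every value as a string, so cast where needed.
    return json.loads(path.read_text())


def train(prefix: str = "/opt/ml") -> Path:
    """Fit a model on the 'train' channel and save it where SageMaker expects it."""
    # Imported lazily: these are the libraries your image would pin.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    params = load_hyperparameters(prefix)
    n_estimators = int(params.get("n_estimators", "100"))

    # Assumption: headerless CSV files with the label in column 0.
    train_dir = Path(prefix) / "input" / "data" / "train"
    frames = [pd.read_csv(f, header=None) for f in sorted(train_dir.glob("*.csv"))]
    data = pd.concat(frames, ignore_index=True)

    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(data.iloc[:, 1:], data.iloc[:, 0])

    # Files in <prefix>/model are packaged and uploaded to S3 when the job ends.
    model_dir = Path(prefix) / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    out = model_dir / "model.joblib"
    joblib.dump(model, out)
    return out
```

The `prefix` parameter exists only so the script can be exercised against a local directory that mimics `/opt/ml` before the image is pushed.
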
Sources
Pre-built SageMaker Docker images - Amazon SageMaker
Resources for using Scikit-learn with Amazon SageMaker - Amazon SageMaker
Prebuilt SageMaker Docker images for deep learning - Amazon SageMaker
Custom Docker containers with SageMaker - Amazon SageMaker

answered 19 days ago
