Adjusting SageMaker XGBoost project to TensorFlow (or even just a different folder name)


I have the SageMaker XGBoost project template "build, train, deploy" working, but I'd like to modify it to use TensorFlow instead of XGBoost. As a first step, I was just trying to rename the abalone folder to topic to reflect the data we are working with.

I was experimenting with trying to change the topic/pipeline.py file like so

    image_uri = sagemaker.image_uris.retrieve(
        framework="tensorflow",
        region=region,
        version="1.0-1",
        py_version="py3",
        instance_type=training_instance_type,
    )

i.e. just changing the framework name from "xgboost" to "tensorflow", but then when I run the following from a notebook:

    from pipelines.topic.pipeline import get_pipeline

    pipeline = get_pipeline(
        region=region,
        role=role,
        default_bucket=default_bucket,
        model_package_group_name=model_package_group_name,
        pipeline_name=pipeline_name,
    )

I get the following error

ValueError                                Traceback (most recent call last)
<ipython-input-5-6343f00c3471> in <module>
      7     default_bucket=default_bucket,
      8     model_package_group_name=model_package_group_name,
----> 9     pipeline_name=pipeline_name,
     10 )

~/topic-models-no-monitoring-p-rboparx6tdeg/sagemaker-topic-models-no-monitoring-p-rboparx6tdeg-modelbuild/pipelines/topic/pipeline.py in get_pipeline(region, sagemaker_project_arn, role, default_bucket, model_package_group_name, pipeline_name, base_job_prefix, processing_instance_type, training_instance_type)
    188         version="1.0-1",
    189         py_version="py3",
--> 190         instance_type=training_instance_type,
    191     )
    192     tf_train = Estimator(

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/utilities.py in wrapper(*args, **kwargs)
    197                 logger.warning(warning_msg_template, arg_name, func_name, type(value))
    198                 kwargs[arg_name] = value.default_value
--> 199         return func(*args, **kwargs)
    200 
    201     return wrapper

/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in retrieve(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config)
    152             if inference_tool == "neuron":
    153                 _framework = f"{framework}-{inference_tool}"
--> 154         config = _config_for_framework_and_scope(_framework, image_scope, accelerator_type)
    155 
    156     original_version = version

/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in _config_for_framework_and_scope(framework, image_scope, accelerator_type)
    277         image_scope = available_scopes[0]
    278 
--> 279     _validate_arg(image_scope, available_scopes, "image scope")
    280     return config if "scope" in config else config[image_scope]
    281 

/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in _validate_arg(arg, available_options, arg_name)
    443             "Unsupported {arg_name}: {arg}. You may need to upgrade your SDK version "
    444             "(pip install -U sagemaker) for newer {arg_name}s. Supported {arg_name}(s): "
--> 445             "{options}.".format(arg_name=arg_name, arg=arg, options=", ".join(available_options))
    446         )
    447 

ValueError: Unsupported image scope: None. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): eia, inference, training.

I was skeptical that the upgrade suggested by the error message would fix this, but gave it a try:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pipelines 0.0.1 requires sagemaker==2.93.0, but you have sagemaker 2.110.0 which is incompatible.

So it seems I can't upgrade sagemaker without also changing pipelines, and it's not clear that's the right thing to do; this project template may be designed around those particular earlier library versions.

So should the "framework" name be something different, e.g. "tf"? Or is there some other setting that needs changing to get a TensorFlow pipeline?

However I find that if I use the existing abalone/pipeline.py file I can change the framework to "tensorflow" and there's no problem running that particular step in the notebook.

I've searched all the files in the project to try and find any dependency on the abalone folder name, and the closest I came was in codebuild-buildspec.yml but that hasn't helped.

Has anyone else successfully changed the folder name from abalone to something else, or am I stuck with abalone if I want to make progress?

Many thanks in advance

p.s. is there a slack community for sagemaker studio anywhere?

p.p.s. I have tried changing all instances of the term "Abalone" to "Topic" within the topic/pipeline.py file (matching case as appropriate) to no avail

p.p.p.s. I discovered that I can get an error-free run of getting the pipeline from a unit test:

    import pytest

    from pipelines.topic.pipeline import *

    region = 'eu-west-1'
    role = 'arn:aws:iam::398371982844:role/SageMakerExecutionRole'
    default_bucket = 'sagemaker-eu-west-1-398371982844'
    model_package_group_name = 'TopicModelPackageGroup-Example'
    pipeline_name = 'TopicPipeline-Example'

    def test_pipeline():
        pipeline = get_pipeline(
            region=region,
            role=role,
            default_bucket=default_bucket,
            model_package_group_name=model_package_group_name,
            pipeline_name=pipeline_name,
        )

Strangely, everything also runs fine in a different copy of the notebook. So I have two seemingly identical .ipynb notebooks: in one, switching to the topic pipeline produces the error above, and in the other I get no error at all. Very strange.

p.p.p.p.s. I also notice that conda list returns very different results depending on whether I run it in the notebook or the terminal ... but the conda list results are identical for the two notebooks ...
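
One way to compare what each notebook kernel and the terminal actually see is to print the interpreter path and the installed sagemaker version from inside each of them. The sketch below is only a diagnostic aid; the helper name `describe_environment` is made up for illustration and is not part of the project template.

```python
import sys


def describe_environment(package="sagemaker"):
    """Report which interpreter is running and which version of `package` it sees."""
    version = None
    try:
        from importlib import metadata  # available on Python 3.8+

        try:
            version = metadata.version(package)
        except metadata.PackageNotFoundError:
            version = None
    except ImportError:
        # Fallback for older interpreters such as Python 3.7.
        try:
            import pkg_resources

            version = pkg_resources.get_distribution(package).version
        except Exception:
            version = None
    return {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "package": package,
        "version": version,  # None means the package is not installed here
    }


if __name__ == "__main__":
    print(describe_environment())
```

Running this in both notebooks and in the terminal shows whether they share an interpreter and whether the pinned sagemaker==2.93.0 or the upgraded 2.110.0 is the one being imported.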

1 Answer

Accepted Answer

Hi! I see two parts in your question:

  1. How to use TensorFlow in a SageMaker estimator to train and deploy a model
  2. How to adapt a SageMaker MLOps template to your data and code

The TensorFlow estimator differs slightly from the XGBoost one. The easiest way to work with it is not to call sagemaker.image_uris.retrieve(framework="tensorflow", ...) and pass the result to a generic Estimator, but to use the sagemaker.tensorflow.TensorFlow estimator instead.
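
A minimal sketch of that switch follows. The script name `train.py`, the source directory `pipelines/topic`, the instance type, and the TensorFlow 2.8 / Python 3.9 combination are all illustrative assumptions, not values from the project template; the helper names are hypothetical too. The TensorFlow estimator resolves its own training image, so no image URI lookup is involved.

```python
def tf_estimator_kwargs(role, training_instance_type="ml.m5.xlarge", output_path=None):
    """Collect constructor arguments for sagemaker.tensorflow.TensorFlow."""
    kwargs = {
        "entry_point": "train.py",        # hypothetical training script name
        "source_dir": "pipelines/topic",  # hypothetical folder holding the script
        "role": role,
        "instance_count": 1,
        "instance_type": training_instance_type,
        "framework_version": "2.8",       # assumed TensorFlow version
        "py_version": "py39",             # assumed to match that version
    }
    if output_path is not None:
        kwargs["output_path"] = output_path
    return kwargs


def make_tf_estimator(role, **overrides):
    """Build the estimator; needs the SageMaker Python SDK installed."""
    # Imported lazily so the helper above can be inspected without the SDK.
    from sagemaker.tensorflow import TensorFlow

    kwargs = tf_estimator_kwargs(role)
    kwargs.update(overrides)
    return TensorFlow(**kwargs)
```

In pipeline.py, the object returned by `make_tf_estimator(role)` would take the place of the generic Estimator that the template builds from `image_uri`.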

These are the two examples, which will be useful for you:

As for updating the MLOps template, I recommend going through the comprehensive self-service lab on SageMaker Pipelines.

It shows you how to change the source directory from abalone to customer_churn. In your case it will be topic.

P.S. As for a Slack channel, to the best of my knowledge this re:Post forum is now the best place to ask questions about Amazon SageMaker, including SageMaker Studio.

Ivan (AWS)
answered 2 years ago
  • thanks Ivan for also taking a look at this question - very helpful indeed. That "self-service lab on Sagemaker Pipelines" is exactly what I was looking for, although it doesn't mention the error message that I encountered above, and now for some reason am encountering again in the one notebook that worked yesterday - intermittent errors are of course the most frustrating :-(

    I will work through that lab from scratch after lunch, but I feel that I am missing something conceptual about the process by which a SageMaker Studio project gets deployed. For example, that lab mentions pushing code to the repo as a way to kick off a build, while the notebook itself seems to imply that one can kick off a deploy from the notebook itself.

    Is the problem perhaps that when I'm operating from within the notebook that it's only using the code that happens to be on the latest main branch? current feature branch?

    and why doesn't python setup.py build get the requirements in the right place for the pipeline? Maybe it will be doing that if I just commit the code to the right branch?

    I'll work through the lab in a fresh project after lunch and see where I get to ...

  • Hi, @regulatansaku.

    See my comment below.

    "Like that lab mentions pushing code to the repo as a way to kick off a build, while the notebook itself seems to imply that one can kick off a deploy from the notebook itself."

    These are just two ways of doing the same thing. From the notebook you can try the SageMaker pipeline in "dev" mode. When you commit the code, it triggers a CI/CD pipeline in AWS CodePipeline, which runs the same pipeline in the automated "ops" mode, without any need to have notebooks up and running.
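
The notebook ("dev") path can be sketched roughly as below, assuming `pipeline` is the object returned by `get_pipeline(...)`. The wrapper name `run_pipeline_from_notebook` is hypothetical; `upsert`, `start`, and `wait` are the SageMaker Python SDK calls it relies on.

```python
def run_pipeline_from_notebook(pipeline, role, wait=True):
    """Create or update the pipeline definition, then start one execution."""
    # Registers the pipeline in SageMaker, or updates it if it already exists.
    pipeline.upsert(role_arn=role)
    # Starts a run using the pipeline's default parameter values.
    execution = pipeline.start()
    if wait:
        # Blocks until the execution finishes.
        execution.wait()
    return execution
```

The "ops" path needs none of this in a notebook: committing to the repository makes CodePipeline run the same pipeline definition.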

    For the requirements problem, I tried to answer in your other post.

  • thanks @ivan - that makes sense. It's just that for some reason I get these random intermittent errors in the notebook when trying to get the pipeline: ValueError: Unsupported image scope: None. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): eia, inference, training.

    But so the workaround to avoid that issue appears to be to prefer to push the code to the main branch. It's great to have a way to avoid that craziness so big thanks for that

  • Hi, @regulatansaku. I'm glad that you've resolved your issue. As for the "Unsupported image scope" error: check that you haven't accidentally left a sagemaker.image_uris.retrieve call anywhere in your code. If you want to retrieve the TensorFlow image with this API, you need to specify either "training" or "inference" as the scope, because these are two slightly different images. But as I said before, you don't need to do that if you use the TensorFlow estimator rather than a generic Estimator.
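
If the image lookup is kept, passing the scope explicitly could look like the sketch below. The helper names are hypothetical, and the TensorFlow 2.8 / py39 pair is an assumption that should be replaced with a combination your SDK version supports; note that "1.0-1" in the question looks like the XGBoost container version carried over from the template.

```python
SUPPORTED_SCOPES = ("training", "inference", "eia")  # scopes listed in the error message


def tensorflow_image_kwargs(region, instance_type, image_scope="training"):
    """Build arguments for sagemaker.image_uris.retrieve with an explicit scope."""
    if image_scope not in SUPPORTED_SCOPES:
        raise ValueError("Unsupported image scope: {}".format(image_scope))
    return {
        "framework": "tensorflow",
        "region": region,
        "version": "2.8",       # assumed TensorFlow version, not "1.0-1"
        "py_version": "py39",   # assumed to match that version
        "instance_type": instance_type,
        "image_scope": image_scope,  # the argument missing in the failing call
    }


def retrieve_tensorflow_image(region, instance_type, image_scope="training"):
    """Look up the image URI; needs the SageMaker Python SDK installed."""
    from sagemaker import image_uris

    return image_uris.retrieve(
        **tensorflow_image_kwargs(region, instance_type, image_scope)
    )
```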
