I have the SageMaker XGBoost project template "build, train, deploy" working, but I'd like to modify it to use TensorFlow instead of XGBoost. First up I was just trying to rename the abalone folder to topic, to reflect the data we are working with.
I was experimenting with changing the topic/pipeline.py file like so:
image_uri = sagemaker.image_uris.retrieve(
    framework="tensorflow",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)
i.e. just changing the framework name from "xgboost" to "tensorflow". But when I then run the following from a notebook:
from pipelines.topic.pipeline import get_pipeline

pipeline = get_pipeline(
    region=region,
    role=role,
    default_bucket=default_bucket,
    model_package_group_name=model_package_group_name,
    pipeline_name=pipeline_name,
)
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-5-6343f00c3471> in <module>
7 default_bucket=default_bucket,
8 model_package_group_name=model_package_group_name,
----> 9 pipeline_name=pipeline_name,
10 )
~/topic-models-no-monitoring-p-rboparx6tdeg/sagemaker-topic-models-no-monitoring-p-rboparx6tdeg-modelbuild/pipelines/topic/pipeline.py in get_pipeline(region, sagemaker_project_arn, role, default_bucket, model_package_group_name, pipeline_name, base_job_prefix, processing_instance_type, training_instance_type)
188 version="1.0-1",
189 py_version="py3",
--> 190 instance_type=training_instance_type,
191 )
192 tf_train = Estimator(
/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/utilities.py in wrapper(*args, **kwargs)
197 logger.warning(warning_msg_template, arg_name, func_name, type(value))
198 kwargs[arg_name] = value.default_value
--> 199 return func(*args, **kwargs)
200
201 return wrapper
/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in retrieve(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config)
152 if inference_tool == "neuron":
153 _framework = f"{framework}-{inference_tool}"
--> 154 config = _config_for_framework_and_scope(_framework, image_scope, accelerator_type)
155
156 original_version = version
/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in _config_for_framework_and_scope(framework, image_scope, accelerator_type)
277 image_scope = available_scopes[0]
278
--> 279 _validate_arg(image_scope, available_scopes, "image scope")
280 return config if "scope" in config else config[image_scope]
281
/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in _validate_arg(arg, available_options, arg_name)
443 "Unsupported {arg_name}: {arg}. You may need to upgrade your SDK version "
444 "(pip install -U sagemaker) for newer {arg_name}s. Supported {arg_name}(s): "
--> 445 "{options}.".format(arg_name=arg_name, arg=arg, options=", ".join(available_options))
446 )
447
ValueError: Unsupported image scope: None. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): eia, inference, training.
I was skeptical that the upgrade suggested by the error message would fix this, but gave it a try:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pipelines 0.0.1 requires sagemaker==2.93.0, but you have sagemaker 2.110.0 which is incompatible.
So it seems I can't upgrade sagemaker without changing pipelines, and it's not clear that's the right thing to do; this project template may be designed entirely around those particular earlier library versions.
So should the "framework" name be something different, e.g. "tf"? Or is there some other setting that needs changing in order to get a TensorFlow pipeline?
However, I find that if I use the existing abalone/pipeline.py file, I can change the framework to "tensorflow" and that particular step runs in the notebook with no problem.
I've searched all the files in the project to try and find any dependency on the abalone folder name; the closest I came was in codebuild-buildspec.yml, but changing that hasn't helped.
Has anyone else successfully changed the folder name from abalone to something else, or am I stuck with abalone if I want to make progress?
Many thanks in advance
P.S. Is there a Slack community for SageMaker Studio anywhere?
P.P.S. I have tried changing all instances of the term "Abalone" to "Topic" within the topic/pipeline.py file (matching case as appropriate), to no avail.
P.P.P.S. I discovered that I can get an error-free run of getting the pipeline from a unit test:
import pytest
from pipelines.topic.pipeline import *

region = 'eu-west-1'
role = 'arn:aws:iam::398371982844:role/SageMakerExecutionRole'
default_bucket = 'sagemaker-eu-west-1-398371982844'
model_package_group_name = 'TopicModelPackageGroup-Example'
pipeline_name = 'TopicPipeline-Example'

def test_pipeline():
    pipeline = get_pipeline(
        region=region,
        role=role,
        default_bucket=default_bucket,
        model_package_group_name=model_package_group_name,
        pipeline_name=pipeline_name,
    )
    assert pipeline is not None
Strangely, if I go to a different copy of the notebook, everything runs fine there. So I have two seemingly identical .ipynb notebooks; in one of them, when I switch to trying to get a topic pipeline, I get the above error, and in the other I get no error at all. Very strange.
P.P.P.P.S. I also notice that conda list returns very different results depending on whether I run it in the notebook or the terminal ... but the conda list results are identical for the two notebooks ...
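To compare the two contexts, one generic diagnostic (standard library only, nothing SageMaker-specific) is to check which interpreter and environment each is actually running; a difference here would explain the differing conda list output:

```python
# Print which Python interpreter and environment this context is using.
# Run the same snippet in the notebook kernel and the terminal and
# compare the results.
import os
import sys

print(sys.executable)                       # path of the running interpreter
print(sys.prefix)                           # root of the active environment
print(os.environ.get("CONDA_DEFAULT_ENV"))  # active conda env name, if any
```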
Thanks Ivan for also taking a look at this question, very helpful indeed. That "self-service lab on SageMaker Pipelines" is exactly what I was looking for, although it doesn't mention the error message I encountered above, which for some reason I am now encountering again in the one notebook that worked yesterday. Intermittent errors are of course the most frustrating :-(
I will work through that lab from scratch after lunch, but I feel I am missing something conceptual about the process by which a SageMaker Studio project gets deployed. That lab mentions pushing code to the repo as a way to kick off a build, while the notebook itself seems to imply that one can kick off a deploy from the notebook itself.
Is the problem perhaps that when I'm operating from within the notebook, it's only using the code that happens to be on the latest main branch, or on the current feature branch?
And why doesn't python setup.py build get the requirements into the right place for the pipeline? Maybe it will do that if I just commit the code to the right branch? I'll work through the lab in a fresh project after lunch and see where I get to ...
Hi, @regulatansaku.
See my comment below.
These are just two ways of doing the same thing. From the notebook you can try the SageMaker pipeline in "dev" mode. When you commit the code, it will trigger a CI/CD pipeline in AWS CodePipeline, which will run the same pipeline in the automated "ops" mode, without the need to have any notebooks up and running.
For the requirements problem, I tried to answer in your other post.
Thanks @ivan, that makes sense. It's just that, for some reason, I get these random intermittent errors in the notebook when trying to get the pipeline:
ValueError: Unsupported image scope: None. You may need to upgrade your SDK version (pip install -U sagemaker) for newer image scopes. Supported image scope(s): eia, inference, training.
So the workaround to avoid that issue appears to be to push the code to the main branch instead. It's great to have a way to avoid that craziness, so big thanks for that.
Hi, @regulatansaku. I'm glad you've resolved your issue. As for the "Unsupported image scope" error: check your code to make sure you don't accidentally have a sagemaker.image_uris.retrieve call left anywhere. If you want to retrieve the TensorFlow image with this API, you need to specify either "training" or "inference" as the image_scope, because these two are slightly different images. But as I said before, you don't need to do that at all if you use the TensorFlow estimator rather than a generic Estimator.