
Questions tagged with AWS Deep Learning Containers



SageMaker MultiDataModel deployment error during inference. ValueError: Exactly one .pth or .pt file is required for PyTorch models: []

Hello, I've been trying to deploy multiple PyTorch models on one SageMaker endpoint from a SageMaker notebook. First I tested deployment of single models on single endpoints to check that everything works smoothly, and it did. I would create a PyTorchModel first:

```
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
import boto3

role = get_execution_role()
sagemaker_session = sagemaker.Session()

pytorch_model = PyTorchModel(
    entry_point='inference.py',
    source_dir='code',
    role=role,
    model_data='s3://***/model/model.tar.gz',
    framework_version='1.11.0',
    py_version='py38',
    name='***-model',
    sagemaker_session=sagemaker_session
)
```

MultiDataModel inherits properties from the Model class, so I used the same PyTorch model that I used for single-model deployment. Then I would define the MultiDataModel the following way:

```
models = MultiDataModel(
    name='***-multi-model',
    model_data_prefix='s3://***-sagemaker/model/',
    model=pytorch_model,
    sagemaker_session=sagemaker_session
)
```

All it should need is the S3 prefix where the model artifacts are saved as tar.gz files (the same files used for single-model deployment), the previously defined PyTorch model, a name, and a sagemaker_session. To deploy it:

```
models.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    endpoint_name='***-multi-model-deployment',
)
```

The deployment goes well: there are no failures, and the endpoint is InService by the end of this step. However, an error occurs when I try to run inference against one of the models:

```
import json

body = {"url": "https://***image.jpg"}  # URL to an image online
payload = json.dumps(body)

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName="***-multi-model-deployment",
    ContentType="application/json",
    TargetModel="/model.tar.gz",
    Body=payload
)
```

This prompts an error message:

```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers for model ec1cd509c40ca81ffc3fb09deb4599e2 version: 1.0"
}". See https://***.console.aws.amazon.com/cloudwatch/home?region=***#logEventViewer:group=/aws/sagemaker/Endpoints/***-multi-model-deployment in account ***** for more information.
```

The CloudWatch logs show this error in particular:

```
2022-09-26T15:51:40,494 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 210, in <module>
2022-09-26T15:51:40,494 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   worker.run_server()
2022-09-26T15:51:40,494 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 181, in run_server
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   self.handle_connection(cl_socket)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 139, in handle_connection
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   service, result, code = self.load_model(msg)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 104, in load_model
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   service = model_loader.load(
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_loader.py", line 151, in load
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   initialize_fn(service.context)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_pytorch_serving_container/handler_service.py", line 51, in initialize
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   super().initialize(context)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   self._service.validate_and_initialize(model_dir=model_dir)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/transformer.py", line 162, in validate_and_initialize
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   self._model = self._model_fn(model_dir)
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py", line 73, in default_model_fn
2022-09-26T15:51:40,495 [INFO ] W-9000-model_1.0-stdout MODEL_LOG -   raise ValueError(
2022-09-26T15:51:40,496 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - ValueError: Exactly one .pth or .pt file is required for PyTorch models: []
```

It seems the container is having problems loading the model, saying exactly one .pth or .pt file is required, even though the invocation call points to the exact model artifact present at that S3 prefix. I'm having a hard time fixing this issue, so any suggestions would be very helpful! Instead of giving the MultiDataModel a model, I also tried providing an ECR Docker image with the same inference code, but I got the same error when invoking the endpoint.
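For reference, the traceback above ends in the PyTorch inference toolkit's default_model_fn, which looks for exactly one .pth/.pt file in each extracted archive; the SageMaker PyTorch container lets you override that loader by defining a model_fn in the entry-point script. Below is a minimal sketch of such an override, assuming each model.tar.gz contains a TorchScript file named model.pth (both the file name and the TorchScript format are assumptions, not details from the question):

```
# code/inference.py -- illustrative sketch of a custom model loader that
# replaces default_model_fn (the function raising the ValueError above).
# ASSUMPTION: each model.tar.gz holds a TorchScript artifact "model.pth";
# adjust the file name and loading call to match the real artifacts.
import os

import torch


def model_fn(model_dir):
    """SageMaker PyTorch serving hook: load and return the model."""
    model = torch.jit.load(os.path.join(model_dir, "model.pth"), map_location="cpu")
    model.eval()
    return model
```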
1 answer · 0 votes · 34 views · asked 9 days ago

SKLearn Processing Container - Error: "WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager."

Hey all, I am trying to run the script below, written out via %%writefile as "vw_aws_a_bijlageprofile.py". This code has worked for me with other data sources, but now I am getting the following message in the CloudWatch logs:

```
2022-08-24T20:09:19.708-05:00 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
```

Any idea how I can get around this error? Full code below. Thank you in advance!

```
%%writefile vw_aws_a_bijlageprofile.py
import os
import sys
import subprocess

def install(package):
    # note: "-q" is pip's quiet flag, so it belongs after "install"
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])

install('awswrangler')
install('tqdm')
install('pandas')
install('botocore')
install('ruamel.yaml')
install('pandas-profiling')

import awswrangler as wr
import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from string import Template
import gc
import boto3
from pandas_profiling import ProfileReport

client = boto3.client('s3')
session = boto3.Session(region_name="eu-west-2")

def run_profile():
    query = """
    SELECT * FROM "intl-euro-archmcc-database"."vw_aws_a_bijlage";
    """
    # switch table name above
    tableforprofile = wr.athena.read_sql_query(query,
                                               database="intl-euro-archmcc-database",
                                               boto3_session=session,
                                               ctas_approach=False,
                                               workgroup='DataScientists')
    print("read in the table queried above")
    print("got rid of missing and added a new index")

    profile_tblforprofile = ProfileReport(tableforprofile,
                                          title="Pandas Profiling Report",
                                          minimal=True)
    print("Generated table profile")
    return profile_tblforprofile

if __name__ == '__main__':
    profile_tblforprofile = run_profile()
    print("Generated outputs")
    output_path_tblforprofile = '/opt/ml/processing/output/profile_vw_aws_a_bijlage.html'
    # switch profile name above
    print(output_path_tblforprofile)
    profile_tblforprofile.to_file(output_path_tblforprofile)
```

```
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput

session = boto3.Session(region_name="eu-west-2")
bucket = 'intl-euro-uk-datascientist-prod'
prefix = 'Mark'
sm_session = sagemaker.Session(boto_session=session, default_bucket=bucket)
sm_session.upload_data(path='vw_aws_a_bijlageprofile.py',
                       bucket=bucket,
                       key_prefix=f'{prefix}/source')
```

```
import boto3
#import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

region = boto3.session.Session().region_name
S3_ROOT_PATH = "s3://{}/{}".format(bucket, prefix)
role = get_execution_role()
sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
                                     role=role,
                                     sagemaker_session=sm_session,
                                     instance_type='ml.m5.24xlarge',
                                     instance_count=1)
```

```
sklearn_processor.run(code='s3://{}/{}/source/vw_aws_a_bijlageprofile.py'.format(bucket, prefix),
                      inputs=[],
                      outputs=[ProcessingOutput(output_name='output',
                                                source='/opt/ml/processing/output',
                                                destination='s3://intl-euro-uk-datascientist-prod/Mark/IODataProfiles/')])
```
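Worth noting: the pip message quoted above is a warning, not a fatal error, so if the processing job actually fails, the root cause is likely a different entry further down the CloudWatch stream. That said, pip can suppress the root-user warning itself. A minimal sketch of the same install helper with the suppression applied (assuming the container image ships pip 22.1 or newer; check pip --version inside the container first):

```
# Sketch: install() helper with pip's root-user warning silenced.
# ASSUMPTION: the container's pip is >= 22.1, where the corresponding
# --root-user-action option was introduced; verify before relying on it.
import os
import subprocess
import sys


def install(package):
    env = dict(os.environ, PIP_ROOT_USER_ACTION="ignore")
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-q", package],
        env=env,
    )
```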
1 answer · 0 votes · 34 views · asked a month ago

Setting up data for DeepAR, targets and categories for simultaneous data?

I would like to try out DeepAR for an engineering problem that I have some sensor datasets for, but I am unsure how to set the data up for ingestion into DeepAR to get a predictive model. The data is essentially the positions, orientations, and a few other time-series sensor readings of an assortment of objects (animals, in this case, actually) over time. The data is both noisy and sometimes missing.

So, in this case, there are N individuals, and for each individual there are Z variables of interest. None of the variables are "static" (color, size, etc.); they are all expected to be time-varying on the same time scale. Ultimately, I would like to predict all Z targets for all N individuals.

How do I set up the time series to feed into DeepAR? The premise is that all these individuals are implicitly interacting in the observed space, so all the target values have some interdependence, which is what I would like to see if DeepAR can take into account when making predictions. Should I use a category vector of length 2, such that the first cat variable corresponds to the individual and the second corresponds to one of the variables associated with that individual? Then there would be N*Z targets in my input dataset, each with `cat = [n, z]`, where n takes N distinct values and z takes Z distinct values.
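For reference, DeepAR's training input is JSON Lines: one object per time series with a "start" timestamp, a "target" array (missing observations can be encoded as nulls), and an optional "cat" array of non-negative integer category indices. A minimal sketch of the layout proposed above, with N*Z lines and cat = [n, z] (all data values and timestamps are placeholders; only the field names come from the DeepAR format):

```
# Sketch: write N*Z DeepAR series as JSON Lines, one per (individual,
# variable) pair, with cat=[n, z]. All data values here are placeholders.
import json

N, Z = 3, 2  # N individuals, Z variables of interest per individual

with open("train.json", "w") as f:
    for n in range(N):
        for z in range(Z):
            f.write(json.dumps({
                "start": "2021-01-01 00:00:00",   # shared start timestamp
                "target": [0.1, 0.2, None, 0.4],  # None marks a missing value
                "cat": [n, z],                    # individual and variable indices
            }) + "\n")
```

With this layout, DeepAR's cardinality hyperparameter would be [N, Z], one entry per position in the cat vector.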
1 answer · 0 votes · 68 views · asked 7 months ago