Unanswered Questions tagged with Amazon SageMaker

Error Creating Endpoint

Hi! The following error happens while trying to create an endpoint from a successfully trained model:

* In the web console:

  > The customer:primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.

* In the CloudWatch logs:

  > exec: "serve": executable file not found in $PATH

I'm deploying the model using a Lambda step, just as in this [notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/tensorflow2-california-housing-sagemaker-pipelines-deploy-endpoint/tensorflow2-california-housing-sagemaker-pipelines-deploy-endpoint.ipynb). The Lambda step succeeds, and I can see in the AWS web console that the model configuration is created successfully. The exact same error happens when I create an endpoint for the registered model in the AWS web console, under Inference -> Models. In the console I can see that an inference container was created for the model, with the following characteristics:

* Image: 763104351884.dkr.ecr.eu-west-3.amazonaws.com/tensorflow-training:2.8-cpu-py39
* Mode: single model
* Environment variables (Key / Value):
  * SAGEMAKER_CONTAINER_LOG_LEVEL: 20
  * SAGEMAKER_PROGRAM: inference.py
  * SAGEMAKER_REGION: eu-west-3
  * SAGEMAKER_SUBMIT_DIRECTORY: /opt/ml/model/code

I have absolutely no clue what is wrong, and I could not find anything relevant online about this problem. Is it necessary to provide a custom Docker image for inference, or something like that? For more details, please find the pipeline model steps code below. Any help would be much appreciated!

```
from sagemaker.model import Model
from sagemaker.workflow.model_step import ModelStep

# Create a Model from the training image and the artifacts produced by the training step
model = Model(
    image_uri=estimator.training_image_uri(),
    model_data=step_training.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sagemaker_session,
    role=sagemaker_role,
    source_dir='code',
    entry_point='inference.py'
)

# Pipeline step that creates the SageMaker Model resource
step_model_create = ModelStep(
    name="CreateModelStep",
    step_args=model.create(instance_type="ml.m5.large")
)

# Pipeline step that registers the model package as approved
register_args = model.register(
    content_types=["*"],
    response_types=["application/json"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name="test",
    approval_status="Approved"
)
step_model_register = ModelStep(name="RegisterModelStep", step_args=register_args)
```
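The CloudWatch message (no `serve` executable on `$PATH`) suggests the image being deployed, the `tensorflow-training` container, has no model-serving entrypoint. As a minimal sketch only, not a confirmed fix, one way to point the endpoint at the TensorFlow Serving inference container instead is the SageMaker Python SDK's `TensorFlowModel` class; the `framework_version` and `model_data` path below are illustrative assumptions, and `sagemaker_role` / `sagemaker_session` are reused from the question's snippet:

```
from sagemaker.tensorflow import TensorFlowModel

# Sketch: deploy with the TensorFlow Serving inference image rather than the training image.
# model_data is a hypothetical artifact location; in the pipeline it would come from the
# training step's properties, as in the question's code.
tf_model = TensorFlowModel(
    model_data="s3://my-bucket/path/to/model.tar.gz",  # assumed placeholder
    role=sagemaker_role,
    framework_version="2.8",            # assumed to match the training framework version
    entry_point="inference.py",
    source_dir="code",
    sagemaker_session=sagemaker_session,
)

# Stands up an endpoint backed by the tensorflow-inference container, which ships a serving stack.
predictor = tf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)
```

Whether this carries over to the Lambda-step deployment in the linked notebook is untested; the sketch only illustrates deploying with an image that actually contains a serving entrypoint.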
0 answers · 0 votes · 4 views · asked an hour ago

How to Resolve "ERROR execute(301) Failed to execute model:"

We have two applications running on the same AWS Panorama Appliance, processing different video streams. Unfortunately, we are hitting the following error:

```
2022-10-09 21:25:32.360 ERROR executionThread(358) Model 'model':
2022-10-09 21:25:32.359 ERROR execute(301) Failed to execute model:
TVMError: ---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (context->execute(batch_size
Stack trace:
  File "/home/nvidia/neo-ai-dlr/3rdparty/tvm/src/runtime/contrib/tensorrt/tensorrt_runtime.cc", line 177
  [bt] (0) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(+0x381358) [0x7f81e66358]
  [bt] (1) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x88) [0x7f81bb64a0]
  [bt] (2) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(tvm::runtime::contrib::TensorRTRuntime::Run()+0x12b8) [0x7f81e243b0]
  [bt] (3) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::json::JSONRuntimeBase::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x5c) [0x7f81e1bfc4]
  [bt] (4) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(+0x3c0dc4) [0x7f81ea5dc4]
  [bt] (5) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(+0x3c0e4c) [0x7f81ea5e4c]
  [bt] (6) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(dlr::TVMModel::Run()+0xc0) [0x7f81c258e0]
  [bt] (7) /data/cloud/assets/applicationInstance-6ta4fxv6hatsk62pf7aigge36e/a9adc18d31f58ce11dab117a31b7f47e7ee2ab83e04b52c2952ac8cd47b51f72/model/libdlr.so(RunDLRModel+0x1c) [0x7f81bea304]
  [bt] (8) /usr/lib/libAwsOmniInferLib.so(awsomniinfer::CNeoModel::SNeoModel::execute()+0x3c) [0x7f887db978]
2022-10-09 21:25:32.437 ERROR executionThread(358) Model 'model':
2022-10-09 21:25:32.437 ERROR setData(279) Failed to set model input 'data':
```

The error isn't persistent; it may happen once every 2-3 weeks, and I need to know where to investigate. The application logs are in the attachment. I am trying to avoid this issue, and I would appreciate it if somebody knew how to handle this properly.
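Since the failure is intermittent, one defensive pattern worth noting (a sketch only, not code from the question or a confirmed Panorama API) is to wrap the inference call in a small retry helper that logs the TVM error and re-attempts, or lets the caller skip the frame; `invoke_model` below is a hypothetical placeholder for whatever call actually runs the model in the application:

```
import logging
import time

def run_model_with_retry(invoke_model, inputs, max_retries=2, backoff_seconds=1.0):
    """Run inference, retrying a couple of times on a transient execution failure.

    invoke_model is a hypothetical placeholder for the application's real
    inference call; substitute whatever the Panorama application actually uses.
    """
    for attempt in range(max_retries + 1):
        try:
            return invoke_model(inputs)
        except Exception:  # the TVM/TensorRT failure surfaces as a runtime exception
            logging.exception("Model execution failed on attempt %d", attempt + 1)
            if attempt == max_retries:
                raise  # give up and let the caller decide (e.g. skip this frame)
            time.sleep(backoff_seconds * (attempt + 1))

# Hypothetical usage, with some_inference_call standing in for the real model invocation:
# outputs = run_model_with_retry(some_inference_call, frame_inputs)
```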
0 answers · 0 votes · 22 views · Rinat · asked a month ago