Issues with exposing SKLearn model as endpoint on AWS Sagemaker

I have trained a SKLearn model on AWS SageMaker and tried to expose it as an endpoint, but hosting it fails with the following error:

```
sagemaker.exceptions.UnexpectedStatusException: Error hosting endpoint sagemaker-scikit-learn-2022-12-26-21-19-15-169: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.

Process finished with exit code 1
```

After taking a look at the CloudWatch logs, as this message suggests, it seems a socket file is missing for the server that must be running for the endpoint to work:

```
2022/12/26 21:23:12 [crit] 27#27: *63 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.178.2, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.180.2:8080"
```

Does anyone know of a way to resolve this? I have looked online for quick fixes but have not found any solutions. I suspect it has something to do with SageMaker not installing the right packages in the Docker container, but unfortunately I don't know of any way to reproduce this error locally.

The deployment script is `deploy.py` in my GitHub repo: https://github.com/abhinavGirish/newsy

Any help/insight is much appreciated - I've been struggling with this issue for about a week. Thank you!

**EDIT 12/28/23**

So I've now tried a different approach using one of AWS's other tutorials for deploying endpoints (Amazon has unfortunately since taken down the documentation for it), which is in the create_endpoint_script.py file in the https://github.com/abhinavGirish repository. I now get the following error in the CloudWatch logs:

```
Traceback (most recent call last):
  File "/miniconda3/lib/python3.8/site-packages/gunicorn/workers/base_async.py", line 55, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/miniconda3/lib/python3.8/site-packages/gunicorn/workers/ggevent.py", line 143, in handle_request
    super().handle_request(listener_name, req, sock, addr)
  File "/miniconda3/lib/python3.8/site-packages/gunicorn/workers/base_async.py", line 106, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_sklearn_container/serving.py", line 140, in main
    user_module_transformer, execution_parameters_fn = import_module(serving_env.module_name,
  File "/miniconda3/lib/python3.8/site-packages/sagemaker_sklearn_container/serving.py", line 118, in import_module
    user_module = importlib.import_module(module_name)
  File "/miniconda3/lib/python3.8/importlib/__init__.py", line 118, in import_module
    if name.startswith('.'):
AttributeError: 'NoneType' object has no attribute 'startswith'
```

To resolve this, I have already checked that the scikit-learn version I'm using is compatible with SageMaker, upgraded the version of the sagemaker SDK I was using to deploy the endpoint, and put in the path of the serving script for the model. Does anyone have any idea what else I could try to fix this deployment issue?
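For what it's worth, the AttributeError itself is just what `importlib` raises when it is handed `None` instead of a module name, i.e. the serving container never learned which script to load. A quick reproduction outside of SageMaker (the variable name is illustrative):

```python
import importlib

# serving_env.module_name comes from the endpoint configuration; when no
# inference entry point is recorded it is None, and import_module fails
# exactly like the traceback above.
module_name = None

try:
    importlib.import_module(module_name)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'startswith'
```

So the error points at the entry-point script not being registered with the endpoint, rather than at scikit-learn version issues.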

3 Answers

`unix:/tmp/gunicorn.sock failed (2: No such file or directory)` may indicate that Gunicorn was not started properly.

If you search the log for other Gunicorn-related errors, you may be able to identify the root cause.
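For background, the serving container fronts the model server with nginx, which forwards `/ping` health checks to Gunicorn over that Unix socket; if Gunicorn never starts, the socket is never created and you get exactly this error. Schematically (an illustrative sketch, not the container's actual config file):

```nginx
# Illustrative only: how /ping reaches the app server inside the container.
upstream gunicorn {
  server unix:/tmp/gunicorn.sock;   # created by Gunicorn on startup
}

server {
  listen 8080;

  location ~ ^/(ping|invocations) {
    proxy_pass http://gunicorn;     # fails with "No such file or directory"
                                    # when Gunicorn did not start
  }
}
```

That is why the fix is almost always on the Gunicorn/application side (the model server crashing at startup), not in the nginx configuration.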

Also, you could host the endpoint locally on your device or in a SageMaker Notebook Instance, where you can log in to the container to debug the issue further. Local Mode details: https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode

AWS
answered a year ago
  • thanks for your response. I tried hosting the endpoint locally, and I run into another error involving a missing module 'yaml':

    ```
    sagemaker_session = LocalSession()
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sagemaker/local/local_session.py", line 513, in __init__
        super(LocalSession, self).__init__(boto_session)
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sagemaker/session.py", line 131, in __init__
        sagemaker_featurestore_runtime_client=sagemaker_featurestore_runtime_client,
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sagemaker/local/local_session.py", line 560, in _initialize
        raise e
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sagemaker/local/local_session.py", line 557, in _initialize
        import yaml
    ModuleNotFoundError: No module named 'yaml'
    ```

    After searching online for a bit (including https://github.com/yaml/pyyaml/issues/291), I've tried re-installing the library multiple ways:

    ```
    pip install pyyaml
    python3 -m pip install pyyaml
    sudo easy_install pip
    sudo python -m pip install pyyaml
    pip install pyyaml==5.1.0
    ```

    I checked and made sure that the `import yaml` statement works in my Mac's Python interpreter. Anything else I could try? Perhaps it works only for interpreter version 3.8 and not 3.6 (the one used above). Thanks!

  • ^^^ anyone know how to fix the above issue? I'm trying to fix the original issue I posted about while running in 'LocalSession' mode, but no luck yet. For some reason, the pyyaml module is installed for Python 3.9 (what normally comes up when I open the Python interpreter in a terminal), but when I run the local-session script in PyCharm, it ends up running Python 3.6. I'm not finding many relevant answers on Stack Overflow.

    I have the local deployment script available on GitHub here: https://github.com/abhinavGirish/newsy/blob/main/deploy_local.py. Here's the exact error message as well:

    ```
    Failed to import yaml. Local mode features will be impaired or broken. Please run "pip install 'sagemaker[local]'" to install all required dependencies.
    Traceback (most recent call last):
      File "/Users/abhinavgirish/Documents/2021-2022/Newsy/deploy_local.py", line 4, in <module>
        sagemaker_session = LocalSession()
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sagemaker/local/local_session.py", line 513, in __init__
        super(LocalSession, self).__init__(boto_session)
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sagemaker/session.py", line 131, in __init__
        sagemaker_featurestore_runtime_client=sagemaker_featurestore_runtime_client,
      File "/Users/abhinavgirish/.local/share/virtualenvs/NewsClassifierProject/lib/python3.6/site-packages/sag
    ```

  • running `pip install 'sagemaker[local]'` as the error message suggests doesn't work either, FYI

  • update: it looks like I got Docker to run after running my local deploy script. Now the issue is that the nginx server for the endpoint isn't launching properly. After looking through Stack Overflow posts, it seems there's some problem with the nginx configuration when attempting to launch the endpoint:

    ```
    fphpsiduo3-algo-1-knq70 | 2023/02/07 18:22:38 [crit] 25#25: *39 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 172.18.0.1, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "localhost:8080"
    ```

    does anyone know if there's a way to resolve this through the AWS Management Console? I don't have any Dockerfiles or nginx config in my repo (see the link I posted above), so I'm not sure whether there are settings I need to fix on my AWS account or if it's something else. Any help is much appreciated - thanks again!
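One way to confirm the 3.6-vs-3.9 interpreter mismatch described in the comments above is to have the failing script report which Python is actually executing it and whether yaml is importable there (a small diagnostic sketch; run it via the same PyCharm run configuration):

```python
import importlib.util
import sys

# The interpreter actually executing this script; in PyCharm this is the
# project interpreter, which can differ from the `python` on your PATH.
print(sys.executable)
print(sys.version)

# find_spec returns None when a module is not installed for THIS interpreter.
print("yaml importable here:", importlib.util.find_spec("yaml") is not None)
```

If it prints the 3.6 virtualenv path and `yaml importable here: False`, install pyyaml with that exact interpreter (`/path/to/venv/bin/python -m pip install pyyaml`) or switch PyCharm's project interpreter to the 3.9 install.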


Hi,

I agree that some packages seem to be missing, based on the error messages. I haven't used SKLearn with SageMaker personally, but I just reviewed a few examples of the SageMaker SDK for SKLearn.

That's just my two cents: maybe try removing the source_dir in deploy.py and trying again.

Thanks,

AWS
Jady
answered a year ago
  • thanks for answering, Jady. I put in the 'source_dir' for that function because it's the directory where the requirements.txt file sits, which lists the packages that need to be installed for deployment. Unfortunately, if I removed it, the model would not train properly on SageMaker, so I can't take it out; I ran into that issue before.


Checking out your code quickly, I can see a couple of issues that would definitely raise errors.

The way you are doing the deployment (calling the .deploy() method), you are passing neither the requirements file nor your custom inference script. My guess is therefore that when SageMaker tries to unpickle your model, it encounters an error: it cannot unpickle your LemmaTokenizer, and/or the nltk package is not found.

Therefore, instead of your https://github.com/abhinavGirish/newsy/blob/083c88668f972b2c6d8fd35c9d511f40de682221/deploy.py#L28 you want to do something like:

```python
my_model = SKLearnModel(
    entry_point="inference.py",  # "aws_training.py" in your case
    source_dir="code_location",  # "./" for you: a directory containing your inference script and your requirements.txt file
    ...
)
my_model.deploy(...)
```
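That is, source_dir should point at a directory that bundles both files; a layout like this (names illustrative) is what the container picks up:

```
code_location/
├── inference.py        # entry_point: loads the model and serves requests
└── requirements.txt    # extra packages (e.g. nltk) installed into the container
```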

Reference: https://github.com/aws/sagemaker-python-sdk/blob/e5ccfa72eec5b039813b59b62f5332e7b03489d4/src/sagemaker/sklearn/model.py#L73
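For completeness, the SKLearn serving container imports the inference script and calls a `model_fn` hook to load the model (input_fn/predict_fn/output_fn have defaults). A minimal sketch, assuming the model was saved with pickle as model.pkl (the file name and format are assumptions; match whatever your training script actually wrote):

```python
import os
import pickle

def model_fn(model_dir):
    # SageMaker extracts model.tar.gz into model_dir before calling this hook.
    # "model.pkl" is an assumed name -- use the one your training job saved.
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)
```

If your training script saved with joblib instead, load it the same way with joblib; the key point is that unpickling happens inside the container, so anything the pickle references (LemmaTokenizer, nltk) must be importable there.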

AWS
answered a year ago
