AWS SageMaker - Extending Pre-built Container, Deploy Endpoint Failed: "No such file or directory: 'serve'"


I am trying to deploy a SageMaker inference endpoint by extending a pre-built image. However, the deployment fails with "FileNotFoundError: [Errno 2] No such file or directory: 'serve'".

My Dockerfile

ARG REGION=us-west-2

# SageMaker PyTorch image
FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu116-ubuntu20.04-ec2

RUN apt-get update

ENV PATH="/opt/ml/code:${PATH}"

# this environment variable is used by the SageMaker PyTorch container to determine our user code directory.
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code

# /opt/ml and all subdirectories are utilized by SageMaker, use the /code subdirectory to store your user code.
COPY inference.py /opt/ml/code/inference.py

# Defines inference.py as script entrypoint 
ENV SAGEMAKER_PROGRAM inference.py

CloudWatch Log From /aws/sagemaker/Endpoints/mytestEndpoint

2022-09-30T04:47:09.178-07:00
Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 20, in <module>
    subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))
  File "/opt/conda/lib/python3.8/subprocess.py", line 359, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/opt/conda/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/opt/conda/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/conda/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)

2022-09-30T04:47:13.409-07:00
FileNotFoundError: [Errno 2] No such file or directory: 'serve'
Asked one year ago

2 Answers

Hi, @holopekochan!

The serve script is installed by the SageMaker PyTorch Inference Toolkit when you pip-install it in the Dockerfile.
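One way to confirm whether the serve entry point is present is to look it up on PATH from inside the container. The following is a minimal sketch using only the Python standard library. The helper name `find_entrypoint` is hypothetical, and it assumes you can run Python inside the image you built.

```python
import shutil
from typing import Optional


def find_entrypoint(name: str = "serve") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_entrypoint()
    if path is None:
        # This is the situation that leads to the FileNotFoundError in the logs.
        print("'serve' is not on PATH in this environment")
    else:
        print(f"'serve' found at {path}")
```

If this prints that 'serve' is missing, the image most likely does not include the inference toolkit.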

However, it's hard to say why it's not found in your container. Are you sure that you are using the inference container, not the training container, for your endpoint? If you go to the AWS Console > Amazon SageMaker > Models > your model, which ECR image is shown under Container 1 - Image?
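The same check can be done programmatically instead of in the console. This is a rough sketch, assuming the response shape of the SageMaker `DescribeModel` API (`PrimaryContainer` and `Containers`, each with an `Image` field). The helper `model_image_uris` and the model name `my-model` are placeholders, not something from the original question.

```python
from typing import Any, Dict, List


def model_image_uris(description: Dict[str, Any]) -> List[str]:
    """Collect every container image URI from a DescribeModel-style response."""
    uris: List[str] = []
    # Single-container models report their image under PrimaryContainer.
    primary = description.get("PrimaryContainer") or {}
    if "Image" in primary:
        uris.append(primary["Image"])
    # Multi-container models list their images under Containers.
    for container in description.get("Containers") or []:
        if "Image" in container:
            uris.append(container["Image"])
    return uris


# Usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   description = boto3.client("sagemaker").describe_model(ModelName="my-model")
#   print(model_image_uris(description))
```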

It would help if you could share the code you used to set up the SageMaker PyTorch estimator (if any), how you define your PyTorchModel, and how you deploy() it.

Ivan (AWS)
Answered one year ago
  • Thanks for the hint. I found the problem: I used the wrong Docker image, the EC2 one instead of the SageMaker one. I have simplified my question and posted the answer here in case someone else runs into the same problem.

  • Hi, @holopekochan! Good catch! I'm glad you found it!

Accepted Answer

You should use the SageMaker image:

763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu116-ubuntu20.04-sagemaker

instead of the EC2 one:

763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu116-ubuntu20.04-ec2
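To catch this mistake before deploying, a small guard can inspect the image tag. The sketch below assumes the naming pattern visible in the two URIs above, where the last dash-separated part of the tag is the target platform ("sagemaker" or "ec2"). The function names are hypothetical and this is not an official AWS API.

```python
def image_platform(image_uri: str) -> str:
    """Return the last dash-separated part of the image tag, e.g. 'sagemaker' or 'ec2'."""
    # Everything after the final ':' is the tag.
    _, _, tag = image_uri.rpartition(":")
    return tag.rsplit("-", 1)[-1]


def is_sagemaker_image(image_uri: str) -> bool:
    """True if the tag ends in '-sagemaker' under the assumed naming pattern."""
    return image_platform(image_uri) == "sagemaker"


if __name__ == "__main__":
    base = "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference"
    for suffix in ("ec2", "sagemaker"):
        uri = f"{base}:1.12.1-gpu-py38-cu116-ubuntu20.04-{suffix}"
        print(uri, "->", "OK" if is_sagemaker_image(uri) else "not a SageMaker image")
```

Running such a check on the base image in the FROM line would have flagged the `-ec2` tag before the endpoint was created.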
Answered one year ago
