SageMaker Training Job: Python module installation error

I have a problem installing a Python module that requires another module to be installed first. Both modules are listed in the requirements.txt file, but the following error occurs when the main module is installed:

2022-07-29 01:18:26.460132: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2022-07-29 01:18:26.470589: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2022-07-29 01:18:26.765280: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2022-07-29 01:18:31,908 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2022-07-29 01:18:31,917 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-07-29 01:18:33,117 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/usr/local/bin/python3.9 -m pip install -r requirements.txt
Collecting Cython==0.29.31
Downloading Cython-0.29.31-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 33.1 MB/s eta 0:00:00
Requirement already satisfied: wheel==0.37.1 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 2)) (0.37.1)
Collecting scikit-image==0.19.2
Downloading scikit_image-0.19.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 83.1 MB/s eta 0:00:00
Collecting parallelbar==0.1.19
Downloading parallelbar-0.1.19-py3-none-any.whl (5.6 kB)
Collecting albumentations==1.0.3
Downloading albumentations-1.0.3-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.7/98.7 kB 6.6 MB/s eta 0:00:00
Collecting tensorflow_addons==0.16.1
Downloading tensorflow_addons-0.16.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 54.4 MB/s eta 0:00:00
Requirement already satisfied: tensorflow-io==0.24.0 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 7)) (0.24.0)
Requirement already satisfied: tensorboard==2.8.0 in /usr/local/lib/python3.9/site-packages (from -r requirements.txt (line 8)) (2.8.0)
Collecting universal-pathlib==0.0.12
Downloading universal_pathlib-0.0.12-py3-none-any.whl (19 kB)
Collecting setuptools==63.2.0
Downloading setuptools-63.2.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 58.9 MB/s eta 0:00:00
Collecting pynanosvg==0.3.1
Downloading pynanosvg-0.3.1.tar.gz (346 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 346.0/346.0 kB 17.5 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-1mt2gkfy/pynanosvg_d6162ffce95948abb4262061a011908c/setup.py", line 2, in <module>
          from Cython.Build import cythonize
      ModuleNotFoundError: No module named 'Cython'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[notice] A new release of pip available: 22.1.2 -> 22.2.1
[notice] To update, run: pip install --upgrade pip
2022-07-29 01:18:36,187 sagemaker-training-toolkit INFO     Waiting for the process to finish and give a return code.
2022-07-29 01:18:36,187 sagemaker-training-toolkit INFO     Done waiting for a return code. Received 1 from exiting process.
2022-07-29 01:18:36,188 sagemaker-training-toolkit ERROR    Reporting training FAILURE
2022-07-29 01:18:36,188 sagemaker-training-toolkit ERROR    InstallRequirementsError:
ExitCode 1
ErrorMessage "      ModuleNotFoundError: No module named 'Cython'
       [end of output]      note: This error originates from a subprocess, and is likely not a problem with pip. error: metadata-generation-failed  × Encountered error while generating package metadata. ╰─> See above for output. note: This is an issue with the package mentioned above, not pip. hint: See above for details."
Command "/usr/local/bin/python3.9 -m pip install -r requirements.txt"
2022-07-29 01:18:36,188 sagemaker-training-toolkit ERROR    Encountered exit_code 1
asked 2 years ago, 1140 views
1 Answer

Cython is downloaded before the error, but pip has not installed it at that point. pip prepares metadata for every entry in requirements.txt before it installs any of them, and for a source distribution such as pynanosvg that means running its setup.py, which imports Cython on line 2. Others have found similar (non-SageMaker-specific) issues with Cython, e.g. here and here.

Things I would suggest to try:

  1. Explicitly specify Cython (maybe without a version at first) right at the top of your requirements.txt file if you're not already - just in case this can convince pip to treat it properly.

  2. Customize the TensorFlow container image you're targeting to pre-install Cython.

If you're not sure what base container URI you're using, you can fetch it with sagemaker.image_uris.retrieve(...) (doc here).
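As a rough sketch of that lookup: the TensorFlow version ("2.8") and Python version ("py39") below are guesses inferred from the log in the question (tensorboard==2.8.0, python3.9), and the instance type is a placeholder, so replace all three with the values from your own job. The helper names are hypothetical; only sagemaker.image_uris.retrieve comes from the SageMaker Python SDK.

```python
def training_image_args(region, instance_type="ml.m5.xlarge"):
    """Collect keyword arguments for sagemaker.image_uris.retrieve."""
    return {
        "framework": "tensorflow",
        "region": region,
        "version": "2.8",        # assumption: inferred from tensorboard==2.8.0 in the log
        "py_version": "py39",    # assumption: inferred from python3.9 in the log
        "instance_type": instance_type,  # placeholder; CPU vs GPU selects a different image
        "image_scope": "training",       # training and inference images are separate
    }


def lookup_training_image(region, instance_type="ml.m5.xlarge"):
    """Return the URI of the prebuilt TensorFlow training image."""
    # Imported here so the helper above can be used without the SDK installed.
    from sagemaker import image_uris

    return image_uris.retrieve(**training_image_args(region, instance_type))
```

Calling lookup_training_image with your region should give the URI to put in the FROM line of the Dockerfile.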

From that, you can create a minimal Dockerfile something like

FROM XYZ.dkr.ecr.ABC.amazonaws.com/...
RUN pip install Cython==0.29.31

Once you build this customized container image, and push it to Amazon ECR in your AWS account & region, you can use it by setting the image_uri parameter in your TensorFlow Estimator. Note that the frameworks typically have separate container images for training vs serving, and GPU vs CPU-only, so you may need to create a pair of containers if wanting to do inference too.
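A minimal sketch of pointing the estimator at the custom image might look like the following. The entry point, source directory, and instance type are hypothetical placeholders, and the helper names are made up for illustration; as far as I know the SDK does not require framework_version or py_version when image_uri is given, but check that against the SDK version you have installed.

```python
def estimator_kwargs(image_uri, role):
    """Collect keyword arguments for the TensorFlow estimator."""
    return {
        "entry_point": "train.py",        # hypothetical script name
        "source_dir": "src",              # hypothetical folder with the script and requirements.txt
        "role": role,                     # your SageMaker execution role ARN
        "instance_count": 1,
        "instance_type": "ml.m5.xlarge",  # placeholder
        "image_uri": image_uri,           # the image you pushed to Amazon ECR
    }


def build_estimator(image_uri, role):
    """Create a TensorFlow estimator that runs on the customized image."""
    # Imported here so estimator_kwargs can be inspected without the SDK installed.
    from sagemaker.tensorflow import TensorFlow

    return TensorFlow(**estimator_kwargs(image_uri, role))
```

From there, build_estimator(...).fit(...) should start the job as before, now with Cython already present when requirements.txt is processed.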

If you're working inside SageMaker Studio you won't be able to run docker build directly, but you can install the sm-docker build solution, which is based on AWS CodeBuild. The "Prepare custom training and inference containers" section of this notebook gives an example of a similar approach.

  3. If you'd really like to avoid touching containers and ECR, you could instead remove your requirements.txt and install dependencies within the script via something like subprocess.check_call(["pip", "install", ...]). It's hacky, but this way you could run a pip install just for Cython first, then install all the other dependencies in a second command.
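One way this two-step install could look, as a sketch only: install_in_order and pip_commands are hypothetical helper names, and the sketch calls pip through sys.executable rather than a bare "pip" so that it targets the same interpreter that runs the training script.

```python
import subprocess
import sys


def pip_commands(first, rest):
    """Build one pip command per non-empty group of packages, in order."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [pip + list(group) for group in (first, rest) if group]


def install_in_order(first, rest, run=subprocess.check_call):
    """Install `first` completely before starting on `rest`."""
    for cmd in pip_commands(first, rest):
        run(cmd)  # check_call raises CalledProcessError on a non-zero exit code
```

At the very top of the training script, before importing anything that needs these packages, you would call something like install_in_order(["Cython==0.29.31"], ["pynanosvg==0.3.1", ...]) with the rest of your former requirements.txt entries in the second list.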
AWS
EXPERT
Alex_T
answered 2 years ago
