I am using the pre-built Scikit-learn container in SageMaker to deploy an endpoint for a model whose model.tar.gz file is 59.4 MB. The following line was used to deploy the endpoint:
sm_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge", endpoint_name=endpoint_name)
However, after the endpoint was created, it fails to allocate memory to its workers. These error messages and warnings keep showing up in the logs:
[Errno 12] Cannot allocate memory
[WARNING] Worker with pid 242 was terminated due to signal 9
As far as I know, the ml.m5.xlarge instance has 16 GB of memory. The endpoint's memory usage is at 60%, yet it still fails to allocate memory to workers. Does anyone have any insight into why this is happening and how to solve it without using an instance that has more memory?
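
For context, here is a rough sketch of the arithmetic that might be behind this, not a confirmed diagnosis. Signal 9 is SIGKILL and Errno 12 is ENOMEM, which would be consistent with workers being killed for memory. The sketch assumes each worker process loads its own copy of the unpacked model, and that the container honors a SAGEMAKER_MODEL_SERVER_WORKERS environment variable to cap the worker count. Both are assumptions I have not verified for this container version. The helper names and the per-worker memory figure are hypothetical.

```python
def max_workers_that_fit(total_mem_gib, per_worker_gib, headroom_fraction=0.2):
    """Estimate how many worker processes fit in memory.

    Reserves `headroom_fraction` of total memory for the OS and the
    serving stack, then divides the rest by the per-worker footprint.
    Always returns at least 1 so the endpoint can still start.
    """
    usable_gib = total_mem_gib * (1.0 - headroom_fraction)
    return max(1, int(usable_gib // per_worker_gib))


def worker_env(n_workers):
    """Build an environment dict that caps the number of server workers.

    Assumption: the container reads SAGEMAKER_MODEL_SERVER_WORKERS.
    """
    return {"SAGEMAKER_MODEL_SERVER_WORKERS": str(n_workers)}


# Hypothetical figure: suppose each worker needs about 3 GiB once the
# 59.4 MB archive is unpacked and the model is loaded.
n = max_workers_that_fit(total_mem_gib=16, per_worker_gib=3.0)
env = worker_env(n)
print(n, env)
```

If that assumption holds, the resulting dict could be passed as the env argument when constructing the model object (sm_model here), before calling deploy with the same instance type.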