If I understand your log snippets correctly, it looks like your container is failing to respond to any /ping requests while processing the long-running request. Failing to respond to pings for an extended period indicates to SageMaker that your endpoint is unhealthy, so it will restart the container.
A likely reason for not responding is that your request handling uses multi-processing in a way that maxes out all CPUs on the instance. This would leave no cores/threads available to handle incoming pings while the data is being processed. In that case, the fix would be to identify which component(s) of your request handling use all available system cores at once, and re-configure them to use int(os.environ["SM_NUM_CPUS"]) - 1 instead.
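For illustration, here is a minimal sketch of that idea, assuming the heavy part of your handler is parallelised with `multiprocessing.Pool`; the `process_chunk` function and the fallback to `os.cpu_count()` are hypothetical, not part of your code:

```python
import os
from multiprocessing import Pool

# Leave one core free so the serving stack can still answer /ping health
# checks while the heavy work runs. SM_NUM_CPUS is the variable referenced
# above; the fallback to os.cpu_count() is an assumption for environments
# where it isn't set.
n_cpus = int(os.environ.get("SM_NUM_CPUS", os.cpu_count()))
n_workers = max(1, n_cpus - 1)

def process_chunk(chunk):
    # Placeholder for the CPU-heavy part of your request handling.
    return sum(chunk)

if __name__ == "__main__":
    chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    with Pool(processes=n_workers) as pool:
        print(pool.map(process_chunk, chunks))
```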
A similar but less likely cause would be if you're using a fully custom serving stack, or have explicitly re-configured the default one to run only one worker thread. In that case your main request handling could block the server, with no thread free to pick up concurrent pings even though CPU resources are available.
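To make that failure mode concrete, here is a hypothetical minimal Flask app of the kind such containers typically serve (the /ping and /invocations routes are the SageMaker conventions; the 120-second sleep is a stand-in for the long-running request). Run under gunicorn with one worker and one thread, nothing can answer /ping until the invocation finishes:

```python
import time
from flask import Flask

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker polls this endpoint to decide whether the container is healthy.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    # Stand-in for a long-running inference request. With a single worker and
    # a single thread, this blocks every /ping for its whole duration.
    time.sleep(120)
    return "done", 200
```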
You are right, and the problem has been solved. I use a custom Docker image, modified from an AWS SageMaker example for real-time inference (not async inference). It runs gunicorn with one worker and one thread; the reason is written in the file named "serve":
After digging all day, I realized this was likely the root cause. I changed the configuration so that gunicorn runs with 2 threads, and now it works.
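For reference, a minimal sketch of what that change might look like, assuming the "serve" script launches gunicorn via subprocess the way the SageMaker example does; the environment variable names, bind address, and the wsgi:app module path are assumptions to be adjusted to whatever the original script actually uses:

```python
import os
import subprocess

# Hypothetical launch command modelled on a SageMaker example "serve" script.
# The key change is "--threads 2": a second thread per worker lets /ping be
# answered while a long /invocations request is still being processed.
workers = int(os.environ.get("MODEL_SERVER_WORKERS", 1))
timeout = os.environ.get("MODEL_SERVER_TIMEOUT", "60")

subprocess.check_call([
    "gunicorn",
    "--timeout", timeout,
    "--workers", str(workers),
    "--threads", "2",                  # was 1 in the original configuration
    "--bind", "unix:/tmp/gunicorn.sock",
    "wsgi:app",                        # placeholder WSGI entry point
])
```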