This is a duplicate of a question I asked on Stack Overflow.
I am serving a SageMaker model through a custom Docker container, following the guide that AWS provides. The container runs a simple nginx -> gunicorn/WSGI -> Flask server.
I am facing an issue where my transform requests time out after roughly 30 minutes in every case, even though they should be able to continue for up to 60 minutes. I need requests to be able to run for SageMaker's maximum of 60 minutes because of the data-intensive nature of each request.
From working with this setup for some months, I know of three factors that should determine how long my server has to respond to a request:
- SageMaker itself will cap invocation requests according to the InvocationsTimeoutInSeconds parameter set when creating the batch transform job.
- The nginx.conf file must be configured such that keepalive_timeout, proxy_read_timeout, proxy_send_timeout, and proxy_connect_timeout are all equal to or greater than the maximum timeout.
- The gunicorn server must have its timeout configured to be equal to or greater than the maximum timeout.
I have verified that when I create my batch transform job, InvocationsTimeoutInSeconds is set to 3600 (1 hour).
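For reference, here is a minimal sketch of how such a job can be created with boto3 (the job name, model name, instance type, content type, and S3 paths below are placeholders, not my real values); the part that matters here is ModelClientConfig:

import boto3

sagemaker = boto3.client('sagemaker')

# Placeholder job/model names and S3 paths; the per-request timeout lives in ModelClientConfig.
sagemaker.create_transform_job(
    TransformJobName='example-transform-job',
    ModelName='example-model',
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600,  # cap per /invocations request; 1 hour is the maximum
        'InvocationsMaxRetries': 0,
    },
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://example-bucket/input/',
            }
        },
        'ContentType': 'application/json',
    },
    TransformOutput={'S3OutputPath': 's3://example-bucket/output/'},
    TransformResources={'InstanceType': 'ml.m5.xlarge', 'InstanceCount': 1},
)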
My nginx.conf looks like this:
worker_processes 1;
daemon off; # Prevent forking
pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;

events {
  # defaults
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /var/log/nginx/access.log combined;

  sendfile on;
  client_max_body_size 30M;
  keepalive_timeout 3920s;

  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 80m;
    keepalive_timeout 3920s;
    proxy_read_timeout 3920s;
    proxy_send_timeout 3920s;
    proxy_connect_timeout 3920s;
    send_timeout 3920s;

    location ~ ^/(ping|invocations) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_pass http://gunicorn;
    }

    location / {
      return 404 "{}";
    }
  }
}
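To rule out nginx silently falling back to default timeouts because this file is not the one being loaded, I can also dump the configuration nginx actually uses from inside the container; a quick check, using the same path the startup script passes to nginx:

import subprocess

# 'nginx -T' tests the given config file and dumps the configuration nginx
# would actually load, which confirms the timeout directives above are in effect.
subprocess.check_call(['nginx', '-T', '-c', '/opt/program/nginx.conf'])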
I start the gunicorn server like this:
import os
import signal
import subprocess

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))
    print('Model server timeout {}.'.format(model_server_timeout))

    # Link the log streams to stdout/err so they will be logged to the container logs.
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(3600),
                                 '-k', 'sync',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '--log-level', 'debug',
                                 '-w', str(1),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # If either subprocess exits, so do we.
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')
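The sigterm_handler referenced above is not shown here; mine follows the usual pattern from the AWS example container, roughly this (a sketch, living in the same script as start_server):

import sys  # os and signal are already imported above in the same script

def sigterm_handler(nginx_pid, gunicorn_pid):
    # Ask nginx to quit gracefully and gunicorn to terminate, ignoring
    # processes that have already exited, then exit the parent process.
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass
    sys.exit(0)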
Despite all this, whenever a transform job takes longer than approximately 30 minutes, I see this message in my logs and the transform job status becomes Failed:
2023/01/07 08:23:14 [error] 11#11: *4 upstream prematurely closed connection while reading response header from upstream, client: 169.254.255.130, server: , request: "POST /invocations HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/invocations", host: "169.254.255.131:8080"
I am close to concluding there is a bug in SageMaker batch transform, but perhaps I am missing some other setting (possibly in the nginx.conf) that could lead to premature upstream termination of my request.