Healthcheck failing on healthy service?

I regularly see "Server process exceeded 3 minutes without reporting healthy" in my GameLift event log, but as far as I can tell the process responds "true" promptly to every healthcheck request it receives.

I'm running 20 processes on a single c4.large, and the server has plenty of free memory and CPU (generally only a few of the processes are actually processing game data at any given time). Load on the server is very low, yet GameLift keeps killing processes as unhealthy at a steady rate.

I'm not sure how long each process lives or what causes it to suddenly start failing healthchecks. The longest-lived of my processes is about 4 hours old, while some are only a few minutes old.

asked 5 years ago, 419 views
2 Answers

I added some logging to the healthcheck, and it's clear that the process receives a healthcheck every minute and returns "true" for each one. Yet 30 seconds after the last successful healthcheck, the process is flagged as unhealthy and terminated:

[INFO] [2019-04-29 19:06:22,198] [Thread Pool Worker] [:0] - Healthcheck!
[INFO] [2019-04-29 19:07:22,198] [Thread Pool Worker] [:0] - Healthcheck!
[INFO] [2019-04-29 19:08:22,199] [Thread Pool Worker] [:0] - Healthcheck!
[DEBUG] [2019-04-29 19:08:52,195] [Thread Pool Worker] [:0] - Socket.io event triggered: EVENT_DISCONNECT
[DEBUG] [2019-04-29 19:08:52,229] [Thread Pool Worker] [:0] - ServerState got the terminateProcess signal.  rawTerminationTime : {
  "terminationTime": "1556565202"
}
[WARN] [2019-04-29 19:08:52,317] [Thread Pool Worker] [:0] - 39469 received an onProcessTerminate
[INFO] [2019-04-29 19:08:52,317] [Thread Pool Worker] [:0] - 39469 received an onProcessTerminate
[DEBUG] [2019-04-29 19:08:52,371] [1] [:0] - Socket.io event triggered: EVENT_DISCONNECT
[INFO] [2019-04-29 19:08:52,391] [4] [:0] - 4/29/2019 7:08:52 PM|Warn |WebSocketServer.receiveRequest|Receiving has been stopped.
                             reason: interrupted

The GameLift event log has this entry for the same time:
2019-04-29 13:08:52 UTC-0600 SERVER_PROCESS_TERMINATED_UNHEALTHY Server process exceeded 3 minutes without reporting healthy
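
For anyone who wants to check the same timing in their own logs, here is a rough Python sketch that measures the gaps between the healthcheck lines above and lines them up with the event-log entry. It rests on assumptions: that the server log timestamps are UTC (which matches the event-log entry once the UTC-0600 offset is applied), and that "terminationTime" is a Unix timestamp in seconds. The helper name `parse_stamp` is mine, not part of any SDK.

```python
import re
from datetime import datetime, timedelta, timezone

# Lines copied from the server log above (trimmed to the relevant ones).
LOG = """\
[INFO] [2019-04-29 19:06:22,198] [Thread Pool Worker] [:0] - Healthcheck!
[INFO] [2019-04-29 19:07:22,198] [Thread Pool Worker] [:0] - Healthcheck!
[INFO] [2019-04-29 19:08:22,199] [Thread Pool Worker] [:0] - Healthcheck!
[WARN] [2019-04-29 19:08:52,317] [Thread Pool Worker] [:0] - 39469 received an onProcessTerminate
"""

STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]")


def parse_stamp(line):
    """Return the line's timestamp as an aware datetime (assumed UTC), or None."""
    match = STAMP.search(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    millis = int(match.group(2))
    return base.replace(microsecond=millis * 1000, tzinfo=timezone.utc)


lines = LOG.splitlines()
checks = [parse_stamp(l) for l in lines if "Healthcheck!" in l]
terminate = next(parse_stamp(l) for l in lines if "onProcessTerminate" in l)

# Seconds between consecutive healthcheck callbacks.
gaps = [(b - a).total_seconds() for a, b in zip(checks, checks[1:])]

# Seconds between the last healthcheck and the terminate callback.
since_last = (terminate - checks[-1]).total_seconds()

# Event-log entry: 2019-04-29 13:08:52 UTC-0600, converted to UTC.
event_local = datetime(2019, 4, 29, 13, 8, 52, tzinfo=timezone(timedelta(hours=-6)))
event_utc = event_local.astimezone(timezone.utc)

# terminationTime from the log, assumed to be Unix epoch seconds.
deadline = datetime.fromtimestamp(1556565202, tz=timezone.utc)

print("healthcheck gaps (s):", gaps)
print("last healthcheck -> onProcessTerminate (s):", since_last)
print("event log entry in UTC:", event_utc.isoformat())
print("terminationTime in UTC:", deadline.isoformat())
```

On the lines above, this shows healthchecks arriving about 60 seconds apart and the terminate callback arriving about 30 seconds after the last one, at the same second as the event-log entry. If the epoch-seconds assumption holds, "terminationTime" decodes to 19:13:22 UTC, a few minutes after the callback.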

answered 5 years ago

After working with AWS GameLift support, it turns out that the problem was caused by using the 02_15_2018 GameLift Server SDK. Upgrading to the 12_14_2018 Server SDK fixed the issue.

answered 5 years ago
