Healthcheck failing on healthy service?

I'm getting "Server process exceeded 3 minutes without reporting healthy" fairly regularly in my GameLift event log, but as far as I can tell the process is responding "true" in a timely fashion to every healthcheck request it receives.

I'm running 20 processes on a single c4.large, and the server has plenty of free memory and CPU (generally only a few of the processes are actually processing game data at any given time). Load on the server is very low. Yet processes are getting killed by GameLift as unhealthy at a pretty steady clip...

I'm not entirely sure how long each process lives or what causes it to suddenly start failing health checks. The longest-lived of my processes is about 4 hours old, while some are only a few minutes old.

asked 5 years ago · 424 views
2 Answers

I added some logging to the healthcheck, and it's pretty clear that the process is getting a healthcheck every minute and returning "true" for all of them, but 30s after the last healthy healthcheck the process is flagged as unhealthy and terminated:

[INFO] [2019-04-29 19:06:22,198] [Thread Pool Worker] [:0] - Healthcheck!
[INFO] [2019-04-29 19:07:22,198] [Thread Pool Worker] [:0] - Healthcheck!
[INFO] [2019-04-29 19:08:22,199] [Thread Pool Worker] [:0] - Healthcheck!
[DEBUG] [2019-04-29 19:08:52,195] [Thread Pool Worker] [:0] - Socket.io event triggered: EVENT_DISCONNECT
[DEBUG] [2019-04-29 19:08:52,229] [Thread Pool Worker] [:0] - ServerState got the terminateProcess signal.  rawTerminationTime : {
  "terminationTime": "1556565202"
}
[WARN] [2019-04-29 19:08:52,317] [Thread Pool Worker] [:0] - 39469 received an onProcessTerminate
[INFO] [2019-04-29 19:08:52,317] [Thread Pool Worker] [:0] - 39469 received an onProcessTerminate
[DEBUG] [2019-04-29 19:08:52,371] [1] [:0] - Socket.io event triggered: EVENT_DISCONNECT
[INFO] [2019-04-29 19:08:52,391] [4] [:0] - 4/29/2019 7:08:52 PM|Warn |WebSocketServer.receiveRequest|Receiving has been stopped.
                             reason: interrupted

In the GameLift event log there was this line for the same time:
2019-04-29 13:08:52 UTC-0600 SERVER_PROCESS_TERMINATED_UNHEALTHY Server process exceeded 3 minutes without reporting healthy
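
To double-check the timing, here is a rough Python sketch that recomputes the gaps from the timestamps in the excerpt above. It assumes the process log is in UTC (which matches the event log entry, since 13:08:52 UTC-0600 is 19:08:52 UTC) and that "terminationTime" is a Unix epoch value in seconds; both are my reading of the logs, not something confirmed by the SDK docs. The variable and function names are made up for illustration.

```python
from datetime import datetime, timezone

# Timestamps copied from the log excerpt above; assumed to be UTC.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
healthcheck_stamps = [
    "2019-04-29 19:06:22,198",
    "2019-04-29 19:07:22,198",
    "2019-04-29 19:08:22,199",
]
terminate_signal_stamp = "2019-04-29 19:08:52,229"
termination_time_epoch = 1556565202  # assumed to be Unix epoch seconds


def parse_log_time(stamp: str) -> datetime:
    """Parse a log timestamp and mark it as UTC (an assumption)."""
    return datetime.strptime(stamp, LOG_FORMAT).replace(tzinfo=timezone.utc)


checks = [parse_log_time(s) for s in healthcheck_stamps]
signal_time = parse_log_time(terminate_signal_stamp)

# Seconds between consecutive healthcheck callbacks.
intervals = [(b - a).total_seconds() for a, b in zip(checks, checks[1:])]

# Seconds between the last healthcheck and the terminate signal.
gap_seconds = (signal_time - checks[-1]).total_seconds()

# The terminationTime payload interpreted as a UTC wall-clock time.
deadline = datetime.fromtimestamp(termination_time_epoch, tz=timezone.utc)

print("healthcheck intervals (s):", intervals)
print("last healthcheck -> terminate signal (s):", gap_seconds)
print("terminationTime as UTC:", deadline.isoformat())
```

Under those assumptions the callbacks arrive about 60 seconds apart, the terminate signal lands about 30 seconds after the last one, and the terminationTime value works out to 19:13:22 UTC, a few minutes after the signal. So the process is clearly receiving and answering healthchecks right up until it is terminated.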

answered 5 years ago

After working with AWS GameLift support, it turns out that the problem here was caused by using the 02_15_2018 GameLift Server SDK. Upgrading to the 12_14_2018 Server SDK has fixed the issue.

answered 5 years ago
