SERVER_PROCESS_FORCE_TERMINATED Event and Testing OnProcessTerminate

0

Hello, I'm using GameLift with Unity 2019.3.0 and Server SDK 9/3/2019 on Linux. So far, it's been enjoyable and fairly straightforward to set up and use! However, I am encountering one issue that I wanted to ask about.

When my fleet instances are terminated, I get the following event in my log:

SERVER_PROCESS_FORCE_TERMINATED: Server process did not cleanly exit within 30 seconds of OnProcessTerminate(), exitCode(137), launchPath(/local/game/MyGame), arguments(-logFile server.log), instanceId(i-0a16afa9b531f160e), publicIP(XX.XXX.XXX.XX), gameSessionId(none)

I remoted into the machine to follow the server log as I played and terminated the instance. It seems like the various SDK callbacks are being called EXCEPT FOR the OnProcessTerminate callback! I have a Debug.Log in each callback to verify that it is being called.

If I set the "desired capacity" of the fleet to zero, the instance shuts down after about 5 minutes and interrupts my SSH connection. But I never see any output to the log about OnProcessTerminate being called. Soon after, the SERVER_PROCESS_FORCE_TERMINATED event appears.

Based on the above data, I was wondering:

  1. Should OnProcessTerminate be called if I manually set the desired capacity? If so, any idea why it would not be called on my end when other callbacks are called?

  2. Besides changing the desired capacity, are there other conditions under which GameLift should call OnProcessTerminate? I have not yet seen GameLift call this function, so I wanted to test it somehow.

  3. Is there any way to test OnProcessTerminate with GameLift Local? I couldn't find a way to get GameLift Local service to call my OnProcessTerminate function. The only way I could find to test this was to use a timer that manually called OnProcessTerminate after X seconds (which seemed to shutdown and exit with code 0 just fine).

I've seen other posts on the forums with a similar problem, but they seemed about a year old or older. Any assistance or insight would be appreciated!

Btw, I tried to attach the C# code I'm using, but it said new users can't upload attachments; here's my code anyway:

using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using Aws.GameLift;
using Aws.GameLift.Server.Model;
using Aws.GameLift.Server;

public class GameLiftServer : MonoBehaviour 
{
    // If true, we know the process is terminating.
    private bool processEndingCalled = false;

    private void Start()
    {
        // Init the SDK or fail.
        GenericOutcome initSdkOutcome = GameLiftServerAPI.InitSDK();
        if(!initSdkOutcome.Success)
        {
            Debug.LogError("SDK Init failed: " + initSdkOutcome.Error.ToString());
            //TODO: Maybe Application.Quit()? Not good if this fails!
            return;
        }

        // List of log files we want GameLift to gather.
        // This assumes that we pass "-logFile server.log" as a command line argument when starting the process.
        LogParameters logParams = new LogParameters(new List<string>() {
            "/local/game/server.log"
        });

        // Params specifies options and callbacks for this process on GameLift.
        ProcessParameters processParams = new ProcessParameters(
            OnGameSessionStarted,
            OnGameSessionUpdated,
            OnProcessTerminated,
            OnHealthCheck,
            7902,
            logParams);
        
        // Call "ProcessReady" to tell GameLift that this process is ready to host a game session!
        GenericOutcome processReadyOutcome = GameLiftServerAPI.ProcessReady(processParams);
        if(!processReadyOutcome.Success)
        {
            Debug.LogError("SDK ProcessReady failed: " + processReadyOutcome.Error.ToString());
            //TODO: Maybe Application.Quit()? Not good if this fails!
            return;
        }
        Debug.Log("Ready to go!");
    }

    private void OnApplicationQuit()
    {
        // If application is quitting, make sure GameLift service knows the process is ending.
        CallProcessEnding();

        // Per GameLift docs, destroy server API on quit.
        GameLiftServerAPI.Destroy();
        Debug.Log("Goodbye, cruel world.");
    }

    private void CallProcessEnding()
    {
        if(!processEndingCalled)
        {
            GenericOutcome processEndingOutcome = GameLiftServerAPI.ProcessEnding();
            if(processEndingOutcome.Success)
            {
                processEndingCalled = true;
                Debug.Log("ProcessEnding called successfully.");
            }
            else
            {
                Debug.LogError("ProcessEnding call failed: " + processEndingOutcome.ToString());
            }
        }
    }

    private void OnGameSessionStarted(GameSession gameSession)
    {
        // When ready to receive incoming player connections, invoke "ActivateGameSession."
        GameLiftServerAPI.ActivateGameSession();
        Debug.Log("Game Session started");
    }

    private void OnGameSessionUpdated(UpdateGameSession updateGameSession)
    {
        Debug.Log("Game Session update: " + updateGameSession.UpdateReason);
    }

    private void OnProcessTerminated()
    {
        Debug.Log("OnProcessTerminated called!");
        CallProcessEnding();
        Debug.Log("Application is quitting...");
        Application.Quit();
    }

    private bool OnHealthCheck()
    {
        return true;
    }

    private bool OnValidatePlayerSession(string playerId)
    {
        GenericOutcome acceptPlayerSessionOutcome = GameLiftServerAPI.AcceptPlayerSession(playerId);
        if(acceptPlayerSessionOutcome.Success)
        {
            Debug.Log("Player Session " + playerId + " accepted.");
            return true;
        }
        else
        {
            Debug.LogError("Player Session " + playerId + " rejected: " + acceptPlayerSessionOutcome.Error.ToString());
            return false;
        }
    }

    private void OnClientDisconnected(ClientInfo clientInfo)
    {
        Debug.Log("Removing player session for " + clientInfo.playerInfo.authToken);
        GameLiftServerAPI.RemovePlayerSession(clientInfo.playerInfo.authToken);
    }

    private void OnPlayEnded()
    {
        Debug.Log("Terminating game session.");
        GameLiftServerAPI.TerminateGameSession();
    }
}
asked 4 years ago435 views
11 Answers
0

Quick answers:

  1. OnProcess terminate should be called when GameLift is killing your server due to (1) process poor health, (2) when an instance is being terminated during a scale-down event, or (3) when an instance is being terminated due to a spot-instance interruption. From https://docs.aws.amazon.com/gamelift/latest/developerguide/integration-server-sdk-cpp-ref-actions.html

  2. See Above

  3. There is no current way to test OnProcessTerminate via GameLift local but I know its the backlog for GameLift local

So you should be getting a call into OnProcessTerminate (based on your error message) so I would check:

  • You are heartbeating / threads are not blocked
  • Ensure your logging is flushed (ie you are up to date)

You should get a call when you aren't heartbeating correctly, so that could be a quicker way to test than manually scaling.

If can you provide your fleet-******

  • Approximate call times when this should have happened
  • Or instance ids that processes were running on
  • Or any other information that helps them pinpoint the host/time etc That would be really helpful
answered 4 years ago
0

Hi, thanks for the quick reply!

I added some log output for the health check and I find that it is being called regularly.

I also tested what happens if I force the health check to return false. After being called seven times, the process was killed and restarted. An event SERVER_PROCESS_TERMINATED_UNHEALTHY was generated. However, it doesn't appear that OnProcessTerminate was executed, as far as I can tell.

As far as I know, the log is flushed. At very least, when a remote client connects and does stuff, the log output appears right away. I'm just using Unity's Debug.Log, which doesn't provide any interface for flushing logs.

If there's any data to glean from the service team, that would be awesome! Here's my info: Fleet ID: fleet-****** Instance ID: i-08f5c746502a6d2e8 (instance was terminated during scale down)

Call Times (PST) on 4/14: 3:11PM - fleet created 3:23PM - changed desired count to 1 3:30PM - server up and running, remoted to instance to watch log; health endpoint called regularly 3:34PM - forced health check to return false 3:40PM - process restarted because of health failure; health check goes back to returning true 3:46PM - set desired count to 0 3:51PM - SSH connection severed; no log output seen for termination 3:51PM - event SERVER_PROCESS_FORCE_TERMINATED generated

Thanks again.

answered 4 years ago
0

Thanks for the update. I've opened a ticket with GameLift support on this. Hope to have an update soon.

answered 4 years ago
0

Looking at the specific instance & fleet, it looks like there were failures on a process with exit code 137 (Out of memory). This occurred at 3:41 (alongside health-check) and 3:51 (when the instance went down).

We don't trigger OnProcessTerminate on a Out of Memory exit code as the process has been reported as terminated.

Do you have matching events in your logs for these out of memory issues?

answered 4 years ago
0

Hi Kenneth, thanks for the reply! I didn't realize that 137 is "Out of Memory" - that provides maybe some clues.

I don't see output in my logs for out of memory issues. I do see in the AWS event that the exit code was 137 though.

I noticed that the event description for SERVER_PROCESS_FORCE_TERMINATED states "server process did not exit cleanly within 30 seconds of OnProcessTerminate()" which implies that GameLift thinks it DID call OnProcessTerminate, but my game didn't respond to it. But based on my logs, I am not getting that callback fired.

I don't quite understand why the exit code is 137. Maybe I'm running out of memory (I can check this out on my end), but that doesn't feel right. This is a very simple prototype game. I am also able to leave a server instance up and running, servicing game sessions, for a pretty long time with no problem. It seems strange that the game would suddenly run out of memory and quit only in scenarios where the process should be terminated (due to health check failure or instance shutting down).

Regarding the health check situation, what I did there was force my game to return false when asked for a health check, which eventually triggered SERVER_PROCESS_TERMINATED_UNHEALTHY. I don't understand how that would suddenly cause the process to end with an "out of memory" issue, especially since the termination must have been triggered by GameLift...but wouldn't GameLift call OnProcessTerminate in that case?

answered 4 years ago
0

This is the timeline from our logs (maybe it'll provide some additional insight):

3:37 - Gamelift agent detected failed health check (as it has been 3 min) and tries to trigger OnProcessTerminate. 3:41 - Gamelift agent force terminates process as process hasn't been terminated. At the same time, relaunches new process to match desired of 1.

From here, it looks like things run normally (with proper health checks) until instance termination.

From the code snippet above, if OnProcessTerminated was triggered, we'd expect a Debug.Log("OnProcessTerminated called!") statement in your logs which isn't the case, correct?

With out of memory issues, it is possible that the process degrades to the point where logging/processing becomes delayed. It does seem unusual that this is only happening for process termination, but it would be the first place I'd suggest looking into. Would it be possible to run a profiler?

answered 4 years ago
0

Hi, thanks again for following up on the issue.

That is correct - I'd expect to see something like "OnProcessTerminated called!" in my log. I also added "Goodbye, cruel world." to the OnApplicationQuit call, which I am also not seeing.

Just to try to verify that it isn't a log flushing issue, I had OnProcessTerminate and OnApplicationQuit both also attempt to write a file to disk using System.IO.File.Create. Those files did not appear on the disk when restarting due to a health check failure, which seems to suggest that OnProcessTerminate/OnApplicationQuit were not called in that situation. I verified that if I call OnProcessTerminate manually, I do see the log output.

To see whether memory was an issue, I added log output of memory stats every 5 seconds. It looks like the game is consistently using about 110MB at the moment, which seems within reason for a machine with 4GB of RAM.

So, it seems that my game is not receiving the callback, even if the GameLift service is attempting to call it in certain circumstances. I'm not sure if that points to a problem in the Server SDK, in my implementation, or elsewhere 😛.

One minor note is that I do see some warnings/errors output when my game is starting up:

log4net:ERROR Exception while reading ConfigurationSettings. Check your .config file is well formed XML. System.PlatformNotSupportedException: Operation is not supported on this platform. at log4net.Util.SystemInfo.GetAppSetting (System.String key) [0x00002] in <9ea670cdbf41450aa8beb1f63c6429db>:0 (this is repeated a bunch of times)

Not sure if that's serious or would inhibit the functionality of the SDK in any way, but it appears to be coming from the log4net system added as part of the GameLift Server SDK. It would be nice to see the log output from the Server SDK, just as a debug aid I suppose!

answered 4 years ago
0

Let's see if we can get the debug logs out from the SDK. As you said, should give us a better sense of what's happening.

I think there just needs to be a configuration for log4net, which should just be in app.config (under Net45 in the SDK). https://aws.amazon.com/blogs/developer/logging-with-the-aws-sdk-for-net/ might help setting it up.

Could you post the logs here once you get them? Thanks!

answered 4 years ago
0

Hi Kenneth, sorry for the delay - I was going crazy trying to get log4net to output log info in my Unity app! Have you guys had success getting this to work on your end?

I added the content from that blog post to the "app.config", but that seemed to have no effect. And anyway, the server code is compiled to a DLL, which is then included in Unity. Unity itself seems to have no way to specify an "app.config" file that I can see - several (unanswered) posts exist on the Unity forums asking about this. If there is some way to use the SDK's log4net in Unity, I'd love to know!

Anyway, to get around this, I simply copied the Server SDK's source files directly into Unity and replaced the log4net calls with Debug.Log calls. I tested this out and got some good data!

I can confirm that that ServerState.OnTerminateProcess is being executed (the log message outputs). I also added a log message within the Task.Run() to ensure that was running correctly. So, my application IS getting a "signal" to terminate.

However, I found that the Task executed in OnTerminateProcess IS NOT executing to completion! Particularly, the call to "JsonConvert.DeserializeObject" appears to never complete or hang in some way. I'm not sure why this would be the case, but I added log output before and after that call. I continually see output before that call, but I don't see any output after that call.

Any idea why that might be? I suppose a workaround could be to simply comment out all the deserialization stuff and call the OnProcessTerminate callback directly. But if there's a way to fix without such a workaround, that seems like a better solution.

Here's the log output from the "Unhealthy" case. Terminate events are received multiple times before the server shuts down:

Reporting health using the OnHealthCheck callback. (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

[GameLiftServer] HealthCheck: detected kill file - server is not healthy! (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Received health response from the server process: False (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Emitting event for message { } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Got ack True with response { } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

ServerState got the terminateProcess signal. rawTerminationTime : { "terminationTime": "1587692500" } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

ServerState got the terminateProcess signal. rawTerminationTime : { "terminationTime": "1587692510" } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

ServerState got the terminateProcess signal. rawTerminationTime : { "terminationTime": "1587692520" } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

ServerState got the terminateProcess signal. rawTerminationTime : { "terminationTime": "1587692530" } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

ServerState got the terminateProcess signal. rawTerminationTime : { "terminationTime": "1587692540" } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

ServerState got the terminateProcess signal. rawTerminationTime : { "terminationTime": "1587692550" } (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin (Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

And for the "Scaledown" case. Only one terminate event is generated and the server eventually shuts down after a few minutes:

ServerState got the terminateProcess signal. rawTerminationTime : {

"terminationTime": "1587693090"

}

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

OnTerminateProcess Task.Run Begin

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Reporting health using the OnHealthCheck callback.

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

[GameLiftServer] HealthCheck: server is healthy.

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Received health response from the server process: True

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Emitting event for message { "healthStatus": true }

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

(Apologies for the in-line log output, but I am not allowed to upload attachments)

answered 4 years ago
0

Sorry for the delay in response.

Looking at where the error occurs (JsonConvert.DeserializeObject), it looks like the newest version of the C# SDK (4.0.0) doesn't do this anymore. I'd suggest upgrading to the newest version to see if it helps. This was released very recently.

You will need to scale down then up your fleet to pick up the newest version of the on-the-box agent as well.

answered 4 years ago
0

Hi Kenneth, I integrated the latest SDK and was unfortunately still having the same problem. I did the same thing I did above (integrate source directly into Unity, replace log4net calls w/ UnityEngine.Debug.Log calls). I found maybe some interesting info.

I found at first that again ServerState.OnTerminateProcess was beginning execution of Task.Run, but not completing. One thing I tried here was to take the code inside of Task.Run and move it OUTSIDE Task.Run - just run it inline in the OnTerminateProcess function.

After making the above change, I found that an exception was actually triggering (ArgumentOutOfRangeException)! Here's the log of that: Exception.txt|attachment (2.4 KB)

It turns out that the call to "System.DateTime.AddSeconds" is throwing an exception because (I guess) the value being passed is too large? For example, the termination time for me was 1588189228064, which when interpreted as seconds converts to 50,000 years!

It seems like the "terminationTime" value is being interpreted as a number of seconds since Unix epoch, but that doesn't seem to be the case at all. It seems that the value is actually in milliseconds.

After changing it to "AddMilliseconds," everything seems to work correctly. Interestingly, it seems that putting the code in Task.Run may obscure the exception in some way...you only see that something is wrong if you move the code outside Task.Run.

Anyway, seems like I have a fix for my problem, but it may mean that the Server SDK needs to be patched at some point to resolve that problem!

answered 4 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions