Troubleshooting Matchmaking

Hello,

I'm trying to run GameLift with FlexMatch and a custom game server in C# using the .NET 3.5 SDK.

After the FLEET_STATE_ACTIVATING event I get several SERVER_PROCESS_CRASHED events, then FLEET_STATE_ACTIVE and FLEET_SCALING_EVENT.

After that I get several more SERVER_PROCESS_CRASHED events and, after a while, a SERVER_PROCESS_TERMINATED_UNHEALTHY.

Is there a way to debug these errors? Is there a log file somewhere?

For example, this is one of the errors:

SERVER_PROCESS_CRASHED Server process exited without calling ProcessEnding(), exitCode(134), launchPath(/local/game/GameServer), arguments(null), instanceId(i-0945e4522bdd8f0cb), publicIP(34.219.22.237), gameSessionId(none), occurrences(216 for this instance in a 5 minute period)
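A note on exitCode(134): if GameLift reports the exit status the way a POSIX shell does (an assumption, not something the event text confirms), values above 128 mean "terminated by signal (code - 128)". Under that assumption 134 maps to signal 6, SIGABRT, which is what a runtime typically raises when it aborts on an unhandled error. The helper below is a hypothetical sketch for decoding such codes; `describe_exit_code` is not part of any GameLift tooling.

```shell
# Hypothetical helper: decode an exit status using the shell convention
# "status > 128 means killed by signal (status - 128)".
describe_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l <number> prints the signal name without the SIG prefix
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited normally with status $code"
  fi
}

describe_exit_code 134
```

On Linux this should report signal 6 (SIGABRT) for 134.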

I'm able to log in to that instance using SSH, but the game folder doesn't contain any logs. The game session is never created, so its log doesn't exist either. There's nothing in S3.

Please help, thanks!

Edit1:

I was somehow able to make a little progress. A matchmaking ticket is now created; it goes from QUEUED to PENDING and then FAILED.

    aws gamelift describe-matchmaking --ticket-ids 656a5ac8-a497-4d87-9b1d-14cafdf45b15 --region us-west-2
    {
      "TicketList": [
        {
          "TicketId": "656a5ac8-a497-4d87-9b1d-14cafdf45b15",
          "ConfigurationName": "machinal-matchmaking-config-bot",
          "Status": "FAILED",
          "StatusReason": "UNEXPECTED_ERROR",
          "StatusMessage": "An unexpected error was encountered during match placing.",
          "StartTime": 1558976856.971,
          "EndTime": 1558976859.674,
          "Players": [
            {
              "PlayerId": "763",
              "PlayerAttributes": {
                "score": {
                  "N": 0.0
                }
              },
              "Team": "player",
              "LatencyInMs": {
                "us-west-2": 420
              }
            }
          ]
        }
      ]
    }
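When checking a ticket repeatedly, the AWS CLI's global `--query` and `--output` options can trim the response to the fields that matter. The sketch below wraps the same `describe-matchmaking` call shown above; the function name `ticket_status` is hypothetical, and the field names are the ones visible in the JSON output.

```shell
# Hypothetical helper: print TicketId, Status, StatusReason and StatusMessage
# for one ticket. $1 = ticket id, $2 = region.
ticket_status() {
  aws gamelift describe-matchmaking \
    --ticket-ids "$1" --region "$2" \
    --query 'TicketList[].[TicketId,Status,StatusReason,StatusMessage]' \
    --output text
}
```

Usage: `ticket_status 656a5ac8-a497-4d87-9b1d-14cafdf45b15 us-west-2`.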

A game session is created, but it ends in the TERMINATED state. The game session log is empty.

Game Server Logs: 0 File(s) Collected (/local/game/sessionLog)

I'm clueless. Any help would be really appreciated. Thanks!

Edit2:

I'm still struggling to integrate FlexMatch, but I made some progress with the debugging. I'll share it here because it could be useful for others.

As general advice, always use full paths to files.

I have a run.sh:

    #!/bin/bash
    # The exec command here is important. Without it, the game server runs as a
    # child of this script and InitSDK is reported as being called from an unknown process.
    exec /local/game/GameServer >> /tmp/machinal_stdout.txt 2>> /tmp/machinal_stderr.txt

Then there's the install.sh:

    #!/bin/bash
    # Install packages using yum
    sudo yum update -y && \
    sudo yum -y install libunwind.x86_64 libcurl-devel.x86_64 openssl-devel.x86_64 && \
    sudo yum clean all
    # This sets environment variables for all users
    echo 'PORT="9050"' | sudo tee -a /etc/environment
    # Without execute permission on run.sh, the fleet stays in the ACTIVATING state for an hour and then goes to ERROR
    sudo chmod a+x /local/game/run.sh
    # If you don't create this directory, the following code will fail.
    # var outcome = GameLiftServerAPI.InitSDK();
    # var logParameters = new LogParameters(new List<string> { "sessionLogs" });
    # var processParameter = new ProcessParameters(OnStartGameSession, OnProcessTerminate, OnHealthCheck, port, logParameters);
    # outcome = GameLiftServerAPI.ProcessReady(processParameter);
    mkdir -p /local/game/sessionLogs
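Since both a missing execute bit and a missing log directory fail silently, it may help to verify them at the end of install.sh. The following is a minimal sketch; `preflight_check` is a hypothetical helper, and the paths passed to it are the ones used in this post.

```shell
# Hypothetical helper: check that the launch path is executable and the
# session log directory exists. Prints one line per problem found and
# returns non-zero if anything is wrong.
preflight_check() {
  launch_path="$1"
  log_dir="$2"
  status=0
  if [ ! -x "$launch_path" ]; then
    echo "launch path is missing or not executable: $launch_path"
    status=1
  fi
  if [ ! -d "$log_dir" ]; then
    echo "log directory does not exist: $log_dir"
    status=1
  fi
  return "$status"
}
```

Usage: `preflight_check /local/game/run.sh /local/game/sessionLogs >> /tmp/machinal_install.txt` (the output file name is also just an example).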

Remember to change the launch path.

    aws gamelift create-fleet \
    ...
    --runtime-configuration "ServerProcesses=[{LaunchPath=/local/game/run.sh,ConcurrentExecutions=1}]" \
    ...

If you set ConcurrentExecutions greater than one, multiple processes will write to the same log files. I recommend starting with one and increasing it once everything works.

Finally, you'll have to log in to the instance via SSH and look at the logs in /tmp/machinal_stdout.txt and /tmp/machinal_stderr.txt
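One possible way around the shared log files is to put the process ID in the file name. This is a sketch under assumptions, not a tested GameLift setup: `log_path` and `launch_server` are hypothetical helpers, and the file naming simply extends the `/tmp/machinal_*.txt` names used above. Because the launcher uses exec, the shell's PID (`$$`) is the PID the game server keeps.

```shell
# Hypothetical helper: build a per-process log path.
# $1 = directory, $2 = stream name (stdout/stderr), $3 = PID
log_path() {
  printf '%s/machinal_%s.%s.txt\n' "$1" "$2" "$3"
}

# Hypothetical launcher: replaces the shell with the server process and
# appends its output to log files named after the shell's PID.
# $1 = server binary, $2 = log directory (defaults to /tmp)
launch_server() {
  server_bin="$1"
  log_dir="${2:-/tmp}"
  exec "$server_bin" >> "$(log_path "$log_dir" stdout "$$")" \
                     2>> "$(log_path "$log_dir" stderr "$$")"
}
```

In run.sh the last line would then be `launch_server /local/game/GameServer`.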

Edit3:

I'm getting the following exception...

    Unhandled Exception: System.AggregateException ---> System.PlatformNotSupportedException: Operation is not supported on this platform.
      at Aws.GameLift.Server.ProcessParameters.OnHealthCheckDelegate.BeginInvoke(AsyncCallback callback, Object object)
      at Aws.GameLift.Server.ServerState.ReportHealth()
      at Aws.GameLift.Server.ServerState.<StartHealthCheck>b__32_0()
      at System.Threading.Tasks.Task.ThreadStart()
      --- End of inner exception stack trace ---
      at System.Threading.Tasks.TaskExceptionSlot.Finalize()

Any thoughts? Thanks.

Edit4:

Instead of building the GameLift-CSharp-ServerSDK-3.3.0 NET35 project and including the dll, I tried including the source code in my project. I used dotnet add package to add the same packages used in the GameliftServerSDK project to mine.

I ended up with this compilation error:

error CS0433: The type 'Task<TResult>' exists in both 'System.Threading.Tasks.NET35, Version=3.0.2.0, Culture=neutral, PublicKeyToken=null' and 'System.Runtime, Version=4.2.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'

Then I tried using GameLift-CSharp-ServerSDK-3.3.0 NET 45, including the source code in my project. I was able to do a build and deploy it.

I ended up with this runtime error:

Code:

var outcome = GameLiftServerAPI.InitSDK();
Debug.Log(outcome.Success + " " + outcome.Error);
var logParameters = new LogParameters(new List<string> { "sessionLogs" });
var processParameter = new ProcessParameters(OnStartGameSession, OnProcessTerminate, OnHealthCheck, port, logParameters);
outcome = GameLiftServerAPI.ProcessReady(processParameter);
Debug.Log(outcome.Success + " " + outcome.Error);

Output:

False [GameLiftError: ErrorType=LOCAL_CONNECTION_FAILED, ErrorName=Local connection failed., ErrorMessage=Connection to local agent could not be established.]
False [GameLiftError: ErrorType=NETWORK_NOT_INITIALIZED, ErrorName=Network not initialized., ErrorMessage=Local network was not initialized. Have you called InitSDK()?]

Web:

SERVER_PROCESS_SDK_INITIALIZATION_TIMEOUT

This error doesn't make much sense, because the ServerState.InitializeNetworking code is similar in the NET35 and NET45 versions.

Also, the GameLift process seems to be up and running on the instance:

sudo netstat -plnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
...
tcp 0 0 127.0.0.1:5757 0.0.0.0:* LISTEN 2804/java
...
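The check above can be scripted so it is easy to repeat after each deployment. This is a minimal sketch: `agent_listening` is a hypothetical helper that reads netstat-style output on stdin, and the default port 5757 is taken from the listing above. It only confirms that something is listening on that local port; it does not explain why the SDK fails to connect.

```shell
# Hypothetical helper: succeed if the netstat output on stdin contains a
# listener bound to 127.0.0.1 on the given port (default 5757).
agent_listening() {
  port="${1:-5757}"
  grep -Eq "127\.0\.0\.1:${port}[[:space:]].*LISTEN"
}
```

Usage: `sudo netstat -plnt | agent_listening && echo "agent port is open"`.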

Then I tried with the previous version GameLift-CSharp-ServerSDK-3.2.1 NET45.

This version seems to be working correctly.

asked 5 years ago, 350 views
11 Answers

Hello @REDACTEDUSER

I think you may still be having problems with the health check because you are launching run.sh, which in turn starts the executable as its child process. The health check cannot locate the entry point in the executable because it is a child of run.sh, and you get the error you mention in Edit3.

Try specifying the game server directly in the --runtime-configuration parameter for the fleet, and see if that fixes it.

Make sure you are on the newest version of the SDK, as the old one has a bug that will eventually cause a problem. Anyone not on the latest version should upgrade as soon as possible.

Thank you,

Al Murray :)

answered 5 years ago

Hello @REDACTEDUSER

I have GameLift-CSharp-ServerSDK-3.2.1 NET45 running fine.

I think the health check works because the exec command doesn't create a child process; it replaces the current process.
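This behaviour of exec is easy to confirm outside GameLift. The sketch below uses two hypothetical helper functions that each print two PIDs: the wrapper shell's PID, then the PID seen by the command it starts. With exec the two match, because the command takes over the wrapper's process; without exec the command runs as a separate child.

```shell
# Prints the wrapper shell's PID, then the PID of the command started with exec.
pids_with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Same, but without exec: the inner shell runs as a child process.
# The trailing ":" keeps the wrapper alive until the child has finished.
pids_without_exec() {
  sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

pids_with_exec
pids_without_exec
```

The first pair of lines should show the same PID twice, the second pair two different PIDs.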

The error I get with GameLift-CSharp-ServerSDK-3.3.0 NET 45 (latest version) is this one:

False [GameLiftError: ErrorType=LOCAL_CONNECTION_FAILED, ErrorName=Local connection failed., ErrorMessage=Connection to local agent could not be established.]
False [GameLiftError: ErrorType=NETWORK_NOT_INITIALIZED, ErrorName=Network not initialized., ErrorMessage=Local network was not initialized. Have you called InitSDK()?]

My understanding is that this error

Unhandled Exception: System.AggregateException ---> System.PlatformNotSupportedException: Operation is not supported on this platform.

is related to using GameLift-CSharp-ServerSDK-3.3.0 NET35.

answered 5 years ago

I tried again with the latest version, GameLift-CSharp-ServerSDK-3.3.0 NET 45, and I can't get it to work. The fleet stays in the ACTIVATING state for an hour and then goes to the ERROR state.

I tried with run.sh as well as specifying the game server directly in the --runtime-configuration parameter for the fleet.

answered 5 years ago

Well, I was able to run the latest version, yay! This last problem seems to be related to setting ConcurrentExecutions to 50, and not to the SDK version.

answered 5 years ago

Hey Jonas

Try to log into the instance with my remote login tool. This link to the tool is good until 2019-12-01. I can update it periodically.

Let me know!

Al Murray :)

answered 5 years ago

Excellent. Yes, you started fifty servers on your instance, which made it unstable. That would do it!

Thank you,

Al Murray :)

answered 5 years ago

It seems to work with 40 but not with 50. Is there any way I can debug this? It doesn't let me log in to the instance.

Edit1:

With 40 it worked on us-east-1, us-west-2 and sa-east-1, but it didn't work on eu-central-1.

Edit2:

I need someone to explain to me how to debug this thing, how to write to the logs, and where those logs are. It's driving me crazy, and it's really making me think twice about using GameLift.

Case: the fleet is activating for an hour and then goes to error. Where's the log? What's going on?

If the instance doesn't start, where's the log?

If the instance starts but the activation fails, where's the log?

The game session logs are always empty. Why is that?

Case: after the start_matchmaking call, the ticket stays in placing forever. Where's the log? Why is this happening?

Case: the game server throws an unexpected exception. Where's the log?

Case: I want to print debug information. How do I do it, and where's the log?

Case: I lack permission to read or write somewhere, or some file is missing. Where's the log?

Without the run.sh script, none of the errors I've been debugging would have come to light. How do you expect someone to do this? Is it just me having all these problems? Is this not the intended use? What's going on, guys?

answered 5 years ago

Al, thanks for the tool.

I'm able to connect via SSH using the terminal. Is this tool something different? Do you have a version for Mac?

Where can I find the logs once logged in via SSH?

/local/game/sessionLogs is empty

Where can I find the standard output and standard error? I used the run.sh script to capture them. Do you have another way of doing this?

When the instance is not initialized at all, as in the case of running 50 concurrent executions, there's no instance to log in to. Where can I find the log of what happened?

answered 5 years ago

Hey Jonas,

We don't write logs to the instance. Standard output and standard error streams are not written to disk by the process either. Use this to log output to CloudWatch Logs.

If you have too many concurrent processes, the instance will thrash until it hangs, due to resource overutilization. Reduce the number of concurrent processes you are requesting. Don't forget to scale the fleet to 0 and then back up so the change takes effect on all fleet instances.
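As a rough way to pick a starting value for the number of concurrent processes, one can divide the memory left after a system reserve by the memory a single server process needs. This is only a back-of-the-envelope sketch: it assumes memory is the limiting resource, `max_processes` is a hypothetical helper, and the figures in the example call are made up, not measurements from this fleet.

```shell
# Hypothetical helper: estimate how many server processes fit in memory.
# $1 = total instance memory (MB)
# $2 = memory used by one server process (MB)
# $3 = memory reserved for the OS and other services (MB)
max_processes() {
  total_mb="$1"
  per_process_mb="$2"
  reserved_mb="$3"
  echo $(( (total_mb - reserved_mb) / per_process_mb ))
}

# Example with made-up numbers: 4096 MB total, 80 MB per process, 512 MB reserved.
max_processes 4096 80 512   # prints 44
```

Measure the real per-process footprint on a running instance (for example with `ps` or `top`) before relying on such an estimate, and leave headroom for CPU as well.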

Thank you,

Al Murray :)

Solutions Architect.

answered 5 years ago

Just commenting to say that even though it's almost one year later, this post helped me immensely! Especially the run.sh "hack" that forces everything to be output somewhere. It makes me wonder why this is not the default: why doesn't GameLift output the logs somewhere accessible in the same way when it calls the server executable? Even if those logs could only be seen through remote access, that would already be extremely helpful.

Also, the one hour it takes to go into the error state is really painful. I am still using the free tier, learning the system and evaluating whether it's a good fit, so I don't want to be starting multiple fleets (1 is the limit). In practice that means I launch a fleet, it fails, and I can only try my fixes over an hour later. The workflow for learning the system has been very clunky so far. Maybe it's because I am not an expert at infrastructure, but I had hoped it would be easier.

At any rate thanks for the tips here!

answered 4 years ago
