GameLift instance disk space usage reaches 100% after ~1 week: whitewater folder is the cause

[Image: whitewater takes 47G disk space]

Hello,

We have experienced recurring issues with our production real-time matches: every week or so, players are no longer able to play real-time matches. After checking the logs and metrics, we noticed that the number of available game sessions was either dropping to 0 or no longer being reported. However, our auto-scaling policy was not triggered, because it is based on the percentage of available game sessions, which somehow still looked healthy (see image below).

[Image: available sessions vs percent available sessions]

This also seems like a bug, but it is not the one I want to address in this question.

The Event tab was also flooded with crashing processes. Our workaround is to manually increase the desired number of instances; eventually, the crashing instance is automatically terminated by the system, and we are good for another week or two.

Recently, we found out that the issue occurs because the instance's storage capacity is reached:

10/3/2023 7:52:15 AM [ERROR] Error caught in the beginning of Main!
System.IO.IOException: No space left on device
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.FlushInternalBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.FileStream.Flush()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.IO.StreamWriter.WriteLine(String value)
   at Z3.Gameplay.Backend.ConsoleLogger.Log(String message)
   at Z3.Gameplay.Backend.Program.Main(String[] args)
************************************************

We also know that the folder taking up all the space is whitewater, which we do not have access to and which is managed by GameLift, as shown in the first image.
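For readers who want to repeat this kind of measurement on a directory they can read, here is a minimal Python sketch of a "du"-style check. It is not part of the original post: the function names are illustrative, and it sums apparent file sizes, which can differ from the block usage that "du" reports. It only helps locate where space goes; it says nothing about why whitewater grows.

```python
import os
import shutil


def dir_size_bytes(root):
    """Sum the apparent sizes of all regular files under root."""
    total = 0
    # onerror swallows directories we are not allowed to list.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or permission denied
    return total


def largest_subdirs(parent, top=5):
    """Return (name, size_in_bytes) for the largest immediate subdirectories."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, dir_size_bytes(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


def percent_used(path):
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Calling `largest_subdirs(some_parent_dir)` periodically and logging the result alongside `percent_used(some_parent_dir)` would show which directory grows between two checks, where `some_parent_dir` is a placeholder for any path the process can read.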

That is why we are asking for your help to investigate why this folder ends up taking all the space, or to understand what we can do on our end to prevent this. Please let us know if we can provide any more information. Thank you.

asked 7 months ago, 237 views
1 Answer
Hi Stephane,

That spike definitely seems concerning.

"The Event tab was also flooded with crashing processes"

Are you using a debug build? Based on the statement above, my assumption is that the crash is creating a massive core dump, which is filling the disk. Have you tried to root-cause the crash itself? I would recommend using GameLift Anywhere to test your server on a local machine. This may help identify potential crashes and memory leaks (which might in turn be causing a massive crash dump).
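As an illustration of the core dump hypothesis, here is a minimal Python sketch (added for this write-up, not something from the thread) that reads the current process's core dump size limit on a Unix-like system and lowers the soft limit to zero. It uses the standard `resource` module. Whether a GameLift-managed instance honors or overrides this limit, and whether the server runtime writes dumps through another mechanism, are open assumptions.

```python
import resource


def core_dump_limit():
    """Return the (soft, hard) core dump size limits for this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value as text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def disable_core_dumps():
    """Lower the soft limit to 0 so this process writes no core file."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
```

A soft limit reported as "unlimited" on a process that crashes repeatedly would be consistent with large dumps accumulating, although it does not prove that they are what fills the disk.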

If you are still facing issues, I would recommend creating a support case so the service team can directly help out here.

Shashank

AWS
answered 6 months ago
EXPERT
reviewed a month ago
  • Hi Shashank,

    Thank you for your answer.

    The build is not a debug build; it is a production one. I am afraid the processes crashed because the disk was full, not the other way around.

    The GameLift Anywhere solution to try and reproduce is definitely a good option to investigate, thank you for the tip.

    Otherwise, I have some updates: in the build that caused the disk to fill up, we were writing logs to a different file for each match (using the matchId as part of the file name). I must insist that the log files were not the cause of the disk filling up (as you can see in the screenshot of the "du" command). But somehow, since we stopped writing to a different file for each match, the problem has not reappeared. This makes me believe that you had a good intuition about memory leaks. Maybe the way we handled opening and closing the files was flawed. I'll try to investigate more if I have the time.

    Thank you so much for your time and suggestions, it was truly appreciated.

    Stephane

  • Actually, I just discovered that the GameLift Server SDK build was a development one. Thank you for your insights, I will change that.
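To illustrate the open/close handling mentioned in the comment above, here is a minimal Python sketch of per-match logging that always releases its file handle and caps the number of retained log files. It is not the poster's code (their server is a .NET application); the names `match_log` and `prune_old_logs`, the file naming scheme, and the retention count are all hypothetical, and nothing in the thread confirms that this is what fixed the whitewater growth.

```python
import os
from contextlib import contextmanager


def prune_old_logs(log_dir, keep):
    """Delete all but the `keep` most recently modified .log files."""
    logs = [
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if name.endswith(".log")
    ]
    logs.sort(key=os.path.getmtime, reverse=True)
    for stale in logs[keep:]:
        try:
            os.remove(stale)
        except OSError:
            pass  # already removed or not removable; ignore


@contextmanager
def match_log(log_dir, match_id, keep=20):
    """Open a per-match log file, always close it, then prune old logs."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"match-{match_id}.log")
    handle = open(path, "a", encoding="utf-8")
    try:
        yield handle
    finally:
        handle.close()  # released even if the match code raises
        prune_old_logs(log_dir, keep)
```

Used as `with match_log(log_dir, match_id) as log: log.write(...)`, the handle is closed when the block exits. This matters on Linux because a deleted file keeps occupying disk space for as long as some process still holds it open.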
