GameLift instance disk space usage reaches 100% after ~1 week: whitewater folder is the cause


[Image: whitewater takes 47G disk space]

Hello,

We experienced recurring issues with our production real-time matches: every week or so, players were no longer able to play real-time matches. After checking the logs and metrics, we noticed that the number of available game sessions was either dropping to 0 or no longer being reported. However, our auto-scaling policy was not triggered, because it is based on the percentage of available game sessions, which somehow still looked healthy (see image below).

[Image: available sessions vs percent available sessions]

This also seems like a bug, but it is not the one I want to address in this question.

The Event tab was also flooded with crashing processes. Our workaround is to manually increase the desired number of instances; eventually, the crashing instance is automatically terminated by the system, and we are good for another week or two of peace.

Recently, we found out that the issue happens because the instance runs out of disk space:

10/3/2023 7:52:15 AM [ERROR] Error caught in the beginning of Main!
System.IO.IOException: No space left on device
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.FlushInternalBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.FileStream.Flush()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.IO.StreamWriter.WriteLine(String value)
   at Z3.Gameplay.Backend.ConsoleLogger.Log(String message)
   at Z3.Gameplay.Backend.Program.Main(String[] args)
************************************************

We also know that the folder taking up all the space is whitewater, which we do not have access to and which is managed by GameLift, as shown in the first image.
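For anyone who wants to reproduce the kind of per-folder breakdown shown in the first image, here is a minimal Python sketch. It is not part of the original investigation. The function names are made up for illustration, and it sums apparent file sizes, so its numbers can differ slightly from what "du" reports (which counts disk blocks). Unreadable files are skipped, so folders you lack permission for may be undercounted.

```python
import os


def directory_sizes(root):
    """Return {child_name: total_bytes} for each immediate child of root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_symlink():
            continue  # do not follow links out of the tree
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    try:
                        if not os.path.islink(path):
                            total += os.path.getsize(path)
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
    return sizes


def report(root, top=5):
    """Return lines listing the largest children of root, biggest first."""
    ranked = sorted(directory_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return [f"{size / 1024 ** 3:8.2f} GiB  {name}" for name, size in ranked[:top]]
```

Calling print("\n".join(report(some_root))) on the parent directory of the suspicious folder, at regular intervals, would show whether it grows steadily or jumps at the moment of a crash.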

That is why we are asking for your help to investigate why this folder ends up taking all the space, or to understand what we can do on our end to prevent this. Please let us know if we can provide any more information. Thank you.

1 Answer

Hi Stephane,

That spike definitely seems concerning.

"The Event tab was also flooded with crashing processes"

Are you using a debug build? Based on the statement above, my assumption is that the crash is creating a massive core dump that is filling the disk. Have you tried to root-cause the crash itself? I would recommend using GameLift Anywhere to test your server on a local machine. This may help identify potential crashes and memory leaks (which might in turn be causing a massive crash dump).
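One way to test the core dump hypothesis on a local Linux machine is to look at the core file size limit the server process runs with. The sketch below is only an illustration in Python using the standard resource module; the helper names are made up. It also assumes the kernel writes core files directly to a path. If the machine pipes core dumps to a handler program, the limit may not be enforced in the same way, and nothing here says how a GameLift-managed instance is configured.

```python
import resource


def core_dump_limit():
    """Return the (soft, hard) core file size limit for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def disable_core_dumps():
    """Lower the soft limit to 0 (affects core files the kernel writes directly)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))


def describe(limit):
    """Render a limit value as text, treating RLIM_INFINITY as unlimited."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"
```

If describe(core_dump_limit()[0]) reports "unlimited" and the process crashes repeatedly, large core files become a plausible source of disk growth that is worth ruling out.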

If you are still facing issues, I would recommend creating a support case so the service team can directly help out here.

Shashank

AWS
answered 6 months ago
EXPERT
reviewed 2 months ago
  • Hi Shashank,

    Thank you for your answer.

    The build is not a debug build; it is a production one. I am afraid the processes crashed because the disk was full, and not the other way around.

    The GameLift Anywhere solution to try and reproduce is definitely a good option to investigate, thank you for the tip.

    Otherwise, I have some updates: in the build that caused the disk to fill up, we were writing logs to a separate file for each match (using the matchId as part of the file name). I must insist that the log files were not the cause of the disk filling up (as you can see in the screenshot of the "du" command). But somehow, since we stopped writing to a separate file for each match, the problem has not reappeared. This makes me believe that you had good intuition about memory leaks. Maybe the way we were opening and closing the files was wrong. I'll try to investigate more if I have the time.

    Thank you so much for your time and suggestions, it was truly appreciated.

    Stephane

  • Actually, I just discovered that the GameLift Server SDK build was a development one. Thank you for your insights, I will change that.
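For readers with a similar per-match logging setup, the open/close concern raised in the comments above can be illustrated with a short Python sketch. This is not the code from the thread (the original server is a .NET application). The function names, the file naming scheme and the retention count are all assumptions. The idea is to guarantee that each match's file handle is closed when the match ends, and to cap how many old log files are kept.

```python
import os
from contextlib import contextmanager


@contextmanager
def match_log(log_dir, match_id):
    """Open the log file for one match and guarantee it is closed afterwards."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"match-{match_id}.log")  # hypothetical naming scheme
    handle = open(path, "a", encoding="utf-8")
    try:
        yield handle
    finally:
        handle.close()  # runs even if the match code raises


def prune_logs(log_dir, keep=50):
    """Delete all but the `keep` most recently modified .log files; return the count removed."""
    logs = [e for e in os.scandir(log_dir) if e.is_file() and e.name.endswith(".log")]
    logs.sort(key=lambda e: e.stat().st_mtime, reverse=True)
    for entry in logs[keep:]:
        os.remove(entry.path)
    return len(logs[keep:])
```

A match would then be wrapped as "with match_log(log_dir, match_id) as log:", followed by a prune_logs call at the end of each session. On Linux, a file that is deleted while a process still holds it open keeps occupying disk space until the handle is closed, which is one reason unclosed handles are worth checking.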
