Making sense of GameLift CPU utilization graph

0

Hi all!

I'm hoping someone can help me make sense of this graph. I'm trying to see how CPU utilization changes as more game sessions get squeezed onto the same server instance. Here, I opened up about 50 sessions over the course of like 15 minutes.

REMOVEDUPLOAD

The red line is game sessions, and the brown line is "CPU Utilization." The brown line is what's confusing me.

Usually, when I think of CPU utilization, I think of it being basically 0% when nothing is happening, and then it creeps up to 100% as more things happen. And, in fact, that is what I saw last night when I did a similar test using a different instance type.

But here, it starts around 80% and drops to 30% as we reach 50 game sessions. Then it goes back up to 80% the moment the game sessions end. It seems to directly conflict with what I saw last time.

I suppose one possibility is that by CPU utilization, Amazon really means CPU "availability," which might explain why the value seems inversely related to the number of game sessions. But if that were the case, then why does it hover between 80% and 100% when it's just sitting idle with no game sessions? And furthermore, why is this the opposite of the behavior I witnessed during my previous test?

Thanks so much in advance for any light you can shed on this issue!

asked 3 years ago · 484 views
18 Answers
0

Here's the data from that early test I mentioned.

REMOVEDUPLOAD

As you can see, CPU utilization goes up as game sessions increase, flatlining at 100%. This makes more sense to me.

It's kinda weird though that at 0 game sessions, CPU utilization is already at 50%. Why not 0%?

answered 3 years ago
0

[quote="TheoTowers, post:1, topic:10829"] But here, it starts around 80% and drops to 30% as we reach 50 game sessions. Then it goes back up to [/quote]

Thanks for reaching out. I'll cut a ticket for the GameLift team to investigate. The second graph, which seems correct: is that from your CloudWatch console?

answered 3 years ago
0

Oh excellent, thank you!!

It's captured from the "Metrics" tab of my Fleet console in GameLift. I don't believe I have CloudWatch set up just yet.

answered 3 years ago
0

Would you mind providing your fleet id to help us investigate?

Also, you don't actually need to set up CloudWatch. Just select the region where your fleet is located, go into the CloudWatch console (search for "CloudWatch" in the AWS console), and the GameLift metrics should be there. It's very handy if you want to build a customized dashboard, view metrics outside the AWS console, etc.
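
If you'd rather pull the same numbers from a script than from the console, here is a minimal sketch of the parameters for CloudWatch's GetMetricStatistics call. It assumes the fleet metrics are published in the `AWS/GameLift` namespace under a `FleetId` dimension, with metric names such as `CPUUtilization` and `ActiveGameSessions`. The helper names `build_metric_query` and `fetch_averages` are made up for illustration, and the client is expected to be a boto3 CloudWatch client that you create yourself.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(fleet_id, metric_name, end_time, minutes=60, period=60):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    Assumption: GameLift fleet metrics are published in the "AWS/GameLift"
    namespace with a "FleetId" dimension.
    """
    return {
        "Namespace": "AWS/GameLift",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FleetId", "Value": fleet_id}],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Average"],
    }


def fetch_averages(cloudwatch_client, fleet_id, metric_name, end_time):
    """Return (timestamp, average) pairs, oldest first.

    `cloudwatch_client` is expected to be a boto3 CloudWatch client created
    for the region the fleet runs in.
    """
    query = build_metric_query(fleet_id, metric_name, end_time)
    response = cloudwatch_client.get_metric_statistics(**query)
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

Calling `fetch_averages` once for `CPUUtilization` and once for `ActiveGameSessions` over the same window gives two series you can line up against each other, which is the comparison the console graph is drawing.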

answered 3 years ago
0

Oh, shoot, I terminated the fleet before I went to bed to avoid racking up the bill! I don't think I have a record of the fleet number either.

I'll start a new fleet now and redo the test. Then I'll let you know once it's done!

answered 3 years ago
0

@REDACTEDUSER

I just did another test with a new fleet and basically got the same type of thing going on, so at least we know it's consistent. I'll PM you with the fleet ID in a minute.

REMOVEDUPLOAD

Once again, CPU utilization starts at 80% and then goes down to around 30% as game sessions reach the max. Then, at the 11:56 mark when I stop all game sessions, CPU utilization immediately shoots back up to 80%.

answered 3 years ago
0

@REDACTEDUSER

answered 3 years ago
0

I guess I need to unlock the ability to DM :thinking:

answered 3 years ago
0

@REDACTEDUSER

answered 3 years ago
0

Thanks, I got it, sorry for the trouble. I've added the fleet id in the investigation ticket.

P48493534

answered 3 years ago
0

Fantastic, thank you!!

answered 3 years ago
0

Hey @REDACTEDUSER

The CPU metrics for the fleet provided are coming directly from the EC2 instance your game servers are running on. Unfortunately, I'm unable to see a breakdown of what's causing the high CPU usage without access to the instance itself.

Are you able to SSH onto the instance and run the top command to see which processes are taking up all the CPU? https://docs.aws.amazon.com/gamelift/latest/developerguide/fleets-remote-access.html

It may just be that the 50 processes running on the fleet are too much for the given instance type, but you're right that it's strange that CPU decreases with more active game sessions.

answered 3 years ago
0

Hi @REDACTEDUSER

I also took a look at the metrics on our end, and I agree with Nathan's assessment:

  • I've verified the CPU Utilization metrics from both EC2 and AutoScaling, and they show the same trend: ~77% at 0 Active Game Sessions, 30% at ~46 Active Game Sessions. That means there is no bug in the console; we are relaying the exact metrics from EC2.
  • Try to SSH into the instance (see the recommended approach in https://forums.awsgametech.com/t/unity-server-built-with-il2cpp-fails/10835/2?u=jamesm_aws) and run the top command to see what is consuming so much CPU even without ACTIVE game sessions. (NOTE: even though your instance doesn't have ACTIVE game sessions, it still runs 50 processes that are waiting for game sessions to be started on them.)
  • It seems like a process uses less CPU once a game session is activated on it. Are the processes doing anything compute-heavy while waiting for game sessions? E.g. short polling anything?
  • My other suspicion was that your processes were crashing as game sessions were being placed on them, which would reduce the overall CPU utilization. That doesn't appear to be the case, though, because the ActiveGameSession metrics were stable and I didn't see any AbnormalProcessTermination metrics.
answered 3 years ago
0

Thanks @REDACTEDUSER

answered 3 years ago
0

Here's what I'm seeing when I use the top command. 'spoopy-server' is the name of my build, so I'm guessing my build is just idling and filling up CPU usage for no good reason. I'm going to check this out in the Unity profiler next.

REMOVEDUPLOAD

answered 3 years ago
0

Looking at my code again and reading your response above, @REDACTEDUSER

I was under the impression that a process started when a client created a game session, and then should be terminated once the game session is complete. However, it seems that's not how it's supposed to work. Instead, a process starts up at the beginning and gets used for sessions indefinitely?

If that's correct, that would make more sense from a resource usage perspective. It would also explain why my processes are using so much of the CPU - they're constantly starting and stopping with my code that attempts to prevent zombie sessions from being created!
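
To make that lifecycle concrete, here's a toy sketch in Python. This is not the GameLift Server SDK and not my actual Unity code; every name in it is invented for illustration. It just models one long-lived process that blocks while idle and gets handed sessions one after another, as opposed to a process that keeps exiting and restarting.

```python
import queue
import threading


class ServerProcess:
    """Toy model of one long-lived server process that hosts many sessions."""

    def __init__(self):
        self._pending = queue.Queue()  # session requests handed to this process
        self.hosted = []               # ids of the sessions this process has run

    def on_start_game_session(self, session_id):
        # Hypothetical stand-in for the hosting service's "start session" callback.
        self._pending.put(session_id)

    def shut_down(self):
        self._pending.put(None)  # sentinel that ends the main loop

    def run(self):
        while True:
            # Blocks without spinning, so an idle process uses almost no CPU.
            session_id = self._pending.get()
            if session_id is None:
                break
            self.host_session(session_id)
            # Loop back and wait for the next session; no exit-and-restart here.

    def host_session(self, session_id):
        self.hosted.append(session_id)  # real game logic would run here
```

The point of the blocking `get()` is that waiting costs nothing, whereas a process that tears itself down and starts up again in a loop pays the startup cost over and over even with zero players.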

answered 3 years ago
0

Ok, so after making the corrections, my graph now looks like this:

REMOVEDUPLOAD

Which is infinitely better haha. It's odd though that we're still hovering around 30% just waiting for sessions, but I can continue investigating to see what the cause of that is.

Thanks so much for helping me realize the fatal mistake I was making!!

answered 3 years ago
0

Hey Theo,

Glad you figured out the issue! Let us know if there's stuff we can help with for the 30%, best of luck.

answered 3 years ago
