Amazon DCV fails to create sessions on EC2 (Ubuntu 22.04 + g4dn + Isaac Sim Setup)

Hi everyone,

I'm currently setting up an EC2 instance (g4dn.xlarge) for a simulation project using NVIDIA Isaac Sim. The application requires a fully functional GUI environment using Amazon NICE DCV, along with NVIDIA drivers and Ubuntu 22.04 LTS.

However, I'm running into persistent issues with DCV sessions not initializing properly. Here's the setup:


Setup

  • Instance type: g4dn.xlarge
  • AMI used: ami-07fed9dd6a6618ff1 (Canonical / Ubuntu 22.04)
  • Kernel: 6.8.0-1027-aws
  • GUI: XFCE4
  • Installed:
    • NVIDIA Driver 535.xx
    • NICE DCV 2024.x
    • Xserver, LightDM, xinit
  • Command used to start session:
    dcv create-session ubuntu --type=virtual --owner=ubuntu --init="/usr/bin/startxfce4"
  • Result:
    dcv list-sessions
    > There are no sessions available.

Problem

dcvserver.service starts and stays active.

The session creation command exits with no error, but no session ever shows up.

The journalctl logs show nothing unusual – no crash, no warnings.

I've tested with different init scripts (xterm, startx, startxfce4), all result in the same behavior.

Hypothesis

It seems likely that the issue is due to kernel incompatibility. The linux-image-aws kernel (6.8.x) may not be supported by DCV or the NVIDIA GRID driver stack.
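
One way to test this hypothesis before switching AMIs is to check whether an NVIDIA kernel module exists for the kernel that is actually running. The following is a minimal sketch, not an official procedure: `kernel_series` is a hypothetical helper name, and the check assumes the driver was installed in a way that makes the module visible to `modinfo`.

```shell
#!/bin/sh
# Hypothetical helper: print the major.minor series of a kernel release string.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

running="$(uname -r)"
echo "Running kernel: $running (series $(kernel_series "$running"))"

# modinfo -k looks for the module under /lib/modules/<release>.
if command -v modinfo >/dev/null 2>&1 && modinfo -k "$running" nvidia >/dev/null 2>&1; then
    echo "An nvidia module is available for this kernel."
else
    echo "No nvidia module found for this kernel; the driver may need rebuilding."
fi
```

If no module is found, the driver was probably built for a different kernel than the one booted, which would point at the driver install rather than at DCV itself.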

What I need help with:

Is there a known issue between DCV and Ubuntu 22.04 with linux-image-aws (6.8.0-1027)?

Is there a stable AMI with the 5.15.0-generic kernel that is compatible with g4dn and DCV?

What's the best practice to ensure DCV + GPU compatibility on EC2 for simulation workflows?

Thank you in advance — I'd appreciate any help or insights!

Best regards, Satchel

asked a month ago · 83 views
2 Answers

Based on the information provided, there are a few potential issues and solutions to consider for your Amazon DCV session creation problem on EC2:

  1. Kernel Compatibility: Your hypothesis about kernel incompatibility could be correct. The 6.8.x kernel you're using might not be fully supported by DCV or the NVIDIA GRID driver stack. It's generally recommended to use a more stable, long-term support kernel version for better compatibility.

  2. Session Type: Make sure you're using the correct session type. For multiple sessions, you should use virtual sessions instead of console sessions. Your command seems correct in using "--type=virtual", but it's worth double-checking.

  3. X Server Configuration: Ensure that your X server is properly configured to work with the NVIDIA driver. Check your /etc/X11/xorg.conf file to make sure it's set up correctly for the NVIDIA GPU.

  4. DCV-GL Package: If you're using GPU acceleration, make sure you have the DCV-GL package installed and properly configured.

  5. Permissions and Access: Verify that DCV has the necessary permissions to access the X server. You might need to run 'dcvgladmin disable', then 'dcvgladmin enable', and restart the X server.

  6. Logging: Check additional log files for more detailed information. Look at /var/log/dcv/sessionlauncher.log, /var/log/dcv/dcv-session.user.session.log, and $HOME/.xsession-errors for any error messages or clues.

  7. Failsafe Session: Try creating a failsafe session to isolate whether the issue is with the session manager or DCV itself.

  8. AMI Selection: Consider using an AWS-provided NVIDIA GPU-optimized AMI, which often comes pre-configured with compatible drivers and kernels for GPU instances.
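
The log check in item 6 can be scripted. This is a minimal sketch that assumes the log paths listed there exist on your instance; `show_logs` is a hypothetical helper, not part of DCV, and the files under /var/log/dcv may require root to read.

```shell
#!/bin/sh
# Hypothetical helper: print the last 20 lines of each readable log file
# and flag the ones that are missing or unreadable.
show_logs() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            tail -n 20 "$f"
        else
            echo "== $f (missing or unreadable)"
        fi
    done
}

# Paths taken from item 6 above.
show_logs /var/log/dcv/sessionlauncher.log \
          /var/log/dcv/dcv-session.user.session.log \
          "$HOME/.xsession-errors"
```

Running it right after a failed dcv create-session call keeps the relevant entries within the last lines of each file.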

To address your specific questions:

  1. While there's no widely reported issue between DCV and Ubuntu 22.04 with linux-image-aws (6.8.0-1027), using a more stable, tested kernel version could help.

  2. AWS does provide NVIDIA GPU-optimized AMIs that are compatible with g4dn instances and typically work well with DCV. These AMIs often use the 5.15 kernel series, which might be more stable for your use case.

  3. Best practices for ensuring DCV + GPU compatibility on EC2 for simulation workflows:

    • Use AWS-provided GPU-optimized AMIs when possible
    • Stick to LTS kernel versions
    • Ensure all NVIDIA drivers, CUDA toolkit, and DCV components are up-to-date and compatible
    • Configure the X server properly for NVIDIA GPUs
    • Use virtual sessions for multi-user scenarios
    • Regularly check AWS and NVIDIA documentation for updates and known issues

If the problem persists after trying these suggestions, you may need to contact AWS support for more specific assistance with your configuration.

Sources
Troubleshooting Virtual Session Creation on Linux - Amazon DCV
Investigating Virtual Session Creation Failure on Linux - NICE DCV
Could not create session. The maximum number of sessions has been reached | AWS re:Post
DCV cannot access the 3D X Server [:0.0 :0.1]. | AWS re:Post

answered a month ago

Try running nvidia-smi to verify that NVIDIA drivers are installed and working fine.

To install NVIDIA drivers on Ubuntu 22.04, see the article "How do I install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Ubuntu Linux?", which installs the drivers from the NVIDIA repository. You are probably missing the linux-headers-aws and linux-modules-extra-aws packages, among others. DCV usually has better compatibility with GNOME.
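
Both checks can be combined into one short script. This is a sketch under assumptions: `missing_packages` is a hypothetical helper, and the package check only applies to Debian/Ubuntu systems where `dpkg-query` is available.

```shell
#!/bin/sh
# Hypothetical helper: read installed package names on stdin (one per line)
# and print each argument that is not among them.
missing_packages() {
    installed="$(cat)"
    for pkg in "$@"; do
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || echo "$pkg"
    done
}

# Driver check: nvidia-smi exits non-zero if it cannot talk to the driver.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi: OK"
else
    echo "nvidia-smi: not found or failed"
fi

# Package check: keep only packages whose dpkg status is "ii" (installed).
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' \
        | awk '$1 == "ii" { print $2 }' \
        | missing_packages linux-headers-aws linux-modules-extra-aws
fi
```

Any package name printed by the last step is not installed; an empty result means both are present.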

There is also a CloudFormation template that can help automate the NVIDIA driver and DCV installation.

Use the following settings:

  • instanceType: g4dn.xlarge
  • osVersion: Ubuntu 22.04 (x86_64)
  • sessionType: console-with-NVIDIA_runfile_Driver
  • teslaDriverVersion: 535.247.01
  • installDocker: Yes

You may want to specify a large volumeSize.

Alternatively, you can use AWS Deep Learning AMIs, which have NVIDIA drivers pre-installed.

AWS
EXPERT
answered 12 days ago
