GPU Passthrough
Configure graphical workstation with GPU passthrough
The GPU passthrough option allows attaching an Nvidia or AMD GPU to an instance. Only one instance can use the GPU and the GPU cannot be used by the host node. Using GPU passthrough allows for use cases that require a powerful GPU for data processing. It also allows using GPU passthrough along with USB passthrough to host a workstation from a Pritunl Cloud server with full desktop performance.
Configure Host UEFI
The host must be installed with UEFI. This can be verified by checking that /sys/firmware/efi exists, for example by running ls /sys/firmware/efi.
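The same check can be scripted so it fits into a host preparation script. The sketch below is only an illustration; boot_mode is a hypothetical helper name, and the optional directory argument exists purely so the function can be pointed somewhere else for testing.

```shell
# Minimal sketch: report the firmware mode the host booted with.
# boot_mode is a hypothetical helper name, not part of Pritunl Cloud.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

# Prints UEFI on a host that is ready for the remaining steps
boot_mode
```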
Configure VFIO Device on Host
Secure boot is currently not supported and must be disabled for GPU passthrough. This can be done in the BIOS. Run the command below to verify that secure boot is disabled.
sudo mokutil --sb-state
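If this check needs to run inside a script, the output of the command above can be interpreted with a small helper. This is a sketch that assumes mokutil reports the state as "SecureBoot enabled" or "SecureBoot disabled"; sb_state is a hypothetical name, and anything else (for example an error on a non-UEFI host) is reported as unknown.

```shell
# Minimal sketch: classify the text printed by "mokutil --sb-state".
# Assumes the output contains "SecureBoot enabled" or "SecureBoot disabled".
sb_state() {
    case "$1" in
        *"SecureBoot disabled"*) echo "disabled" ;;
        *"SecureBoot enabled"*) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Usage on the host (left as a comment because it needs root):
# sb_state "$(sudo mokutil --sb-state 2>&1)"
```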
The GPU must first be removed from the host and configured as a VFIO device. Before doing this, CPU hardware virtualization extensions and IOMMU must be enabled in the BIOS. There are multiple names used for these options. Virtualization extensions are often labeled Intel VT-x or AMD-V. The IOMMU option is often labeled Intel VT-d or AMD-Vi. Once enabled, edit the GRUB configuration at /etc/default/grub and add the IOMMU options. For Intel add intel_iommu=on iommu=pt video=efifb:off, for AMD add amd_iommu=on iommu=pt video=efifb:off. The video=efifb:off option disables the efifb video module to prevent it from interfering with the guest. These options must be appended to the existing GRUB_CMDLINE_LINUX options. Do not remove any existing options from GRUB_CMDLINE_LINUX. It is also recommended to enable nested KVM with the second command below.
# Intel
sudo nano /etc/default/grub
# APPEND OPTIONS BELOW TO EXISTING OPTIONS
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt video=efifb:off"
sudo tee /etc/modprobe.d/kvm_intel.conf << EOF
options kvm_intel nested=1
EOF
# AMD
sudo nano /etc/default/grub
# APPEND OPTIONS BELOW TO EXISTING OPTIONS
GRUB_CMDLINE_LINUX="amd_iommu=on iommu=pt video=efifb:off"
sudo tee /etc/modprobe.d/kvm_amd.conf << EOF
options kvm_amd nested=1
EOF
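For hosts that are configured by script rather than by hand, the append step can be done without an editor. The sketch below is one possible approach, not the method Pritunl Cloud itself uses. append_grub_opts is a hypothetical helper that assumes GRUB_CMDLINE_LINUX sits on a single line wrapped in double quotes and that the options contain no "|", "&" or backslash characters, which holds for the options in this guide.

```shell
# Minimal sketch: append kernel options to GRUB_CMDLINE_LINUX while keeping
# the options that are already there. Hypothetical helper, see assumptions above.
append_grub_opts() {
    file="$1"
    shift
    opts="$*"
    # Do nothing if the options are already present on the line
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$opts"; then
        return 0
    fi
    tmp="$(mktemp)"
    # Insert the new options just before the closing double quote
    sed "s|^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"|\1 $opts\"|" "$file" > "$tmp"
    # Write back through cat so the ownership and mode of the file are kept
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage on an Intel host (run as root, after backing up the file):
# append_grub_opts /etc/default/grub intel_iommu=on iommu=pt video=efifb:off
```

Running the helper a second time with the same options leaves the file unchanged, so it is safe to keep in a provisioning script.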
After the configuration is updated, run the commands below to update the GRUB configuration.
sudo grub2-mkconfig -o /etc/grub2-efi.cfg
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Once done, restart the server. The command below should then return a list of IOMMU groups. If it does not, check the output of dmesg | grep 'IOMMU enabled' and verify the BIOS configuration.
sudo bash << EOF
#!/bin/bash
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
echo "IOMMU Group \${g##*/}:"
for d in \$g/devices/*; do
echo -e "\t\$(lspci -nns \${d##*/})"
done;
done;
EOF
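If the listing above comes back empty, one more thing worth checking is whether the running kernel actually booted with the new options. The sketch below looks for a whole option in /proc/cmdline; cmdline_has is a hypothetical helper, and the optional second argument only exists so a different file can be supplied for testing.

```shell
# Minimal sketch: succeed when the given option appears as a whole word on
# the kernel command line. Hypothetical helper name.
cmdline_has() {
    opt="$1"
    file="${2:-/proc/cmdline}"
    # Pad with spaces so that only complete options match
    case " $(cat "$file") " in
        *" $opt "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on an Intel host:
# cmdline_has intel_iommu=on && echo "IOMMU option is active"
```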
The hardware design of the server will group PCI devices into IOMMU groups. When configuring the GPU for passthrough, all devices in the group must be passed through to the instance. If the group contains devices other than the GPU, the PCI cards may need to be placed in different slots. There will likely be a PCI bridge device in the same group as the GPU. This will not affect the configuration and this device should be ignored. The GPU IOMMU group will look similar to the example below. The GPU will often have an audio device that will also be configured for passthrough.
IOMMU Group 1:
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107GL [Quadro P1000] [10de:1cb1] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
In this example an Nvidia Quadro card is used. The ID of the GPU is 10de:1cb1 and the ID of the audio device is 10de:0fb9. These IDs appear in square brackets near the end of each device line in the output above. Update the commands below to use the IDs of your devices. Two places need to be updated: the GRUB_CMDLINE_LINUX line and /etc/modprobe.d/vfio.conf. Separate each device ID with a comma. The GRUB_CMDLINE_LINUX options below must again be appended to the existing options, including the IOMMU options added above.
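To avoid copying the IDs by hand, the device lines of the GPU's IOMMU group can be piped through a short filter. This is a sketch only; vfio_ids is a hypothetical helper that drops the PCI bridge line, takes the last [vendor:device] pair on each remaining line and joins the results with commas.

```shell
# Minimal sketch: turn the device lines of one IOMMU group into the
# comma-separated ID list expected by vfio-pci. Hypothetical helper name.
vfio_ids() {
    grep -v 'PCI bridge' \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | paste -sd, -
}

# Usage: paste the device lines of the GPU group on stdin, for example
# lspci -nn -s 01:00 | vfio_ids
```

For the example group shown earlier, the filter yields 10de:1cb1,10de:0fb9, which matches the value used in the configuration that follows.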
sudo nano /etc/default/grub
# APPEND OPTIONS BELOW TO EXISTING OPTIONS
GRUB_CMDLINE_LINUX=" vfio-pci.ids=10de:1cb1,10de:0fb9"
sudo tee /etc/modules-load.d/vfio.conf << EOF
vfio
vfio_pci
vfio_virqfd
vfio_iommu_type1
EOF
sudo tee /etc/modprobe.d/nouveau.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF
sudo tee /etc/modprobe.d/vfio.conf << EOF
options vfio-pci ids=10de:1cb1,10de:0fb9
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep drm pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
options kvm_amd avic=1
EOF
Once done, rebuild the Linux boot image with these new module options using the command below.
sudo dracut -f
Once done, restart the server and continue with the Pritunl Cloud configuration below.
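After the restart it may be worth confirming that the GPU is bound to vfio-pci before moving on. The sketch below reads the output of lspci -nnk and prints the driver currently in use; driver_in_use is a hypothetical helper name and the device ID in the usage comment is the one from the example above.

```shell
# Minimal sketch: print the value of the "Kernel driver in use" line from
# lspci -k output read on stdin. Hypothetical helper name.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage with the example GPU ID (replace with the ID of your device):
# lspci -nnk -d 10de:1cb1 | driver_in_use
```

If the result is nouveau, amdgpu or another graphics driver instead of vfio-pci, the module options above were most likely not included in the rebuilt boot image.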
Enable Host GPU Passthrough
In the node options enable PCI Passthrough. If USB devices such as a mouse and keyboard are required also enable USB Passthrough. Then click Save.
Add Devices to Instance
After the node is configured, add the PCI devices and any USB devices to the instance. PCI devices are stored by slot ID. If any PCI devices are added to or removed from the host node, this slot ID will likely change and will need to be updated. USB devices are stored by vendor and device ID, so these devices can be connected, removed or moved to different USB ports while the instance is running. Duplicate devices with the same ID may not work.
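Because duplicate USB devices with the same ID may not work, it can help to check the host for duplicates before adding devices. The sketch below assumes the usual lsusb line layout, in which the vendor:device ID is the sixth field after the word ID; usb_duplicate_ids is a hypothetical helper name.

```shell
# Minimal sketch: list every vendor:device ID that appears more than once
# in lsusb output read on stdin. Hypothetical helper name.
usb_duplicate_ids() {
    awk '$5 == "ID" { print $6 }' | sort | uniq -d
}

# Usage on the host node:
# lsusb | usb_duplicate_ids
```

An empty result means every connected USB device has a unique ID.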
Configure Oracle Linux 8 Workstation
The sections below will configure a GNOME workstation on the Oracle Linux 8 instance image. By default, the graphical desktop environment is not installed on the Pritunl Cloud images. The commands in each section will install this environment group and enable the instance to start with the graphical environment. It is recommended to use the Nvidia DKMS configuration, which allows using the Oracle Linux UEK kernel.
Configure Oracle Linux 8 Workstation AMD
The commands below will configure the workstation environment for an AMD GPU. The passwd command sets a password so that the cloud user can log in from the graphical login screen. The root password should also be set.
sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo yum -y groupinstall "Workstation"
sudo systemctl set-default graphical.target
sudo systemctl enable gdm
sudo passwd root
sudo passwd cloud
Configure Oracle Linux 8 Workstation Nvidia DKMS
This configuration will use the driver binary from Nvidia. The next section has instructions for using the RPM Fusion repository instead. This method will support the Oracle Linux UEK kernel.
Go to the Nvidia Driver Download page and select NVIDIA RTX / Quadro, then select the correct GPU. Select Linux 64-bit and Linux Short Lived as the Download Type, then click Search. On the next page click the first Download button, then copy the URL of the next Download button. Update the Nvidia URL below with the latest driver. When installing the driver, select Yes on the first prompt to register the kernel module with DKMS. Use the defaults for all other prompts.
The passwd command sets a password so that the cloud user can log in from the graphical login screen. The root password should also be set. The final dkms status command should show an Nvidia module.
sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo yum -y install dkms kernel-devel kernel-uek-devel make gcc wget libglvnd libglvnd-egl libglvnd-glx libglvnd-gles libglvnd-opengl libglvnd-devel
sudo yum -y groupinstall "Workstation"
sudo tee /etc/modprobe.d/nouveau-blacklist.conf << EOF
blacklist nouveau
EOF
sudo dracut -f
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/455.45.01/NVIDIA-Linux-x86_64-455.45.01.run
sudo sh ./NVIDIA-Linux-x86_64-455.45.01.run
# Select Yes to register the kernel module with DKMS
sudo systemctl set-default graphical.target
sudo systemctl enable gdm
sudo passwd root
sudo passwd cloud
sudo dkms status
#nvidia, 455.45.01, 5.4.17-2036.101.2.el8uek.x86_64, x86_64: installed
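The status check can also be scripted. The sketch below accepts the line format shown above and, as an assumption, the nvidia/version form printed by some other DKMS releases; nvidia_dkms_state is a hypothetical helper name.

```shell
# Minimal sketch: report whether dkms status output read on stdin lists an
# installed nvidia module. Hypothetical helper name.
nvidia_dkms_state() {
    if grep -Eq '^nvidia[,/].*: installed'; then
        echo "installed"
    else
        echo "missing"
    fi
}

# Usage inside the instance:
# sudo dkms status | nvidia_dkms_state
```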
The commands below can be used in the future to update the Nvidia driver. The graphical system will need to first be stopped before installing the driver. After the driver is installed enable the graphical target.
sudo systemctl set-default multi-user.target
sudo reboot
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/455.45.01/NVIDIA-Linux-x86_64-455.45.01.run
sudo sh ./NVIDIA-Linux-x86_64-455.45.01.run
# Select Yes to register the kernel module with DKMS
sudo systemctl set-default graphical.target
sudo reboot
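When the driver is updated regularly, keeping the version in one variable avoids editing both the URL and the file name. The sketch below assumes newer drivers follow the same URL layout as the example above, which should be confirmed against the URL copied from the Nvidia download page; nvidia_run_url is a hypothetical helper name.

```shell
# Minimal sketch: build the installer URL for a given driver version,
# assuming the URL layout of the 455.45.01 example. Hypothetical helper name.
nvidia_run_url() {
    version="$1"
    echo "https://us.download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}

# Usage:
# NVIDIA_VERSION=455.45.01
# wget "$(nvidia_run_url "$NVIDIA_VERSION")"
# sudo sh "./NVIDIA-Linux-x86_64-${NVIDIA_VERSION}.run"
```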
Configure Oracle Linux 8 Workstation Nvidia RPM Fusion
This configuration will use the Nvidia driver packages from the RPM Fusion repository. The section above has instructions for using the Nvidia binary instead. This method will not support the Oracle Linux UEK kernel.
First disable the UEK kernel by changing the default kernel to the latest kernel version in /boot that does not have uek in the name. After rebooting into the standard kernel, remove the UEK kernel.
ls /boot
sudo grubby --set-default /boot/vmlinuz-4.18.0-240.10.1.el8_3.x86_64
sudo reboot
sudo dnf remove kernel-uek
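To find candidate paths for the grubby command above, the kernel images in /boot can be filtered by name. The sketch below only lists the candidates and leaves choosing the latest version to the reader; non_uek_kernels is a hypothetical helper, and skipping rescue images is an added assumption.

```shell
# Minimal sketch: print kernel images that are neither UEK nor rescue
# images. Hypothetical helper; the directory argument defaults to /boot.
non_uek_kernels() {
    boot_dir="${1:-/boot}"
    for k in "$boot_dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when no kernel images exist
        [ -e "$k" ] || continue
        case "$k" in
            *uek*|*rescue*) continue ;;
        esac
        echo "$k"
    done
}

# Usage inside the instance:
# non_uek_kernels
```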
The commands below will configure the workstation environment using the RPM Fusion repository for Nvidia drivers. The passwd command sets a password so that the cloud user can log in from the graphical login screen. The root password should also be set.
sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
sudo yum -y groupinstall "Workstation"
sudo yum -y install https://download1.rpmfusion.org/free/el/rpmfusion-free-release-8.noarch.rpm
sudo yum -y install https://download1.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-8.noarch.rpm
sudo yum -y install kmod-nvidia nvidia-settings xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda-libs xorg-x11-drv-nvidia-kmodsrc xorg-x11-drv-nvidia-libs xorg-x11-drv-nvidia-cuda
sudo systemctl set-default graphical.target
sudo systemctl enable gdm
sudo passwd root
sudo passwd cloud
Graphical desktop instances tend to have a long shutdown time. When stopping these instances, use the command sudo poweroff inside the instance instead of stopping the instance from the Pritunl Cloud web console, which may time out.