= NVIDIA Driver Installation Guide =
This guide provides detailed instructions for installing NVIDIA drivers on Ubuntu, Debian, and CentOS systems. Proper driver installation is crucial for leveraging the full potential of NVIDIA GPUs, particularly for AI/ML workloads, scientific computing, and high-performance graphics.
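Most steps in this guide give separate commands for Ubuntu/Debian and for CentOS. The Bash helpers below are a minimal sketch that reports which set of commands applies to a host. They assume the host has the standard `/etc/os-release` file. The function names `pkg_manager_for` and `detect_pkg_manager` are hypothetical and do not belong to any tool.

```shell
#!/usr/bin/env bash
# Map an os-release ID (and optional ID_LIKE list) to the package manager
# used in this guide. Only the distribution families covered here are recognized.
pkg_manager_for() {
  local id="$1" id_like="${2:-}"
  case " $id $id_like " in
    *" ubuntu "* | *" debian "*) echo apt ;;
    *" centos "* | *" rhel "* | *" fedora "*) echo yum ;;
    *) echo unknown; return 1 ;;
  esac
}

# Read ID and ID_LIKE from /etc/os-release inside a subshell,
# so the variables it defines do not leak into the calling shell.
detect_pkg_manager() {
  (
    . /etc/os-release || exit 1
    pkg_manager_for "${ID:-}" "${ID_LIKE:-}"
  )
}
```

After sourcing the file, `detect_pkg_manager` prints `apt` on Ubuntu or Debian and `yum` on CentOS. Derivatives are recognized through `ID_LIKE`. For example, Linux Mint lists `ubuntu` there.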
Prerequisites
Before you begin, ensure you have:
# A server with an NVIDIA GPU installed. GPU servers are available at Immers Cloud starting from $0.23/hr for inference to $4.74/hr for H200.
# Root or sudo privileges on your server.
# Internet connectivity to download necessary packages.
# Basic familiarity with the Linux command line.
# The specific model of your NVIDIA GPU. You can usually find this using:
    lspci | grep -i nvidia
# Important: Ensure your system is up to date.
For Ubuntu/Debian:
    sudo apt update && sudo apt upgrade -y
For CentOS:
    sudo yum update -y
Step 1: Identify Your GPU and Kernel Version
Knowing your GPU model and kernel version helps in selecting the correct driver.
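You can also script the GPU and kernel checks in this step. The helpers below are a minimal sketch. They assume `lspci` prints its default one-line-per-device format, and the names `list_nvidia_gpus`, `gpu_model_name` and `report_gpu_and_kernel` are hypothetical. Data-center GPUs often appear as "3D controller" rather than "VGA compatible controller", so the sketch matches both classes. It skips the GPU's separate audio function.

```shell
#!/usr/bin/env bash
# Filter lspci output (on stdin) to NVIDIA display devices and print
# "<bus-id> <description>" for each one.
list_nvidia_gpus() {
  grep -i 'nvidia' \
    | grep -E 'VGA compatible controller|3D controller' \
    | sed -E 's/^([0-9a-fA-F:.]+) [^:]+: (.*)$/\1 \2/'
}

# Print the name shown in square brackets, e.g. "GeForce RTX 2070".
gpu_model_name() {
  sed -nE 's/.*\[([^]]+)].*/\1/p'
}

# Print the running kernel release followed by every NVIDIA GPU found.
report_gpu_and_kernel() {
  echo "Kernel: $(uname -r)"
  lspci | list_nvidia_gpus
}
```

After sourcing the file, run `report_gpu_and_kernel`. To get only the model names, pipe the output through `gpu_model_name`.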
# Check GPU:
    lspci | grep -i nvidia
Example output:
    01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070] (rev a1)
# Check kernel version:
    uname -r
Example output:
    5.15.0-56-generic
Step 2: Install Necessary Build Tools
The NVIDIA driver installation often requires kernel headers and build tools to compile modules for your specific kernel.
For Ubuntu/Debian
    sudo apt install build-essential linux-headers-$(uname -r) -y
For CentOS
CentOS uses `kernel-devel` for kernel headers:
    sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make -y
Why this matters: The NVIDIA driver is a kernel module. To load and function correctly, it must be compiled against the headers of the exact kernel you are running. If they are missing, the module cannot be built and the driver will not work.
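Before moving on, you can confirm that the headers and compilers are actually present. The helpers below are a minimal sketch. They assume the conventional `/lib/modules/<release>/build` path resolves to the installed headers. The names `have_kernel_headers` and `missing_build_tools` are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if a kernel build tree exists for the given release
# (default: the running kernel). Assumes the conventional
# /lib/modules/<release>/build path, which points at the installed headers.
have_kernel_headers() {
  local release="${1:-$(uname -r)}"
  [ -d "/lib/modules/${release}/build" ]
}

# Print each build tool (gcc, make, plus any extra names given) that is
# not on PATH, one per line; return 1 if anything is missing.
missing_build_tools() {
  local tool status=0
  for tool in gcc make "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `have_kernel_headers || echo "headers missing for $(uname -r)"` warns when the headers are absent. If `missing_build_tools` prints nothing, `gcc` and `make` were found.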
Step 3: Disable Nouveau Driver
The Nouveau driver is an open-source driver for NVIDIA cards that can interfere with the proprietary NVIDIA driver installation. It's essential to disable it.
# Blacklist Nouveau. Create a new configuration file:
    sudo nano /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines:
    blacklist nouveau
    options nouveau modeset=0
Save and exit (Ctrl+X, Y, Enter in nano).
# Update the initramfs:
For Ubuntu/Debian:
    sudo update-initramfs -u
For CentOS:
    sudo dracut --force
# Reboot your system:
    sudo reboot
Why this matters: If Nouveau is loaded, it claims the GPU first and prevents the NVIDIA driver from binding to it. Blacklisting it and rebuilding the initramfs, which can also load the module early in boot, ensures it is not loaded at boot.
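You can also write and check the blacklist file without opening an editor. The helpers below are a minimal sketch, and their names are hypothetical. `write_nouveau_blacklist` overwrites the target file. Writing under `/etc` requires a root shell (for example, `sudo -i`), because `sudo` cannot run shell functions directly.

```shell
#!/usr/bin/env bash
# Write the two directives from Step 3 to the given file, overwriting it
# (default: /etc/modprobe.d/blacklist-nouveau.conf, which needs root).
write_nouveau_blacklist() {
  local target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Succeed only if the file contains both directives.
nouveau_blacklisted() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  grep -qE '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$conf" &&
    grep -qE '^[[:space:]]*options[[:space:]]+nouveau[[:space:]]+modeset=0[[:space:]]*$' "$conf"
}

# After rebooting, succeed if the nouveau module is not loaded.
nouveau_unloaded() {
  ! lsmod | grep -q '^nouveau '
}
```

In a root shell, run `write_nouveau_blacklist && nouveau_blacklisted && update-initramfs -u`. On CentOS, use `dracut --force` in place of `update-initramfs -u`. After the reboot, `nouveau_unloaded` should succeed.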
Step 4: Install NVIDIA Driver
There are generally two recommended methods: using the distribution's package manager or downloading the driver from NVIDIA's website. Using the distribution's repository is often simpler and better integrated.
Method 1: Using Distribution Repositories (Recommended)
This is the easiest and most stable method for most users.
For Ubuntu/Debian
# (Optional) Add the graphics-drivers PPA (Personal Package Archive) for newer drivers:
    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
# Find the recommended driver:
    ubuntu-drivers devices
Example output:
    /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
    driver : nvidia-driver-470 - distro non-free recommended
    driver : nvidia-driver-510 - distro non-free
    driver : nvidia-driver-515 - distro non-free
    driver : nvidia-driver-450 - distro non-free
    driver : nvidia-driver-390 - distro non-free
    driver : nvidia-driver-525 - distro non-free
    driver : nvidia-driver-495 - distro non-free
    driver : nvidia-driver-535 - distro non-free
    driver : nvidia-driver-545 - distro non-free
    driver : nvidia-driver-550 - distro non-free
    driver : nvidia-driver-555 - distro non-free
    driver : nvidia-driver-560 - distro non-free
    ...
# Install the recommended driver:
    sudo ubuntu-drivers autoinstall
Alternatively, install a specific version (e.g., 535):
    sudo apt install nvidia-driver-535 -y
# Reboot your system:
    sudo reboot
Note: The PPA and `ubuntu-drivers` are specific to Ubuntu. On Debian, enable the `non-free` component in your APT sources and install the `nvidia-driver` package instead.
For CentOS
On CentOS, the NVIDIA driver packages come from the RPM Fusion repositories, which require EPEL (Extra Packages for Enterprise Linux).
# Install EPEL if it is not already present:
    sudo yum install epel-release -y
# Install RPM Fusion (for NVIDIA drivers):
    sudo yum install --nogpgcheck https://download1.rpmfusion.org/free/el/rpmfusion-free-release-$(rpm -E %rhel).noarch.rpm https://download1.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-$(rpm -E %rhel).noarch.rpm -y
# Install the NVIDIA driver. First, search for the available packages:
    sudo yum search nvidia
Then install the driver (`akmod-nvidia` builds the kernel module for your kernel):
    sudo yum install akmod-nvidia xorg-x11-drv-nvidia-cuda -y
# Reboot your system:
    sudo reboot
Why this matters: Using distribution repositories ensures that the driver is compatible with your system's libraries and kernel. On CentOS, `akmod-nvidia` automatically rebuilds the kernel module when the kernel is updated.
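You can script both halves of Method 1. The helpers below are a minimal sketch, and the function names are hypothetical. They make the following assumptions:

- `ubuntu-drivers devices` prints one `driver : <package> - ...` line per candidate and tags one of them `recommended`, as in the example output for this method.
- Ubuntu packages are named `nvidia-driver-<branch>`.
- On CentOS, `modinfo -F version nvidia` succeeds once akmods has finished building the module.

RPM Fusion's documentation recommends waiting for that background build, which can take a few minutes, before rebooting.

```shell
#!/usr/bin/env bash
# --- Ubuntu: parse `ubuntu-drivers devices` output (read from stdin) ---

# Print the package flagged "recommended", e.g. "nvidia-driver-535".
recommended_driver() {
  awk '/^driver[ \t]*:/ && / recommended/ { print $3; exit }'
}

# Print the highest nvidia-driver-<branch> number listed, e.g. "550".
newest_driver_branch() {
  sed -nE 's/^driver[[:space:]]*:[[:space:]]*nvidia-driver-([0-9]+).*/\1/p' \
    | sort -n | tail -n 1
}

# --- CentOS: wait for akmods to finish building the kernel module ---

# Poll modinfo until the nvidia module exists or the timeout expires.
# Arguments: timeout in seconds (default 600), poll interval (default 10).
wait_for_nvidia_module() {
  local timeout="${1:-600}" interval="${2:-10}" waited=0 version
  while [ "$waited" -lt "$timeout" ]; do
    if version=$(modinfo -F version nvidia 2>/dev/null) && [ -n "$version" ]; then
      echo "nvidia kernel module built: $version"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "nvidia module not found after ${timeout}s; try 'akmods --force'" >&2
  return 1
}
```

On Ubuntu, install the recommended driver with `sudo apt install -y "$(ubuntu-drivers devices 2>/dev/null | recommended_driver)"`. On CentOS, run `wait_for_nvidia_module && sudo reboot` after the `yum install` step.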
Method 2: Using NVIDIA's Runfile Installer
This method offers the latest drivers directly from NVIDIA but can be more complex to manage, especially during kernel updates.
# Visit the NVIDIA Driver Download page.
# Select your GPU model and operating system, and download the latest recommended driver.
# Make the downloaded file executable:
    chmod +x NVIDIA-Linux-x86_64-*.run
# Stop your display manager. This is crucial to prevent conflicts. Because this ends the graphical session, run this and the following steps from a text console (for example, Ctrl+Alt+F3) or over SSH.
For Ubuntu/Debian (using `gdm3` or `lightdm`):
    sudo systemctl stop gdm3
or
    sudo systemctl stop lightdm
For CentOS (using `gdm`):
    sudo systemctl stop gdm
# Run the installer:
    sudo ./NVIDIA-Linux-x86_64-*.run
# Follow the on-screen prompts. It's generally recommended to accept the default options, including installing the 32-bit compatibility libraries if prompted.
# The installer might offer to register the kernel module with DKMS (Dynamic Kernel Module Support). If it does, answer "yes" so the module is rebuilt automatically after kernel updates.
# Restart your display manager (the one you stopped earlier) and reboot:
    sudo systemctl start gdm3
    sudo reboot
Why this matters: Stopping the display manager ensures the X server is not running, which is necessary for the installer to load and configure the graphics components. DKMS rebuilds the kernel module automatically when the kernel is updated.
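If several runfiles have been downloaded, the glob `NVIDIA-Linux-x86_64-*.run` expands to all of them, and the installer receives the extra names as arguments. The helpers below are a minimal sketch that picks the newest file by version sort and runs it. The function names are hypothetical, and the sketch makes these assumptions:

- GNU `sort -V` is available.
- `systemctl isolate multi-user.target` is used to leave the graphical session. This is swapped in for stopping `gdm3`, `lightdm` or `gdm` by name, so the script does not need to know which display manager is in use.
- The installer's `--dkms` option does the same as answering "yes" to the DKMS prompt.

```shell
#!/usr/bin/env bash
# Print the newest NVIDIA-Linux-x86_64-*.run in a directory (default: current).
# Prints nothing if there is no match.
newest_runfile() {
  local dir="${1:-.}" f
  for f in "$dir"/NVIDIA-Linux-x86_64-*.run; do
    [ -e "$f" ] && printf '%s\n' "$f"
  done | sort -V | tail -n 1
}

# Leave the graphical session, then run the newest runfile with DKMS
# registration. Must run as root from a text console or SSH session.
install_runfile() {
  local runfile
  if [ "$(id -u)" -ne 0 ]; then
    echo "run this as root" >&2
    return 1
  fi
  runfile=$(newest_runfile "${1:-.}")
  if [ -z "$runfile" ]; then
    echo "no NVIDIA runfile found" >&2
    return 1
  fi
  systemctl isolate multi-user.target   # stops the display manager
  sh "$runfile" --dkms
}
```

From a root shell on a text console, run `install_runfile ~/Downloads`, then reboot.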
Step 5: Verify Installation
After rebooting, verify that the NVIDIA driver is loaded correctly.
# Check NVIDIA SMI (System Management Interface):
    nvidia-smi
Example output:
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce RTX 2070        Off | 00000000:01:00.0  On |                  N/A |
    | N/A   45C    P8               8W /  N/A |      1MiB /  8192MiB |      0%      Default |
    +-----------------------------------------+----------------------+----------------------+
    ...
# Check loaded kernel modules:
    lsmod | grep nvidia
Example output:
    nvidia_uvm            962570  0
    nvidia_drm             57344  1
    nvidia_modeset        119808  2 nvidia_drm
    nvidia               3440640 15 nvidia_uvm,nvidia_modeset
    i2c_algo_bit           16384  1 nvidia_drm
    video                  53248  1 nvidia_modeset
# Check the Xorg log for errors (desktop systems running X only):
    grep '(EE)' /var/log/Xorg.0.log
Expected output: empty, or at least no critical errors related to NVIDIA. If the file does not exist, X may be running rootless (its log is then under `~/.local/share/xorg/`) or not at all, as on a headless server.
Why this matters: `nvidia-smi` is the primary tool to confirm the driver is active and communicating with the GPU. `lsmod` shows the loaded kernel modules, and the Xorg log helps diagnose graphical display issues.
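For scripts or monitoring, `nvidia-smi` can print selected fields as CSV instead of the table. The helpers below are a minimal sketch, and the function names are hypothetical. They make these assumptions:

- The `--query-gpu` field names `index`, `name`, `driver_version` and `memory.total` are available (`nvidia-smi --help-query-gpu` lists the supported fields).
- `/proc/driver/nvidia/version` exists only while the kernel module is loaded.

```shell
#!/usr/bin/env bash
# Succeed if the NVIDIA kernel module is loaded (it provides this proc file).
nvidia_driver_loaded() {
  [ -r /proc/driver/nvidia/version ]
}

# Turn CSV rows from
#   nvidia-smi --query-gpu=index,name,driver_version,memory.total --format=csv,noheader
# (read on stdin) into one readable line per GPU.
summarize_gpus() {
  awk -F', ' '{ printf "GPU %s: %s (driver %s, %s)\n", $1, $2, $3, $4 }'
}

# Query the live system and print the summary.
verify_driver() {
  if ! nvidia_driver_loaded; then
    echo "nvidia kernel module is not loaded" >&2
    return 1
  fi
  nvidia-smi --query-gpu=index,name,driver_version,memory.total \
    --format=csv,noheader | summarize_gpus
}
```

On the example system in this step, `verify_driver` would print a line like `GPU 0: NVIDIA GeForce RTX 2070 (driver 535.104.05, 8192 MiB)`.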
Troubleshooting
- Black screen after reboot:
  * This often indicates a driver conflict or an incorrect installation.
  * Try booting into recovery mode and uninstalling the NVIDIA driver.
  * If you used the runfile installer, run it again with the `--uninstall` flag.
  * If you used a package manager, run `sudo apt autoremove 'nvidia-*'` (Ubuntu/Debian) or `sudo yum remove akmod-nvidia` (CentOS).
  * Ensure Nouveau is properly blacklisted.
- `nvidia-smi` command not found, or "Failed to initialize NVML":
  * The driver is most likely not loaded.
  * Double-check that you rebooted after installation.
  * Verify that the `nvidia` kernel module is loaded (`lsmod | grep nvidia`).
  * Ensure you installed a driver version that supports your GPU and kernel.
  * On CentOS, ensure `akmod-nvidia` is installed and that the module built successfully (`modinfo -F version nvidia` prints a version once it has).
- CUDA Toolkit issues:
  * The CUDA Toolkit is installed separately from the driver. Refer to the CUDA Installation Guide.
  * The driver version must be compatible with the CUDA Toolkit version. Check NVIDIA's CUDA documentation for the compatibility matrix. The "CUDA Version" shown by `nvidia-smi` is the highest CUDA version the driver supports, not the version of an installed toolkit.
- Kernel update breaks the driver:
  * If you used the runfile installer without DKMS, you need to reinstall the driver after each kernel update.
  * If you used distribution packages with DKMS or `akmod-nvidia`, the module should rebuild automatically. If it does not, trigger a rebuild manually (`sudo dkms autoinstall`, or `sudo akmods --force` on CentOS) or reinstall the driver.
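For the CUDA compatibility check, you can compare the installed driver version against the minimum that NVIDIA lists for your toolkit. The helpers below are a minimal sketch, and the function names are hypothetical. They assume GNU `sort -V` for version ordering. The sketch does not hard-code a minimum: pass the value from NVIDIA's compatibility table.

```shell
#!/usr/bin/env bash
# Succeed if dotted version $1 >= $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the driver version that nvidia-smi reports for the first GPU.
installed_driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1
}

# Succeed if the installed driver is at least the minimum version passed in $1,
# taken from NVIDIA's CUDA Toolkit/driver compatibility table.
driver_meets_minimum() {
  local min="$1" current
  current=$(installed_driver_version)
  if [ -z "$current" ]; then
    echo "could not read the driver version; is the driver loaded?" >&2
    return 1
  fi
  if version_ge "$current" "$min"; then
    echo "driver $current meets minimum $min"
  else
    echo "driver $current is older than required $min" >&2
    return 1
  fi
}
```

Run it as `driver_meets_minimum <minimum-from-NVIDIA-table>`. Version sort compares each dotted field numerically, so `535.104.05` correctly ranks above `535.54.03`, which a plain string comparison would get wrong.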
Related Articles
- CUDA Installation Guide
- Docker for AI/ML
- GPU Server Management