Setting Up NVIDIA CUDA on Linux

From Server rental store
Revision as of 10:01, 13 April 2026 by Admin (talk | contribs) (New server guide)

This guide provides a comprehensive walkthrough for installing and configuring the NVIDIA CUDA Toolkit on a Linux system. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It enables software developers and engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing – an approach known as GPGPU (General-Purpose computing on Graphics Processing Units). This is particularly useful for computationally intensive tasks such as machine learning, deep learning, scientific simulations, and video processing.

Prerequisites

Before you begin, ensure your system meets the following requirements:

  • NVIDIA GPU: A CUDA-enabled NVIDIA graphics card is essential. You can check compatibility on the [CUDA GPUs] list.
  • Linux Distribution: A supported Linux distribution (e.g., Ubuntu, CentOS, Debian, Fedora, RHEL). This guide primarily uses examples for Ubuntu/Debian-based systems, but commands can be adapted for others.
  • Root or sudo privileges: You will need administrative access to install software and modify system configurations.
  • Internet connection: To download the CUDA Toolkit and drivers.
  • Basic Linux command-line familiarity: Understanding of package management, file system navigation, and text editing.
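
Because the commands in this guide differ between Debian-based and Red Hat-based systems, a script can pick the right family from the `ID` field of `/etc/os-release`. The following is a minimal sketch; the function name `pkg_manager_for` is hypothetical, and it assumes `dnf` is available on the Red Hat family (older releases use `yum` instead).

```shell
#!/bin/sh
# Hypothetical helper: map an os-release ID to the package manager
# used for that distribution family in this guide.
pkg_manager_for() {
    case "$1" in
        ubuntu|debian)      echo apt ;;
        centos|rhel|fedora) echo dnf ;;   # assumption: yum on older releases
        *)                  echo unknown ;;
    esac
}

# Usage on a real system:
#   . /etc/os-release && pkg_manager_for "$ID"
```

An `unknown` result means the commands in the following steps need to be adapted by hand.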

Step 1: Identify Your NVIDIA GPU

First, confirm that your NVIDIA GPU is recognized by the system.

lspci | grep -i nvidia

This command should output information about your NVIDIA graphics card.
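
For provisioning scripts, the same check can be wrapped in a function that returns an exit status instead of printing text. This is a sketch only: the name `has_nvidia_gpu` is hypothetical, and it assumes `lspci` labels the GPU as a "VGA compatible controller" or "3D controller", which filters out the card's audio function.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lspci-style output on stdin contains
# an NVIDIA display device. Reading stdin keeps it testable without a GPU.
has_nvidia_gpu() {
    grep -i 'nvidia' | grep -Eqi 'vga|3d controller'
}

# Usage on a real system:
#   if lspci | has_nvidia_gpu; then echo "NVIDIA GPU found"; fi
```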

Step 2: Install NVIDIA Drivers

The CUDA Toolkit requires compatible NVIDIA drivers. It's generally recommended to install the drivers *before* the CUDA Toolkit.

Option A: Using the Distribution's Package Manager (Recommended for ease of use)

For Ubuntu/Debian:

sudo apt update
sudo apt install nvidia-driver-XXX

Replace `XXX` with the recommended driver version for your distribution or GPU. You can often find the recommended version by running:

ubuntu-drivers devices

Alternatively, skip the manual version choice and let `ubuntu-drivers` select and install the recommended driver for you:

sudo ubuntu-drivers autoinstall

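For CentOS/RHEL:

The NVIDIA driver is not shipped in the base or EPEL repositories. It comes from a third-party repository such as RPM Fusion or NVIDIA's own CUDA repository, and the package name depends on which one you enable. With a repository that provides `xorg-x11-drv-nvidia` packages already enabled:

sudo yum update
sudo yum install epel-release
sudo yum install xorg-x11-drv-nvidia-XXX

Replace `XXX` with the appropriate version. EPEL is enabled here because DKMS-based driver packages typically depend on the `dkms` package it provides.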

After installation, reboot your system:

sudo reboot

Option B: Downloading Drivers from NVIDIA (More control, but more complex)

1. Visit the [Driver Downloads] page.
2. Select your GPU model, operating system, and download type (e.g., "Production Branch").
3. Download the `.run` file.
4. Before running the installer, you might need to stop your display manager. For example, on Ubuntu:

sudo systemctl stop display-manager

5. Navigate to the directory where you downloaded the file and run it with root privileges:

sudo sh NVIDIA-Linux-x86_64-XXX.XX.run
   Follow the on-screen prompts.

6. Reboot your system:

sudo reboot

Step 3: Verify Driver Installation

After rebooting, check if the NVIDIA driver is loaded correctly.

nvidia-smi

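This command should display information about your GPU(s), including the driver version and the highest CUDA version that the driver supports. The CUDA version shown here is a property of the driver, not the version of any installed toolkit.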

If `nvidia-smi` fails, it indicates a problem with the driver installation. Consult the troubleshooting section.
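
In scripts, the driver version can be read with the `--query-gpu` and `--format` options of `nvidia-smi` rather than by parsing the table. The sketch below compares the major version against a minimum; the function name `driver_at_least` and the threshold in the usage comment are hypothetical, so take the real minimum from the release notes of the toolkit you plan to install.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a driver version string such as
# "535.104.05" has a major component >= the given minimum.
# $1 = driver version string, $2 = minimum major version (integer)
driver_at_least() {
    major=${1%%.*}          # strip everything from the first dot onward
    [ "$major" -ge "$2" ]
}

# Usage on a real system:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$ver" 525 || echo "driver $ver is older than required"
```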

Step 4: Install the CUDA Toolkit

NVIDIA provides several methods for installing the CUDA Toolkit. The most common are using a package manager (deb/rpm) or a runfile installer.

Option A: Using the CUDA Repository (Recommended)

This method is generally preferred as it integrates well with your system's package manager and simplifies updates.

1. Add the CUDA Repository:

   Visit the [CUDA Downloads] page. Select your Operating System, Architecture, Distribution, Version, and Installer Type (e.g., `deb (network)` or `rpm (network)`). The page will provide the exact commands for your release; the examples below are for Ubuntu 22.04 and RHEL 9 on x86_64.
   For Ubuntu (example using `deb (network)`):
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
   For CentOS/RHEL (example using `rpm (network)`):
sudo rpm --import https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/7fa2af80.pub
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf clean all
   NVIDIA has rotated its repository signing keys in the past, so the key file name in the `rpm --import` line may be out of date. If the import fails, use the key URL shown on the downloads page instead.

2. Install the CUDA Toolkit:

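   For Ubuntu:
sudo apt-get -y install cuda
   The `cuda` meta-package installs the latest CUDA Toolkit together with a matching NVIDIA driver. To keep the driver you installed in Step 2, install `cuda-toolkit` instead, which contains the toolkit only. If you need a specific version, install `cuda-toolkit-XX-Y` (e.g., `cuda-toolkit-12-2`).
   For CentOS/RHEL:
sudo dnf -y install cuda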

3. Set Environment Variables:

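   Add CUDA to your PATH and LD_LIBRARY_PATH so that the shell can find the CUDA executables and the dynamic linker can find the CUDA libraries.
   Add the following lines to your `~/.bashrc` or `~/.zshrc` file:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
   The `${VAR:+:${VAR}}` form appends the old value only when it is non-empty. This avoids a trailing colon, which the dynamic linker would treat as an empty entry meaning the current directory.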
   Then, apply the changes to your current session:
source ~/.bashrc
   (Or `source ~/.zshrc` if you use Zsh)
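
Sourcing the shell configuration file repeatedly adds the CUDA directories to the search paths again each time. One way to avoid that is to prepend a directory only when it is missing. This is a minimal sketch; `prepend_once` is a hypothetical helper, not part of the CUDA Toolkit.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to a colon-separated list
# unless it is already present. Prints the resulting list.
# $1 = directory, $2 = current list (may be empty)
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already there: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # add, no trailing colon
    esac
}

# Usage in ~/.bashrc or ~/.zshrc:
#   PATH=$(prepend_once /usr/local/cuda/bin "$PATH"); export PATH
#   LD_LIBRARY_PATH=$(prepend_once /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}"); export LD_LIBRARY_PATH
```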

Option B: Using the Runfile Installer

1. Download the CUDA Toolkit runfile (`.run` extension) from the [CUDA Downloads] page, selecting the appropriate OS and version.

2. Make the runfile executable:

chmod +x cuda_XXX.XX_linux.run

3. Run the installer with root privileges. It's often recommended to *not* install the driver if you've already installed a compatible one.

sudo sh cuda_XXX.XX_linux.run
   Follow the on-screen prompts. When asked about installing the driver, choose "no" if you already have one installed and verified.

4. Set environment variables as described in Option A, Step 3.

Step 5: Verify CUDA Toolkit Installation

After installing the toolkit and setting environment variables, verify that CUDA is accessible.

1. Check nvcc version:

   The `nvcc` (NVIDIA CUDA Compiler) is the compiler for CUDA.
nvcc --version
   This should display the installed CUDA Toolkit version.

2. Compile and run CUDA Samples:

   Toolkits older than CUDA 11.6 include sample applications, usually located in `/usr/local/cuda/samples`. From CUDA 11.6 onward the samples are no longer bundled; they are published in the NVIDIA/cuda-samples repository on GitHub, and the build steps in that repository's README apply instead. The commands below assume the bundled layout.
   First, navigate to the samples directory.
cd /usr/local/cuda/samples
   Then build the samples and run deviceQuery. Running `make` at the top level compiles every sample, which can take a while:
sudo make && cd bin/x86_64/linux/release/
./deviceQuery
   This utility lists the CUDA-capable devices it finds along with their properties. You should see "Result = PASS" at the end.
   The `bandwidthTest` sample was built by the same `make` run, so you can run it from the same directory:
./bandwidthTest
   This tests memory bandwidth between the host and device. It should also report "Result = PASS".
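
To record the installed toolkit version in a script, the release number can be extracted from the `nvcc --version` output. This sketch assumes the output contains a line of the form `Cuda compilation tools, release X.Y, ...`; the function name `nvcc_release` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the release number (e.g. "12.2") found in
# `nvcc --version` output supplied on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on a real system:
#   nvcc --version | nvcc_release
```

Empty output means no matching line was found, for example because `nvcc` is not on the PATH.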

Troubleshooting

  • `nvidia-smi` command not found or fails:
   *   Ensure the NVIDIA driver is correctly installed and loaded. Rebooting often helps.
   *   Check if `/usr/bin/nvidia-smi` or a similar path exists.
   *   Verify that the `nvidia` kernel module is loaded:
lsmod | grep nvidia
  • `nvcc: command not found` or CUDA samples don't compile:
   *   Ensure the CUDA environment variables (`PATH` and `LD_LIBRARY_PATH`) are correctly set in your `~/.bashrc` or `~/.zshrc` and that you've sourced the file (`source ~/.bashrc`).
   *   Check if `/usr/local/cuda/bin/nvcc` exists.
   *   Verify that the CUDA Toolkit was installed correctly. Reinstall if necessary.
  • Driver/Toolkit Version Mismatch:
   *   The `nvidia-smi` output shows the maximum CUDA version supported by the driver. The CUDA Toolkit version you install should be less than or equal to this supported version. If you install a newer CUDA Toolkit than your driver supports, you might encounter issues. You may need to update your NVIDIA driver or install an older CUDA Toolkit.
  • Secure Boot Issues:
   *   If you have Secure Boot enabled on your system, you might need to sign the NVIDIA kernel modules. This is often handled during driver installation, but if not, you might need to manually sign them or disable Secure Boot.
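
The driver/toolkit version rule above (toolkit version less than or equal to the version the driver supports) can be checked in a script. This is a sketch under stated assumptions: `version_le` is a hypothetical helper, both arguments must be in `major.minor` form, and the usage comment assumes the `nvidia-smi` header contains the text `CUDA Version: X.Y`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 <= version $2.
# Both arguments must look like "major.minor", e.g. "12.2".
version_le() {
    a_major=${1%%.*}; a_minor=${1#*.}
    b_major=${2%%.*}; b_minor=${2#*.}
    [ "$a_major" -lt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -le "$b_minor" ]; }
}

# Usage on a real system:
#   toolkit=$(nvcc --version | sed -n 's/.*release \([0-9]*\.[0-9]*\).*/\1/p')
#   driver=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9]*\.[0-9]*\).*/\1/p')
#   version_le "$toolkit" "$driver" || echo "toolkit $toolkit is newer than the driver supports ($driver)"
```

The components are compared as integers rather than strings, so 12.10 is correctly treated as newer than 12.4.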
