Setting Up NVIDIA CUDA on Linux

This guide walks through installing and configuring the NVIDIA CUDA Toolkit on a Linux system. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It lets developers use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach known as GPGPU (General-Purpose computing on Graphics Processing Units). This is particularly useful for computationally intensive tasks such as machine learning, deep learning, scientific simulations, and video processing.

Prerequisites

Before you begin, ensure your system meets the following requirements:

NVIDIA GPU: A CUDA-enabled NVIDIA graphics card is essential. You can check compatibility on the NVIDIA CUDA GPUs list.
Linux Distribution: A supported Linux distribution (e.g., Ubuntu, CentOS, Debian, Fedora, RHEL). This guide primarily uses examples for Ubuntu/Debian-based systems, but commands can be adapted for others.
Root or sudo privileges: You will need administrative access to install software and modify system configurations.
Internet connection: To download the CUDA Toolkit and drivers.
Basic Linux command-line familiarity: Understanding of package management, file system navigation, and text editing.

Step 1: Identify Your NVIDIA GPU

First, confirm that your NVIDIA GPU is recognized by the system.

lspci | grep -i nvidia

This command should output information about your NVIDIA graphics card.
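
For provisioning scripts, the same check can be wrapped in a small helper. This is a minimal sketch: the function name `has_nvidia_gpu` is illustrative and not part of any NVIDIA tool, and it only looks for the vendor string in whatever text it receives on standard input.

```shell
#!/bin/sh
# has_nvidia_gpu: hypothetical helper; reads lspci-style text on stdin and
# succeeds (exit 0) if any line mentions NVIDIA, case-insensitively.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Run against the real hardware only when lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_gpu; then
        echo "NVIDIA GPU detected"
    else
        echo "No NVIDIA GPU found by lspci"
    fi
fi
```

A match only shows that the device is visible on the PCI bus; it says nothing about whether a driver is installed.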

Step 2: Install NVIDIA Drivers

The CUDA Toolkit requires compatible NVIDIA drivers. It's generally recommended to install the drivers *before* the CUDA Toolkit.

Option A: Using the Distribution's Package Manager (Recommended for ease of use)

For Ubuntu/Debian:

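sudo apt update
sudo apt install nvidia-driver-XXX

Replace `XXX` with the driver version you want. On Ubuntu, you can list the drivers available for your hardware, including the recommended one, with:

ubuntu-drivers devices

Alternatively, let Ubuntu select and install the recommended driver automatically:

sudo ubuntu-drivers autoinstall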
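
If you want to script the choice of package rather than read the list by eye, the recommended entry can be extracted from the `ubuntu-drivers devices` output. The sketch below assumes each candidate is printed as a line of the form `driver : <package> - ... recommended`; that layout, and the helper name `recommended_driver`, are assumptions to verify against the output on your own system.

```shell
#!/bin/sh
# recommended_driver: hypothetical helper; reads `ubuntu-drivers devices` text
# on stdin and prints the package name from the first line whose first field
# is "driver" and which is flagged "recommended".
# Assumed line layout: "driver   : nvidia-driver-NNN - distro non-free recommended"
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Query the real tool only when it is installed (Ubuntu).
if command -v ubuntu-drivers >/dev/null 2>&1; then
    pkg=$(ubuntu-drivers devices 2>/dev/null | recommended_driver)
    if [ -n "$pkg" ]; then
        echo "Recommended driver package: $pkg"
    fi
fi
```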

For CentOS/RHEL:

sudo yum update
sudo yum install epel-release
sudo yum install xorg-x11-drv-nvidia-XXX

After installation, reboot your system:

sudo reboot

Option B: Downloading Drivers from NVIDIA (More control, but more complex)

1. Visit the NVIDIA Driver Downloads page.
2. Select your GPU model, operating system, and download type (e.g., "Production Branch").
3. Download the `.run` file.
4. Before running the installer, you might need to stop your display manager. For example, on Ubuntu:

sudo systemctl stop display-manager

5. Run the installer, replacing `XXX.XX` with the version you downloaded:

sudo sh NVIDIA-Linux-x86_64-XXX.XX.run

6. Reboot your system:

sudo reboot

Step 3: Verify Driver Installation

After rebooting, check if the NVIDIA driver is loaded correctly.

nvidia-smi

This command should display information about your GPU(s), including the driver version and CUDA version supported by the driver.

If `nvidia-smi` fails, it indicates a problem with the driver installation. Consult the troubleshooting section.
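
The "CUDA Version" shown in the `nvidia-smi` banner is the highest CUDA release the installed driver supports, not the version of any toolkit on disk. As a rough sketch, it can be pulled out of the banner with `sed`; the helper name `smi_cuda_version` is made up for illustration, and the pattern assumes the banner contains the text `CUDA Version: <number>`.

```shell
#!/bin/sh
# smi_cuda_version: hypothetical helper; reads nvidia-smi banner text on stdin
# and prints the first "CUDA Version: X.Y" value it finds (nothing if absent).
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Query the real driver only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "Driver supports CUDA up to: $(nvidia-smi | smi_cuda_version)"
fi
```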

Step 4: Install the CUDA Toolkit

NVIDIA provides several methods for installing the CUDA Toolkit. The most common are using a package manager (deb/rpm) or a runfile installer.

Option A: Using the CUDA Repository (Recommended)

This method is generally preferred as it integrates well with your system's package manager and simplifies updates.

1. Add the CUDA Repository: Visit the NVIDIA CUDA Downloads page. Select your Operating System, Architecture, Distribution, Version, and Installer Type (e.g., `deb (local)` or `rpm (network)`). The page will provide the exact commands.

For Ubuntu (example using `deb (network)`, which installs the `cuda-keyring` package):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

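For CentOS/RHEL (example using `rpm (network)`). NVIDIA has rotated its repository signing key in the past, so copy the key URL from the download page rather than relying on the file name shown here:

sudo rpm --import https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/7fa2af80.pub
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf clean all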

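2. Install the CUDA Toolkit. The `cuda` metapackage installs the toolkit together with a driver; to keep the driver you installed in Step 2, install the `cuda-toolkit` package instead.

For Ubuntu: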

sudo apt-get -y install cuda

For CentOS/RHEL:

sudo dnf -y install cuda

3. Set environment variables: add CUDA to your `PATH` and `LD_LIBRARY_PATH` so that the shell and the dynamic linker can find the CUDA executables and libraries. Add the following lines to your `~/.bashrc` or `~/.zshrc` file:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then reload the file in your current shell (use `source ~/.zshrc` for zsh):

source ~/.bashrc
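
Because shell startup files can be sourced more than once, plain `export` lines tend to pile up duplicate entries. The sketch below shows one way to make the step idempotent; `prepend_once` is a hypothetical helper name, and `/usr/local/cuda` is the same install prefix assumed above. It also avoids leaving a trailing colon, since the dynamic linker treats an empty `LD_LIBRARY_PATH` entry as the current directory.

```shell
#!/bin/sh
# prepend_once DIR LIST: hypothetical helper; prints LIST (colon-separated)
# with DIR placed in front, unless DIR is already one of its entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already present: unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;   # prepend, no trailing colon
    esac
}

PATH=$(prepend_once /usr/local/cuda/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_once /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```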

Option B: Using the Runfile Installer

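1. Download the CUDA Toolkit runfile (`.run` extension) from the NVIDIA CUDA Downloads page, selecting the appropriate OS and version.
2. Make the runfile executable:

chmod +x cuda_XXX.XX_linux.run

3. Run the installer. If you already installed a driver in Step 2, deselect the bundled driver when the installer asks which components to install:

sudo sh cuda_XXX.XX_linux.run

4. Set the `PATH` and `LD_LIBRARY_PATH` environment variables as described in Option A.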

Step 5: Verify CUDA Toolkit Installation

After installing the toolkit and setting environment variables, verify that CUDA is accessible.

1. Check nvcc version: The `nvcc` (NVIDIA CUDA Compiler) is the compiler for CUDA.

nvcc --version
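
Once both `nvidia-smi` and `nvcc` work, it can be useful to compare the toolkit release against the highest CUDA version the driver reports. The following is a rough sketch only: the helper names are invented for illustration, the parsing assumes `nvcc --version` prints a line containing `release X.Y,`, and the comparison relies on GNU `sort -V`. CUDA's compatibility rules can allow some newer toolkits to run on an older driver, so treat a reported mismatch as a prompt to check NVIDIA's compatibility notes rather than as proof of a broken setup.

```shell
#!/bin/sh
# nvcc_release: hypothetical helper; reads `nvcc --version` text on stdin and
# prints the toolkit release, e.g. "12.2" from "... release 12.2, V12.2.140".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# version_le A B: succeeds if version A <= version B (uses GNU `sort -V`).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_toolkit DRIVER_MAX: compare the installed toolkit release with the
# driver's maximum CUDA version, passed in by the caller (e.g. "12.2").
check_toolkit() {
    toolkit=$(nvcc --version 2>/dev/null | nvcc_release)
    if [ -z "$toolkit" ]; then
        echo "Could not determine the toolkit release (is nvcc on PATH?)"
        return 2
    fi
    if version_le "$toolkit" "$1"; then
        echo "OK: toolkit $toolkit <= driver maximum $1"
    else
        echo "Possible mismatch: toolkit $toolkit is newer than driver maximum $1"
        return 1
    fi
}
```

For example, `check_toolkit 12.2` compares the installed toolkit against a driver whose `nvidia-smi` banner shows CUDA Version 12.2.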

2. Compile and run CUDA samples. Toolkits older than CUDA 11.6 ship sample applications, usually in `/usr/local/cuda/samples`; from CUDA 11.6 onward the samples are distributed separately in NVIDIA's `cuda-samples` repository on GitHub. For a bundled copy, first navigate to the samples directory:

cd /usr/local/cuda/samples

Then build the samples and run `deviceQuery`, which should list your GPU and end with `Result = PASS`:

sudo make && cd bin/x86_64/linux/release/
./deviceQuery

The full build also places `bandwidthTest` in the same release directory, so you can run it directly:

./bandwidthTest

Troubleshooting

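`nvidia-smi` command not found or fails: check whether the NVIDIA kernel module is loaded. If the following command prints nothing, the driver is not loaded; reinstall it (Step 2) and reboot.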

lsmod | grep nvidia

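`nvcc: command not found` or CUDA samples don't compile: confirm that `/usr/local/cuda/bin` is on your `PATH` and `/usr/local/cuda/lib64` is on your `LD_LIBRARY_PATH` (Step 4), then open a new shell.
Driver/Toolkit Version Mismatch: the CUDA version reported by `nvidia-smi` is the newest release the driver supports. If `nvcc --version` reports a newer release, update the driver or install an older toolkit.
Secure Boot Issues: with Secure Boot enabled, the kernel refuses to load unsigned modules, which can include the NVIDIA driver. Sign the module with a key you have enrolled, or disable Secure Boot in the firmware settings.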
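
The Secure Boot and module checks can be combined into a quick diagnostic. This is a sketch under stated assumptions: the helper names are hypothetical, `mokutil` must be installed, and the pattern assumes `mokutil --sb-state` prints a line beginning with `SecureBoot enabled` when Secure Boot is on.

```shell
#!/bin/sh
# secure_boot_enabled: hypothetical helper; reads `mokutil --sb-state` text on
# stdin and succeeds if it reports that Secure Boot is enabled.
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

# nvidia_module_loaded: hypothetical helper; reads `lsmod` text on stdin and
# succeeds if a module named exactly "nvidia" is listed.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Run the combined check only when both tools are available.
if command -v mokutil >/dev/null 2>&1 && command -v lsmod >/dev/null 2>&1; then
    if ! lsmod | nvidia_module_loaded &&
       mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
        echo "nvidia module not loaded and Secure Boot is on: check module signing"
    fi
fi
```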
