Setting Up NVIDIA CUDA on Linux
This guide provides a comprehensive walkthrough for installing and configuring the NVIDIA CUDA Toolkit on a Linux system. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It enables software developers and engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing – an approach known as GPGPU (General-Purpose computing on Graphics Processing Units). This is particularly useful for computationally intensive tasks such as machine learning, deep learning, scientific simulations, and video processing.
Prerequisites
Before you begin, ensure your system meets the following requirements:
- NVIDIA GPU: A CUDA-enabled NVIDIA graphics card is essential. You can check compatibility on the [CUDA GPUs] list.
- Linux Distribution: A supported Linux distribution (e.g., Ubuntu, CentOS, Debian, Fedora, RHEL). This guide primarily uses examples for Ubuntu/Debian-based systems, but commands can be adapted for others.
- Root or sudo privileges: You will need administrative access to install software and modify system configurations.
- Internet connection: To download the CUDA Toolkit and drivers.
- Basic Linux command-line familiarity: Understanding of package management, file system navigation, and text editing.
Step 1: Identify Your NVIDIA GPU
First, confirm that your NVIDIA GPU is recognized by the system.
lspci | grep -i nvidia
This command should output information about your NVIDIA graphics card.
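For scripted setups, the same check can be wrapped so that a script stops early when no NVIDIA device is present. This is a minimal sketch: the function name `has_nvidia_gpu` is made up for illustration, and it assumes `lspci` (from the pciutils package) is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin (lspci output)
# mentions an NVIDIA device.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Only query the real hardware if lspci is available.
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_gpu; then
        echo "NVIDIA GPU detected"
    else
        echo "No NVIDIA GPU found by lspci"
    fi
else
    echo "lspci not installed (pciutils package)"
fi
```

Keeping the match in a separate function makes it possible to test the logic against saved `lspci` output from another machine.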
Step 2: Install NVIDIA Drivers
The CUDA Toolkit requires compatible NVIDIA drivers. It's generally recommended to install the drivers *before* the CUDA Toolkit.
Option A: Using the Distribution's Package Manager (Recommended for ease of use)
For Ubuntu/Debian:
sudo apt update
sudo apt install nvidia-driver-XXX
Replace `XXX` with the recommended driver version for your distribution or GPU. You can often find the recommended version by running:
ubuntu-drivers devices
Then install the recommended one:
sudo ubuntu-drivers autoinstall
For CentOS/RHEL:
sudo yum update
sudo yum install epel-release
sudo yum install xorg-x11-drv-nvidia-XXX
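Replace `XXX` with the appropriate version. The exact package name depends on which third-party repository provides the driver (for example RPM Fusion or ELRepo), so check that repository's documentation before installing.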
After installation, reboot your system:
sudo reboot
Option B: Downloading Drivers from NVIDIA (More control, but more complex)
1. Visit the [Driver Downloads] page.
2. Select your GPU model, operating system, and download type (e.g., "Production Branch").
3. Download the `.run` file.
4. Before running the installer, you might need to stop your display manager. For example, on Ubuntu:
sudo systemctl stop display-manager
5. Navigate to the directory where you downloaded the file and run it with root privileges:
sudo sh NVIDIA-Linux-x86_64-XXX.XX.run
Follow the on-screen prompts.
6. Reboot your system:
sudo reboot
Step 3: Verify Driver Installation
After rebooting, check if the NVIDIA driver is loaded correctly.
nvidia-smi
This command should display information about your GPU(s), including the driver version and CUDA version supported by the driver.
If `nvidia-smi` fails, it indicates a problem with the driver installation. Consult the troubleshooting section.
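The "CUDA Version" field in the `nvidia-smi` banner is the value to compare against the toolkit you plan to install. A minimal sketch of extracting it in a script follows; the helper name `driver_cuda_version` is hypothetical, and the pattern assumes the banner contains the text `CUDA Version: <number>`, as current drivers print it.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi banner text on stdin and print the
# "CUDA Version" field (the newest CUDA release the driver supports).
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Guard the real call so the script also works on machines without a driver.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "Driver supports CUDA up to: $(nvidia-smi | driver_cuda_version)"
else
    echo "nvidia-smi not found; the driver is probably not installed"
fi
```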
Step 4: Install the CUDA Toolkit
NVIDIA provides several methods for installing the CUDA Toolkit. The most common are using a package manager (deb/rpm) or a runfile installer.
Option A: Using the CUDA Repository (Recommended)
This method is generally preferred as it integrates well with your system's package manager and simplifies updates.
1. Add the CUDA Repository:
Visit the [CUDA Downloads] page. Select your Operating System, Architecture, Distribution, Version, and Installer Type (e.g., `deb (local)` or `rpm (network)`). The page will provide the exact commands.
For Ubuntu (example using `deb (network)`, which installs the `cuda-keyring` package to register the repository and its signing key):
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
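For CentOS/RHEL (example using `rpm (network)`). The key file name below belongs to one specific signing key, and NVIDIA has rotated its repository keys before, so copy the current commands from the downloads page if the import fails: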
sudo rpm --import https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/7fa2af80.pub
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf clean all
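The example URLs above hardcode `ubuntu2204` and `rhel9`. On other releases the directory name changes, and it can be derived from `/etc/os-release`. The sketch below is illustrative only: the helper name `cuda_repo_tag` is made up, and the mapping covers just a few distributions based on the naming pattern seen in the URLs above. Treat the CUDA Downloads page as the authority.

```shell
#!/bin/sh
# Hypothetical helper: map os-release ID and VERSION_ID to the directory
# name used under .../compute/cuda/repos/ (e.g. ubuntu2204, rhel9).
cuda_repo_tag() {
    id=$1
    version=$2
    case "$id" in
        ubuntu) echo "ubuntu$(echo "$version" | tr -d '.')" ;;    # 22.04 -> ubuntu2204
        debian) echo "debian${version%%.*}" ;;                     # 12 -> debian12
        rhel|centos|rocky|almalinux) echo "rhel${version%%.*}" ;;  # 9.3 -> rhel9
        *) return 1 ;;                                             # unknown distribution
    esac
}

if [ -r /etc/os-release ]; then
    . /etc/os-release
    if tag=$(cuda_repo_tag "${ID:-}" "${VERSION_ID:-}"); then
        echo "https://developer.download.nvidia.com/compute/cuda/repos/$tag/x86_64/"
    else
        echo "No known mapping for ${ID:-unknown}; check the CUDA Downloads page"
    fi
fi
```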
2. Install the CUDA Toolkit:
For Ubuntu:
sudo apt-get -y install cuda
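The `cuda` metapackage installs the latest CUDA Toolkit together with an NVIDIA driver. If you want to keep the driver you installed in Step 2, install the toolkit-only package `cuda-toolkit` instead. If you need a specific version, install `cuda-toolkit-XX-Y` (e.g., `cuda-toolkit-12-2`).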
For CentOS/RHEL:
sudo dnf -y install cuda
3. Set Environment Variables:
Add CUDA to your PATH and LD_LIBRARY_PATH. This is crucial for the system to find CUDA executables and libraries. Add the following lines to your `~/.bashrc` or `~/.zshrc` file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then, apply the changes to your current session:
source ~/.bashrc
(Or `source ~/.zshrc` if you use Zsh)
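Sourcing the file repeatedly with plain `export` lines adds a duplicate entry each time. One way to avoid that is a small guard function in the shell startup file. This is a sketch rather than anything the CUDA installer provides: `prepend_once` and `CUDA_HOME` are names chosen here for illustration, and `/usr/local/cuda` is assumed to be the install location used above.

```shell
# Hypothetical helper for ~/.bashrc or ~/.zshrc: prepend a directory to a
# colon-separated list only if it is not already present.
prepend_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;        # already present, keep as is
        *) printf '%s\n' "$dir${list:+:$list}" ;;   # no trailing ':' when list is empty
    esac
}

CUDA_HOME=/usr/local/cuda   # assumed install location (same path as above)
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

The empty-list case matters for `LD_LIBRARY_PATH`: a trailing colon leaves an empty entry, which the dynamic loader treats as the current directory.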
Option B: Using the Runfile Installer
1. Download the CUDA Toolkit runfile (`.run` extension) from the [CUDA Downloads] page, selecting the appropriate OS and version.
2. Make the runfile executable:
chmod +x cuda_XXX.XX_linux.run
3. Run the installer with root privileges. It's often recommended to *not* install the driver if you've already installed a compatible one.
sudo sh cuda_XXX.XX_linux.run
Follow the on-screen prompts. When asked about installing the driver, choose "no" if you already have one installed and verified.
4. Set environment variables as described in Option A, Step 3.
Step 5: Verify CUDA Toolkit Installation
After installing the toolkit and setting environment variables, verify that CUDA is accessible.
1. Check nvcc version:
`nvcc` is the NVIDIA CUDA compiler driver that ships with the toolkit.
nvcc --version
This should display the installed CUDA Toolkit version.
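In a script, the release number can be pulled out of that output for later comparison with the driver. The sketch below assumes the output contains a line of the form `Cuda compilation tools, release 12.2, V12.2.140`. The helper name `nvcc_release` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number in major.minor form (e.g. 12.2).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Guard the real call so a missing toolkit gives a readable message.
if command -v nvcc >/dev/null 2>&1; then
    echo "Toolkit release: $(nvcc --version | nvcc_release)"
else
    echo "nvcc not on PATH; check the environment variables from Step 4"
fi
```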
2. Compile and run CUDA Samples:
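Toolkit releases before CUDA 11.6 include sample applications, usually under `/usr/local/cuda/samples`. From 11.6 onward the samples are no longer bundled; they are published in the NVIDIA/cuda-samples repository on GitHub, which has its own build instructions. For a bundled copy, first navigate to the samples directory.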
cd /usr/local/cuda/samples
Then build the samples and run `deviceQuery`. Running `make` from this directory builds every sample, which can take several minutes:
sudo make && cd bin/x86_64/linux/release/
./deviceQuery
This utility lists each CUDA-capable device it finds, along with properties such as compute capability and memory size. You should see "Result = PASS" at the end.
You can also run the `bandwidthTest` sample. The top-level `make` above builds every sample, so the binary is already in the same directory:
./bandwidthTest
This tests memory bandwidth between the host and device. It should also report "Result = PASS".
Troubleshooting
- `nvidia-smi` command not found or fails:
* Ensure the NVIDIA driver is correctly installed and loaded. Rebooting often helps.
* Check if `/usr/bin/nvidia-smi` or a similar path exists.
* Verify that the `nvidia` kernel module is loaded:
lsmod | grep nvidia
- `nvcc: command not found` or CUDA samples don't compile:
* Ensure the CUDA environment variables (`PATH` and `LD_LIBRARY_PATH`) are correctly set in your `~/.bashrc` or `~/.zshrc` and that you've sourced the file (`source ~/.bashrc`).
* Check if `/usr/local/cuda/bin/nvcc` exists.
* Verify that the CUDA Toolkit was installed correctly. Reinstall if necessary.
- Driver/Toolkit Version Mismatch:
* The `nvidia-smi` output shows the maximum CUDA version supported by the driver. The CUDA Toolkit version you install should be less than or equal to this supported version. If the toolkit is newer than the driver supports, CUDA programs typically fail at runtime with an error such as "CUDA driver version is insufficient for CUDA runtime version". To fix this, update your NVIDIA driver or install an older CUDA Toolkit.
- Secure Boot Issues:
* If you have Secure Boot enabled on your system, you might need to sign the NVIDIA kernel modules. This is often handled during driver installation, but if not, you might need to manually sign them or disable Secure Boot.
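The driver/toolkit version rule from the list above (toolkit version less than or equal to the version the driver supports) can be checked mechanically. The following is a minimal sketch, assuming both versions are in `major.minor` form; `version_le` is a hypothetical helper, and the two version numbers are example values, not readings from a real system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 <= version $2.
# Both arguments must be in major.minor form (e.g. 12.2).
version_le() {
    a_major=${1%%.*}; a_minor=${1#*.}
    b_major=${2%%.*}; b_minor=${2#*.}
    [ "$a_major" -lt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -le "$b_minor" ]; }
}

toolkit=12.2      # example value: release reported by `nvcc --version`
driver_max=12.4   # example value: "CUDA Version" shown by `nvidia-smi`

if version_le "$toolkit" "$driver_max"; then
    echo "OK: toolkit $toolkit is supported by the driver (max $driver_max)"
else
    echo "Mismatch: toolkit $toolkit is newer than the driver supports ($driver_max)"
fi
```

The components are compared as integers rather than as strings, so a minor version of 10 correctly sorts after 2.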