CUDA Configuration

From Server rental store
Revision as of 09:21, 15 April 2025 by Admin (talk | contribs) (Automated server configuration article)

This article details the configuration necessary to enable and optimize CUDA (Compute Unified Device Architecture) support on our servers. CUDA allows us to leverage the parallel processing power of NVIDIA GPUs for various tasks, including machine learning, video processing, and scientific simulations. This guide is intended for system administrators and developers new to CUDA deployment in our server environment. Proper configuration is critical to ensure optimal performance and stability.

== Prerequisites ==

Before beginning, ensure the following prerequisites are met:

  • An NVIDIA GPU is installed and correctly recognized by the system. Verify this using `lspci | grep -i nvidia`.
  • The appropriate NVIDIA drivers are installed. These drivers must be compatible with both the GPU and the CUDA toolkit version. See NVIDIA Driver Installation for details.
  • Sufficient system memory and storage space are available. CUDA applications can be memory intensive.
  • You have root or sudo privileges to modify system configurations.
  • The server must be running a supported Linux distribution, such as Ubuntu Server, CentOS, or Debian.
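The checks above can be scripted. A minimal sketch, assuming a Bourne-compatible shell; the `check_cmd` helper is our own, not a standard utility:

```shell
# check_cmd NAME — prints OK if NAME is on PATH, MISSING otherwise.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK"
    else
        echo "MISSING"
    fi
}

# Report the basic prerequisites in one pass:
echo "nvidia-smi:  $(check_cmd nvidia-smi)"
echo "GPU present: $(lspci 2>/dev/null | grep -ci nvidia) NVIDIA PCI device(s)"
echo "Free disk:   $(df -h / | awk 'NR==2 {print $4}') available on /"
```

Run this before starting the toolkit installation; a `MISSING` or a zero device count means a prerequisite is not met.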

== CUDA Toolkit Installation ==

The CUDA Toolkit provides the necessary libraries, header files, and tools for developing and running CUDA applications.

1. **Download the Toolkit:** Obtain the CUDA Toolkit from the NVIDIA Developer Website. Select the appropriate version for your operating system and GPU architecture.
2. **Installation:** Follow the installation instructions provided by NVIDIA. Typically, this involves running a shell script and accepting the license agreement. Ensure you select the correct installation path. A common path is `/usr/local/cuda-<version>`.
3. **Environment Variables:** After installation, you must configure the following environment variables in your `~/.bashrc` or `/etc/profile` file:

   *   `CUDA_HOME`:  Set to the CUDA Toolkit installation directory (e.g., `/usr/local/cuda-12.2`).
   *   `PATH`: Append `$CUDA_HOME/bin` to your `PATH`.
   *   `LD_LIBRARY_PATH`: Append `$CUDA_HOME/lib64` to your `LD_LIBRARY_PATH`.
   After modifying the file, source it using `source ~/.bashrc` or `source /etc/profile`.
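The three variables can be set in one snippet. The `cuda-12.2` directory below is the example version from step 3; substitute your installed version:

```shell
# Append to ~/.bashrc or /etc/profile; the version directory is an example.
export CUDA_HOME=/usr/local/cuda-12.2
export PATH="$PATH:$CUDA_HOME/bin"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The `${LD_LIBRARY_PATH:+:...}` form avoids a leading colon when the variable was previously unset.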

== System Configuration ==

Several system configurations are crucial for optimal CUDA performance.

=== Kernel Module Loading ===

Ensure the NVIDIA kernel modules are loaded at boot time. Typically, this is handled automatically by the NVIDIA driver installation. However, verify this by running `lsmod | grep nvidia`. If the modules are not loaded, you may need to add them to the `/etc/modules` file.
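Adding entries to `/etc/modules` can be made idempotent so the step is safe to re-run. A sketch; `ensure_module` is our own helper, and the exact module names depend on how your driver was packaged:

```shell
# ensure_module NAME [FILE] — append NAME to FILE (default /etc/modules)
# only if it is not already listed, so repeated runs are harmless.
ensure_module() {
    local mod="$1" file="${2:-/etc/modules}"
    grep -qx "$mod" "$file" 2>/dev/null || echo "$mod" >> "$file"
}

# Typical NVIDIA modules (verify the names with `lsmod | grep nvidia`):
# ensure_module nvidia
# ensure_module nvidia_uvm
```

Run as root, since `/etc/modules` is not writable by ordinary users.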

=== NUMA Configuration ===

If your server has multiple NUMA (Non-Uniform Memory Access) nodes, it's important to configure CUDA to use the correct memory affinity. This can significantly improve performance. Use the `numactl` utility to manage NUMA affinity. See NUMA Best Practices for more information.
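A sketch of pinning a CUDA application to the NUMA node closest to its GPU; the `numa_launch` helper and the PCI address are illustrative, not standard tooling:

```shell
# numa_launch NODE CMD... — compose a numactl invocation that binds both
# CPU scheduling and memory allocation to one NUMA node, avoiding
# cross-node memory traffic on multi-socket hosts.
numa_launch() {
    local node="$1"; shift
    echo "numactl --cpunodebind=$node --membind=$node $*"
}

# The node a GPU sits on can be read from PCI sysfs (address is an example):
#   cat /sys/bus/pci/devices/0000:65:00.0/numa_node
# Then build and run the pinned command:
#   eval "$(numa_launch 0 ./my_cuda_app)"
```

Binding memory as well as CPUs matters: host-to-device transfers from the wrong node cross the inter-socket link on every copy.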

=== GPU Persistence Daemon ===

The NVIDIA Persistence Daemon (`nvidia-persistenced`) reduces CUDA startup latency by keeping driver state initialized even when no application is using the GPU. This is especially useful for servers that run CUDA applications intermittently. The daemon is typically started automatically by systemd. Verify its status with `systemctl status nvidia-persistenced`. If it is not running, enable it with `systemctl enable nvidia-persistenced` and start it with `systemctl start nvidia-persistenced`.

== CUDA Device Properties ==

The following table summarizes the properties of a representative CUDA-enabled GPU:

{| class="wikitable"
! Property !! Value
|-
| GPU Model || NVIDIA GeForce RTX 3090
|-
| CUDA Cores || 10496
|-
| Memory Size || 24 GB
|-
| Memory Interface || 384-bit
|-
| Max Power Consumption || 350 W
|-
| Compute Capability || 8.6
|}
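Properties like those in the table can be read programmatically from `nvidia-smi`'s CSV output. A sketch; the sample line is illustrative, and the `compute_cap` query field requires a reasonably recent driver:

```shell
# Query name, total memory, and compute capability as headerless CSV:
#   nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader

# parse_gpu_csv LINE — split one such CSV line into labelled fields.
parse_gpu_csv() {
    echo "$1" | awk -F', ' '{ printf "model=%s mem=%s cc=%s\n", $1, $2, $3 }'
}

# Example with a sample line of the kind nvidia-smi emits:
parse_gpu_csv "NVIDIA GeForce RTX 3090, 24576 MiB, 8.6"
```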

== CUDA Runtime API Version ==

The CUDA Runtime API version is crucial for application compatibility. Verify the installed toolkit version with `nvcc --version`; note that this reports the toolkit, while the highest runtime version the driver supports is shown in the header of `nvidia-smi` output.

{| class="wikitable"
! Version !! Description
|-
| 11.0 || Supports compute capability 3.5 through 8.0; first release with Ampere support.
|-
| 12.0 || Supports compute capability 5.0 through 9.0 (Maxwell through Hopper); drops Kepler support.
|-
| 12.2 || Maintenance release in the 12.x series, used as the example version throughout this guide.
|}
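The release number can be extracted from `nvcc --version` output for use in scripts. A sketch; the sample line mirrors the format nvcc prints, and `nvcc_release` is our own helper:

```shell
# nvcc_release — extract the "release X.Y" number from nvcc --version output.
# Reads from stdin so it can be exercised against a sample line.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# In practice:  nvcc --version | nvcc_release
# Sample line of the form nvcc prints:
echo "Cuda compilation tools, release 12.2, V12.2.140" | nvcc_release
```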

== Monitoring CUDA Usage ==

Monitoring CUDA usage is essential for identifying performance bottlenecks and ensuring optimal resource allocation.

  • **`nvidia-smi`:** The NVIDIA System Management Interface (`nvidia-smi`) is a command-line utility that provides real-time information about GPU usage, including memory usage, temperature, and power consumption.
  • **`nvtop`:** A more user-friendly interactive monitor for NVIDIA GPUs.
  • **`gpustat`:** A command-line utility that provides a concise overview of GPU utilization.

The following table summarizes key metrics to monitor:

{| class="wikitable"
! Metric !! Description !! Recommended Action
|-
| GPU Utilization || Percentage of time the GPU is actively processing tasks. || Investigate if consistently low, indicating potential bottlenecks elsewhere.
|-
| Memory Usage || Amount of GPU memory currently allocated. || Optimize application memory usage or consider GPUs with larger memory capacity.
|-
| Temperature || GPU temperature in degrees Celsius. || Ensure adequate cooling to prevent thermal throttling.
|-
| Power Usage || GPU power consumption in Watts. || Monitor for excessive power consumption.
|}
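The threshold checks above can be automated against `nvidia-smi` CSV output. A sketch; the thresholds and sample readings are illustrative, and `check_metrics` is our own helper:

```shell
# Flag GPUs whose utilization or temperature crosses a threshold.
# Input: CSV lines "util, temp" as produced by e.g.
#   nvidia-smi --query-gpu=utilization.gpu,temperature.gpu --format=csv,noheader,nounits
# Thresholds are illustrative; tune them to your hardware. If both trip,
# the temperature warning wins, since overheating is the more urgent issue.
check_metrics() {
    awk -F', ' '{
        status = "ok"
        if ($1 + 0 < 10) status = "underutilized"
        if ($2 + 0 > 85) status = "hot"
        printf "gpu%d: util=%s%% temp=%sC %s\n", NR - 1, $1, $2, status
    }'
}

# Example with sample readings:
printf '95, 60\n3, 88\n' | check_metrics
```

In production this would run under cron or a monitoring agent, feeding live `nvidia-smi` output instead of sample lines.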

== Troubleshooting ==

  • **CUDA applications fail to run:** Check the environment variables, driver installation, and CUDA Toolkit installation.
  • **Low performance:** Verify NUMA configuration, GPU utilization, and memory usage. Ensure the application is properly optimized for CUDA.
  • **Driver crashes:** Update to the latest stable driver version. Check system logs for error messages. Refer to Driver Troubleshooting.
