GPU Drivers


This article details the configuration and management of GPU drivers on our MediaWiki servers. Proper GPU driver configuration is crucial for tasks such as image thumbnailing, video transcoding (if enabled), and potentially future machine learning integrations. This guide is aimed at new server engineers and assumes a basic understanding of Linux server administration.

Understanding the Role of GPU Drivers

GPU drivers enable the operating system and applications like MediaWiki to communicate with the Graphics Processing Unit (GPU). In our environment, GPUs are primarily utilized to accelerate image processing tasks, reducing the load on the CPU and improving overall server responsiveness. Incorrect or outdated drivers can lead to instability, performance degradation, or even system crashes. We primarily utilize NVIDIA GPUs, so this guide will focus on NVIDIA drivers. However, the principles apply to other GPU vendors as well. See Server Hardware Overview for a complete list of server hardware.

Driver Selection and Installation

Choosing the correct driver version is critical. We generally follow a policy of using stable, tested drivers rather than bleeding-edge releases. The specific driver version depends on the GPU model and the kernel version in use. We primarily use the NVIDIA proprietary drivers, installed via the package manager. Avoid Nouveau, the community-maintained open-source driver for NVIDIA hardware, as it lacks the performance and features required for our workload. See Kernel Versions for details on supported kernels.
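Before installing the proprietary driver, it is worth confirming that Nouveau is not loaded and blacklisting it if necessary. The following is a minimal sketch for Debian/Ubuntu hosts; the blacklist file name is a common convention rather than a requirement:

```bash
# Check whether the Nouveau kernel module is currently loaded
lsmod | grep nouveau

# If it is, blacklist it so the proprietary driver can bind to the GPU
# (any .conf file under /etc/modprobe.d/ works; this name is conventional)
cat <<'EOF' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF

# Rebuild the initramfs and reboot so the blacklist takes effect
sudo update-initramfs -u
sudo reboot
```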

Identifying Your GPU

Before installing the driver, identify the installed GPU model. Use the following command:

```bash
lspci | grep -i nvidia
```

This command will output information about the NVIDIA GPU(s) present in the system. Note the model number for driver selection. For further information on system identification, refer to Server Diagnostics.
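For illustration, the output on a host fitted with a Tesla T4 might look like the line below; the PCI address and revision will vary between servers:

```
01:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
```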

Installation Procedure

The installation process varies slightly depending on the Linux distribution. The following example demonstrates the process on Debian/Ubuntu based systems:

1. Add the NVIDIA repository to your system’s sources list (a command-level sketch follows this list).
2. Update the package list: `sudo apt update`
3. Install the recommended driver package. For example, to install driver version 535.104.05: `sudo apt install nvidia-driver-535`
4. Reboot the server: `sudo reboot`
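Putting the steps together, a minimal sketch for an Ubuntu host follows. The `graphics-drivers` PPA in step 1 is an assumption for illustration; some hosts take the driver from the distribution's default repository, so check Server Configuration Documentation before adding a third-party source.

```bash
# 1. Add a driver repository (illustrative: the graphics-drivers PPA on Ubuntu)
sudo add-apt-repository ppa:graphics-drivers/ppa

# 2. Refresh the package list
sudo apt update

# 3. Install the recommended driver branch (535 in this example)
sudo apt install nvidia-driver-535

# 4. Reboot so the new kernel module is loaded
sudo reboot
```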

After rebooting, verify the driver installation using `nvidia-smi`. See Troubleshooting GPU Issues for common installation problems.
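A quick verification after the reboot, using standard `nvidia-smi` query fields:

```bash
# Print the GPU name and the driver version now in use
nvidia-smi --query-gpu=name,driver_version --format=csv
```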

Supported NVIDIA GPU Models and Recommended Drivers

The following table lists the currently supported GPU models and their recommended drivers. This table is subject to change, so always verify the latest recommendations on the Server Configuration Documentation.

| GPU Model | Recommended Driver Version | Notes |
|---|---|---|
| NVIDIA Tesla T4 | 535.104.05 | Commonly used for thumbnail generation. |
| NVIDIA Quadro RTX 5000 | 525.147.05 | Used in development and testing environments. |
| NVIDIA GeForce RTX 3090 | 535.104.05 | Used in Video Transcoding (if enabled). Requires specific configuration. |
| NVIDIA Tesla V100 | 470.82.00 | Older model, still supported but nearing end-of-life. |

Driver Configuration and Monitoring

After installation, it’s important to configure the driver for optimal performance and monitor its status.

Configuration Options

The NVIDIA driver provides a range of configuration options accessible through the `nvidia-settings` utility. Key settings include:

  • **PowerMizer:** Controls the GPU's power consumption and performance.
  • **Thermal Management:** Configures the GPU's thermal throttling behavior.
  • **OpenGL Settings:** Adjusts OpenGL performance parameters.

These settings can be configured manually or through automated configuration scripts. See Automated Server Configuration for details.
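Note that `nvidia-settings` requires a running X session, so on headless servers automated scripts typically apply equivalent settings through `nvidia-smi` instead. A minimal sketch; the power limit value is illustrative, not a documented standard:

```bash
# Keep the driver loaded between jobs so the first request is not delayed
sudo nvidia-smi -pm 1

# Cap power draw to keep the card inside its thermal envelope
# (250 W is an illustrative value; check the card's supported range first)
sudo nvidia-smi -pl 250
```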

Monitoring GPU Usage

Monitor GPU usage using the `nvidia-smi` command. This command provides real-time information about GPU utilization, temperature, memory usage, and power consumption. We also integrate GPU monitoring into our central monitoring system, Server Monitoring System.

The following table shows typical performance metrics:

| Metric | Description | Typical Range |
|---|---|---|
| GPU Utilization (%) | Percentage of GPU processing capacity being used. | 10-80% (depending on load) |
| Memory Usage (MB) | Amount of GPU memory currently in use. | 500-10000 MB (depending on load and GPU model) |
| Temperature (°C) | GPU core temperature. | 40-85°C |
| Power Usage (W) | GPU power consumption. | 75-300 W (depending on GPU model and load) |
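These metrics map directly onto `nvidia-smi` query fields, which makes them straightforward to scrape for Server Monitoring System. A minimal sketch; the 30-second interval is an assumption, not a documented standard:

```bash
# Emit utilization, memory, temperature and power as CSV every 30 seconds
nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu,power.draw \
  --format=csv,noheader,nounits -l 30
```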

Driver Updates and Maintenance

Regular driver updates are essential to maintain stability, performance, and security. We follow a scheduled update process, typically updating drivers during scheduled maintenance windows. See Scheduled Maintenance Procedures for details.

Update Procedure

The update procedure is similar to the installation procedure. Remove the old driver, add the new repository, update the package list, install the new driver, and reboot. Always test the new driver in a staging environment before deploying it to production servers.
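On a Debian/Ubuntu host the sequence might look like the sketch below; the driver branch number is illustrative:

```bash
# Remove the currently installed driver packages
# (the pattern is illustrative; adjust it to the packages actually installed)
sudo apt remove --purge 'nvidia-driver-*'

# Refresh the package list (after adding the new repository, if one is required)
sudo apt update

# Install the new driver branch and reboot
sudo apt install nvidia-driver-535
sudo reboot
```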

Rollback Procedure

If a driver update causes issues, it’s important to be able to roll back to a previous version. The rollback procedure depends on the Linux distribution and the package manager used. Typically, this involves removing the new driver and installing the previous version from the package manager’s cache. See Disaster Recovery Procedures for a full rollback strategy.
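On Debian/Ubuntu a rollback might look like the sketch below; the version numbers are illustrative, and the previous known-good version should be taken from the table above:

```bash
# See which driver versions the configured repositories still offer
apt-cache policy nvidia-driver-535

# Remove the problematic driver and reinstall the previous known-good branch
sudo apt remove --purge 'nvidia-driver-*'
sudo apt install nvidia-driver-525
sudo reboot
```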

The following table summarizes key commands:

| Action | Command | Description |
|---|---|---|
| Check Driver Version | `nvidia-smi` | Displays the currently installed driver version. |
| Update Package List | `sudo apt update` or `sudo yum check-update` | Refreshes package metadata from the configured repositories. |
| Install Driver | `sudo apt install nvidia-driver-<version>` or `sudo yum install nvidia-driver-<version>` | Installs a specific driver version. |
| Remove Driver | `sudo apt remove nvidia-driver-<version>` or `sudo yum remove nvidia-driver-<version>` | Removes a specific driver version. |

See Also

  • Server Hardware Overview
  • Kernel Versions
  • Server Diagnostics
  • Troubleshooting GPU Issues
  • Automated Server Configuration
  • Server Monitoring System
  • Scheduled Maintenance Procedures
  • Disaster Recovery Procedures
  • Server Configuration Documentation