GPU Drivers
This article details the configuration and management of GPU drivers on our MediaWiki servers. Proper GPU driver configuration is crucial for tasks such as image thumbnailing, video transcoding (if enabled), and potentially future machine learning integrations. This guide is aimed at new server engineers and assumes a basic understanding of Linux server administration.
Understanding the Role of GPU Drivers
GPU drivers enable the operating system and applications like MediaWiki to communicate with the Graphics Processing Unit (GPU). In our environment, GPUs are primarily utilized to accelerate image processing tasks, reducing the load on the CPU and improving overall server responsiveness. Incorrect or outdated drivers can lead to instability, performance degradation, or even system crashes. We primarily utilize NVIDIA GPUs, so this guide will focus on NVIDIA drivers. However, the principles apply to other GPU vendors as well. See Server Hardware Overview for a complete list of server hardware.
Driver Selection and Installation
Choosing the correct driver version is critical. We follow a policy of using stable, tested drivers rather than bleeding-edge releases. The specific driver version depends on the GPU model and the kernel version in use. We primarily use the NVIDIA proprietary drivers, installed via the package manager. Avoid Nouveau, the community-developed open-source driver for NVIDIA GPUs, as it lacks the performance and features required for our workload. See Kernel Versions for details on supported kernels.
Identifying Your GPU
Before installing the driver, identify the installed GPU model. Use the following command:
```bash
lspci | grep -i nvidia
```
This command will output information about the NVIDIA GPU(s) present in the system. Note the model number for driver selection. For further information on system identification, refer to Server Diagnostics.
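On hosts with many PCI devices, the plain `grep` can also match non-display functions such as a GPU's audio controller. As a minimal sketch (the function name `nvidia_gpus` is hypothetical and not part of our tooling), the filter below keeps only the display-class lines:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads lspci-style output on stdin and keeps only
# NVIDIA display devices (VGA or 3D controllers), dropping audio functions.
nvidia_gpus() {
    grep -i 'nvidia' | grep -Ei 'vga compatible controller|3d controller'
}

# Typical use on a real host (requires pciutils):
#   lspci -nn | nvidia_gpus
```

The `-nn` flag makes `lspci` print the numeric vendor and device IDs next to the names, which helps when the model name alone is ambiguous.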
Installation Procedure
The installation process varies slightly depending on the Linux distribution. The following example demonstrates the process on Debian/Ubuntu based systems:
1. Add the NVIDIA repository to your system’s sources list.
2. Update the package list: `sudo apt update`
3. Install the recommended driver package. For example, `sudo apt install nvidia-driver-535` installs the 535 driver branch; the exact point release, such as 535.104.05, depends on what the repository currently carries.
4. Reboot the server: `sudo reboot`
After rebooting, verify the driver installation using `nvidia-smi`. See Troubleshooting GPU Issues for common installation problems.
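Beyond running `nvidia-smi` by hand, a post-install check can confirm that Nouveau is not loaded and that the expected driver branch is active. The sketch below is one possible approach; the helper names `nouveau_loaded` and `driver_branch` are hypothetical, and both read their input as text so they can be exercised without a GPU present.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the lsmod-style listing on stdin
# shows that the nouveau kernel module is loaded.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: prints the driver branch of a full version string,
# e.g. 535.104.05 -> 535.
driver_branch() {
    printf '%s\n' "${1%%.*}"
}

# Typical use on a real host:
#   lsmod | nouveau_loaded && echo "WARNING: nouveau is still loaded" >&2
#   driver_branch "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```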
Supported NVIDIA GPU Models and Recommended Drivers
The following table lists the currently supported GPU models and their recommended drivers. This table is subject to change, so always verify the latest recommendations in the Server Configuration Documentation.
GPU Model | Recommended Driver Version | Notes |
---|---|---|
NVIDIA Tesla T4 | 535.104.05 | Commonly used for thumbnail generation. |
NVIDIA Quadro RTX 5000 | 525.147.05 | Used in development and testing environments. |
NVIDIA GeForce RTX 3090 | 535.104.05 | Used in Video Transcoding (if enabled). Requires specific configuration. |
NVIDIA Tesla V100 | 470.82.00 | Older model, still supported but nearing end-of-life. |
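
For scripted checks, the table above can be mirrored as a small lookup. This is only a sketch: `recommended_driver` is a hypothetical name, and the values are copied from the table at the time of writing, so they would need updating whenever the recommendations change.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the recommended driver version for a GPU model,
# mirroring the table above. Returns 1 for models that are not listed.
recommended_driver() {
    case "$1" in
        *"Tesla T4"*)         echo "535.104.05" ;;
        *"Quadro RTX 5000"*)  echo "525.147.05" ;;
        *"GeForce RTX 3090"*) echo "535.104.05" ;;
        *"Tesla V100"*)       echo "470.82.00" ;;
        *)                    echo "unsupported GPU model: $1" >&2; return 1 ;;
    esac
}

# Typical use on a real host:
#   recommended_driver "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
```

The glob patterns match on a substring because the reported name may carry a vendor prefix or a variant suffix around the model.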
Driver Configuration and Monitoring
After installation, it’s important to configure the driver for optimal performance and monitor its status.
Configuration Options
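The NVIDIA driver provides a range of configuration options accessible through the `nvidia-settings` utility, which requires a running X server; on headless servers, a subset of these controls is available through `nvidia-smi`. Key settings include: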
- **PowerMizer:** Controls the GPU's power consumption and performance.
- **Thermal Management:** Configures the GPU's thermal throttling behavior.
- **OpenGL Settings:** Adjusts OpenGL performance parameters.
These settings can be configured manually or through automated configuration scripts. See Automated Server Configuration for details.
Monitoring GPU Usage
Monitor GPU usage using the `nvidia-smi` command, which reports GPU utilization, temperature, memory usage, and power consumption at the moment it is run; use `nvidia-smi -l <seconds>` to refresh the report periodically. We also integrate GPU monitoring into our central monitoring system, Server Monitoring System.
The following table shows typical performance metrics:
Metric | Description | Typical Range |
---|---|---|
GPU Utilization (%) | Percentage of GPU processing capacity being used. | 10-80% (depending on load) |
Memory Usage (MB) | Amount of GPU memory currently being used. | 500-10000 MB (depending on load and GPU model) |
Temperature (°C) | GPU core temperature. | 40-85°C |
Power Usage (W) | GPU power consumption. | 75-300W (depending on GPU model and load) |
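
The upper bound of the temperature range above can be turned into a simple check over `nvidia-smi` query output. The sketch below assumes the CSV field order `index, utilization.gpu, memory.used, temperature.gpu, power.draw`; the function name `check_gpu_temps` and the default limit of 85 are illustrative choices, not part of our monitoring system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads CSV lines of
#   index, utilization %, memory MiB, temperature C, power W
# on stdin and prints a warning for every GPU hotter than the limit
# (first argument, default 85). Exits 1 if any GPU exceeded the limit.
check_gpu_temps() {
    local limit="${1:-85}"
    awk -F', *' -v limit="$limit" '
        $4 + 0 > limit + 0 {
            printf "WARN gpu %s temperature %s C exceeds %s C\n", $1, $4, limit
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Typical use on a real host:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,temperature.gpu,power.draw \
#              --format=csv,noheader,nounits | check_gpu_temps 85
```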
Driver Updates and Maintenance
Regular driver updates are essential to maintain stability, performance, and security. We follow a scheduled update process, typically updating drivers during scheduled maintenance windows. See Scheduled Maintenance Procedures for details.
Update Procedure
The update procedure is similar to the installation procedure. Remove the old driver, add the new repository, update the package list, install the new driver, and reboot. Always test the new driver in a staging environment before deploying it to production servers.
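Before touching a server, it can help to print the planned commands and review them. The following dry-run sketch covers the Debian/Ubuntu steps described above; `plan_driver_update` is a hypothetical name, it only prints commands and runs none of them, and the repository step is left out because it depends on the release in use.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints (does not execute) the commands for
# moving from one driver branch to another on a Debian/Ubuntu system.
# Usage: plan_driver_update <old-branch> <new-branch>
plan_driver_update() {
    if [ "$#" -ne 2 ]; then
        echo "usage: plan_driver_update <old-branch> <new-branch>" >&2
        return 2
    fi
    local old="$1" new="$2"
    printf '%s\n' \
        "sudo apt remove nvidia-driver-${old}" \
        "sudo apt update" \
        "sudo apt install nvidia-driver-${new}" \
        "sudo reboot"
}

# Example: plan_driver_update 525 535
```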
Rollback Procedure
If a driver update causes issues, it’s important to be able to roll back to a previous version. The rollback procedure depends on the Linux distribution and the package manager used. Typically, this involves removing the new driver and installing the previous version from the package manager’s cache. See Disaster Recovery Procedures for a full rollback strategy.
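Picking the version to roll back to can be sketched as a version-aware sort over the versions the package manager still offers. The helper below is an illustration under that assumption; `previous_version` is a hypothetical name, and it relies on `sort -V`, which is provided by GNU coreutils.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads candidate versions on stdin (one per line) and
# prints the highest version that is strictly lower than the one given as
# the first argument. Exits 1 if no lower version is available.
previous_version() {
    local current="$1"
    # Add the current version to the list so it always acts as the cut-off.
    { cat; printf '%s\n' "$current"; } | sort -Vu | awk -v cur="$current" '
        $0 == cur { exit }          # stop at the current version
        { prev = $0 }               # remember the last lower version seen
        END { if (prev == "") exit 1; print prev }'
}

# Typical use on a Debian/Ubuntu host (the third column of madison output
# is the version):
#   apt-cache madison nvidia-driver-535 | awk '{print $3}' | previous_version "<installed version>"
#   sudo apt install nvidia-driver-535=<version printed above>
```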
The following table summarizes key commands:
Action | Command | Description |
---|---|---|
Check Driver Version | `nvidia-smi` | Displays the currently installed driver version. |
Update Package List | `sudo apt update` or `sudo yum makecache` | Refreshes the package list from the configured repositories. Note that `sudo yum update` upgrades installed packages rather than only refreshing metadata. |
Install Driver | `sudo apt install nvidia-driver-<version>` or `sudo yum install nvidia-driver-<version>` | Installs a specific driver version. |
Remove Driver | `sudo apt remove nvidia-driver-<version>` or `sudo yum remove nvidia-driver-<version>` | Removes a specific driver version. |
See Also
- Server Hardware Overview
- Kernel Versions
- Server Diagnostics
- Troubleshooting GPU Issues
- Automated Server Configuration
- Server Monitoring System
- Scheduled Maintenance Procedures
- Disaster Recovery Procedures
- Image Thumbnailing Configuration
- Video Transcoding Configuration
- Performance Tuning Guide
- Security Hardening Guide
- Backup and Restore Procedures
- Log Analysis
- Capacity Planning
- Database Server Configuration