How to Set Up a GPU Server for AI Training
This hands-on guide walks you through configuring a GPU server for deep learning with CUDA, PyTorch, and TensorFlow. For hardware selection guidance, see GPU Servers for Machine Learning and AI.
Prerequisites
- A server with an NVIDIA GPU (H100, A100, RTX 4090, or similar)
- Ubuntu 22.04 or 24.04 LTS (recommended for best driver support)
- Root or sudo access

Related guides: GPU Servers for Machine Learning and AI, Choosing the Right Dedicated Server, Linux Server Administration Guide.
Immers Cloud offers GPU servers with pre-installed NVIDIA drivers and CUDA, which can save significant setup time.
Step 1: Install NVIDIA Drivers
Check your GPU:
lspci | grep -i nvidia
Install the latest drivers:
sudo apt update
sudo apt install -y nvidia-driver-550
sudo reboot
Verify after reboot:
nvidia-smi
You should see your GPU model, driver version, and CUDA version.
Step 2: Install CUDA Toolkit
Download and install CUDA 12.x:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-4
Add to your PATH:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
Verify:
nvcc --version
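If you want to confirm programmatically that the toolkit on your PATH matches what you installed, you can parse the `nvcc --version` banner. A minimal sketch, assuming the standard banner format (the sample text below is illustrative):

```python
import re
import subprocess


def cuda_version_from_banner(banner: str) -> str:
    """Extract the release number (e.g. '12.4') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", banner)
    if match is None:
        raise ValueError("no CUDA release string found in banner")
    return match.group(1)


def installed_cuda_version() -> str:
    """Run nvcc and return the toolkit version it reports."""
    banner = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return cuda_version_from_banner(banner)


# Illustrative sample of the banner line nvcc prints:
sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(cuda_version_from_banner(sample))  # 12.4
```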
Step 3: Install cuDNN
cuDNN accelerates neural network operations:
sudo apt install -y libcudnn8 libcudnn8-dev
Step 4: Set Up Python Environment
Use Miniconda for isolated environments:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
~/miniconda3/bin/conda init bash
source ~/.bashrc
Create a dedicated environment:
conda create -n ml python=3.11 -y
conda activate ml
Step 5: Install PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
Verify GPU access:
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
Step 6: Install TensorFlow
pip install tensorflow[and-cuda]
Verify:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Step 7: Optimize for Training
Enable Mixed Precision
Mixed precision (FP16/BF16) can roughly double training throughput on Tensor Core GPUs with minimal accuracy loss:
PyTorch:
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

optimizer.zero_grad()
with autocast():
    output = model(input)
    loss = criterion(output, target)
scaler.scale(loss).backward()   # scale the loss to avoid FP16 underflow
scaler.step(optimizer)          # unscale gradients, then step
scaler.update()
TensorFlow:
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
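The memory saving comes from the element width: FP32 stores 4 bytes per value, while FP16 and BF16 store 2. A quick back-of-the-envelope helper (the layer sizes below are illustrative):

```python
# Bytes per element for common training dtypes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def tensor_mib(num_elements: int, dtype: str) -> float:
    """Memory footprint of a tensor in MiB for a given element type."""
    return num_elements * BYTES_PER_ELEMENT[dtype] / 2**20


# Activations of a 4096-wide layer output at batch size 32, seq length 4096:
n = 32 * 4096 * 4096
print(tensor_mib(n, "fp32"))  # 2048.0
print(tensor_mib(n, "fp16"))  # 1024.0
```

Halving activation memory is also why mixed precision often lets you raise the batch size instead of just running faster.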
Monitor GPU Usage
watch -n 1 nvidia-smi
Or install nvitop for a better interface:
pip install nvitop
nvitop
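If you would rather log utilization from a script than watch it interactively, `nvidia-smi` can emit machine-readable CSV via its `--query-gpu` flags. A sketch that parses that output (the sample line below is illustrative; field order follows the query string):

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output,
    one row per GPU, fields separated by ', '."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(v) for v in line.split(", "))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def live_gpu_stats() -> list[dict]:
    """Query the local GPUs (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)


# Illustrative sample for a two-GPU server:
sample = "87, 61322, 81920\n12, 1024, 81920\n"
print(parse_gpu_stats(sample))
```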
Step 8: Multi-GPU Training
For servers with multiple GPUs:
PyTorch Distributed Data Parallel:
torchrun --nproc_per_node=4 train.py
TensorFlow MirroredStrategy:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = create_model()
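Under the hood, data-parallel training shards each epoch's samples across ranks so every GPU sees a disjoint (padded) slice of the dataset. PyTorch's `DistributedSampler` does roughly the following strided split — a simplified sketch for intuition, not the library implementation:

```python
import math


def shard_indices(num_samples: int, world_size: int, rank: int) -> list[int]:
    """Assign dataset indices to one rank, padding with wrap-around
    so every rank receives the same number of samples."""
    total = math.ceil(num_samples / world_size) * world_size
    indices = list(range(num_samples))
    indices += indices[: total - num_samples]  # wrap-around padding
    return indices[rank:total:world_size]      # this rank's strided slice


# 10 samples over 4 GPUs: each rank gets 3 indices, two are repeats.
for r in range(4):
    print(r, shard_indices(10, 4, r))
```

This is why the effective batch size in multi-GPU training is the per-GPU batch size multiplied by the number of GPUs.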
Common Troubleshooting
| Issue | Solution |
|---|---|
| CUDA out of memory | Reduce batch size, enable gradient checkpointing |
| Driver/CUDA version mismatch | Check the compatibility matrix on the NVIDIA website |
| Slow training speed | Enable mixed precision, check for data-loading bottlenecks |
| GPU not detected | Verify the driver with nvidia-smi, check PCIe seating |
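When fixing "CUDA out of memory" by shrinking the batch, you can pair the smaller micro-batch with gradient accumulation so the effective batch size, and thus the optimization behavior, stays the same. The arithmetic is simple (a sketch; the function name is illustrative):

```python
def accumulation_steps(target_batch: int, micro_batch: int) -> int:
    """Number of micro-batches to accumulate before each optimizer step,
    so that micro_batch * steps == target_batch."""
    if target_batch % micro_batch != 0:
        raise ValueError("micro_batch must divide target_batch evenly")
    return target_batch // micro_batch


# Batch 64 no longer fits in memory, but 16 does: accumulate 4 micro-batches.
print(accumulation_steps(64, 16))  # 4
```

In the training loop this means calling `backward()` on each micro-batch but only stepping the optimizer (and zeroing gradients) every `accumulation_steps` iterations.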