How to Set Up a GPU Server for AI Training

From Server rental store
Revision as of 16:04, 12 April 2026 by Admin (talk | contribs) (New guide article)

This hands-on guide walks through configuring a GPU server for deep learning with CUDA, PyTorch, and TensorFlow. For hardware selection guidance, see GPU Servers for Machine Learning and AI.

Prerequisites

  • A GPU server with NVIDIA GPU (H100, A100, RTX 4090, or similar)
  • Ubuntu 22.04 or 24.04 LTS (recommended for best driver support)
  • Root or sudo access

Immers Cloud offers GPU servers with pre-installed NVIDIA drivers and CUDA, which can save significant setup time.

Step 1: Install NVIDIA Drivers

Check your GPU:

lspci | grep -i nvidia

Install the latest drivers:

sudo apt update
sudo apt install -y nvidia-driver-550
sudo reboot

Verify after reboot:

nvidia-smi

You should see your GPU model, driver version, and CUDA version.

Step 2: Install CUDA Toolkit

Download and install CUDA 12.x. The commands below use NVIDIA's Ubuntu 22.04 repository; on Ubuntu 24.04, substitute ubuntu2404 for ubuntu2204 in the URL:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-4

Add to your PATH:

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

Verify:

nvcc --version
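If nvcc is still not found after sourcing ~/.bashrc, it usually means the exports above did not land in the current shell. A minimal sketch for checking the two environment variables from Python (the /usr/local/cuda paths match the exports above; cuda_paths_configured is a helper name used here for illustration):

```python
import os

def cuda_paths_configured(env=os.environ):
    """Check that the CUDA bin and lib64 directories from the
    exports above appear in PATH and LD_LIBRARY_PATH."""
    on_path = "/usr/local/cuda/bin" in env.get("PATH", "").split(os.pathsep)
    on_ld = "/usr/local/cuda/lib64" in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return on_path, on_ld

# example with a hand-built environment; on the server, call it with no
# arguments to inspect the real one
print(cuda_paths_configured({"PATH": "/usr/local/cuda/bin:/usr/bin",
                             "LD_LIBRARY_PATH": "/usr/local/cuda/lib64"}))  # → (True, True)
```

If either value comes back False in a fresh login shell, re-check the two echo lines in ~/.bashrc.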

Step 3: Install cuDNN

cuDNN provides highly tuned GPU primitives (convolutions, attention, normalization) that PyTorch and TensorFlow use under the hood:

sudo apt install -y libcudnn8 libcudnn8-dev

Step 4: Set Up Python Environment

Use Miniconda for isolated environments:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
~/miniconda3/bin/conda init bash
source ~/.bashrc

Create a dedicated environment:

conda create -n ml python=3.11 -y
conda activate ml

Step 5: Install PyTorch

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Verify GPU access:

python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"

Step 6: Install TensorFlow

pip install tensorflow[and-cuda]

Verify:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Step 7: Optimize for Training

Enable Mixed Precision

Mixed precision (FP16/BF16) can roughly double training throughput on GPUs with Tensor Cores, usually with negligible accuracy loss:

PyTorch:

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
with autocast():
    output = model(input)
    loss = criterion(output, target)
scaler.scale(loss).backward()  # scale the loss so FP16 gradients don't underflow
scaler.step(optimizer)         # unscales gradients, then runs the optimizer step
scaler.update()                # adjusts the scale factor for the next iteration
optimizer.zero_grad()

TensorFlow:

from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
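Mixed precision also cuts memory. A back-of-the-envelope sketch of the weight storage alone (the 7B parameter count is an arbitrary example; real training adds optimizer state and activations on top):

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed just to store the weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

n = 7_000_000_000                # example: a 7B-parameter model
fp32 = weight_memory_gb(n, 4)    # float32: 4 bytes per parameter
fp16 = weight_memory_gb(n, 2)    # float16/bfloat16: 2 bytes per parameter
print(f"FP32: {fp32:.1f} GiB, FP16: {fp16:.1f} GiB")  # → FP32: 26.1 GiB, FP16: 13.0 GiB
```

In practice the savings are smaller than 2x, since optimizers like Adam keep FP32 master weights plus two moment tensors per parameter.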

Monitor GPU Usage

watch -n 1 nvidia-smi

Or install nvitop for a better interface:

pip install nvitop
nvitop

Step 8: Multi-GPU Training

For servers with multiple GPUs:

PyTorch Distributed Data Parallel:

torchrun --nproc_per_node=4 train.py

TensorFlow MirroredStrategy:

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # model creation and compilation must happen inside the scope
    model = create_model()
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
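When moving from one GPU to several, the effective batch size grows with the number of processes, and a common convention (the linear scaling rule) scales the learning rate by the same factor. A small sketch of the arithmetic; the base values are arbitrary examples:

```python
def scaled_hyperparams(per_gpu_batch, base_lr, n_gpus):
    """Effective batch size grows linearly with GPU count; the linear
    scaling rule multiplies the learning rate by the same factor."""
    effective_batch = per_gpu_batch * n_gpus
    scaled_lr = base_lr * n_gpus
    return effective_batch, scaled_lr

# 4 processes, one per GPU, as in torchrun --nproc_per_node=4
print(scaled_hyperparams(per_gpu_batch=32, base_lr=1e-4, n_gpus=4))
```

A learning-rate warmup over the first few epochs is usually paired with this rule to keep early training stable.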

Common Troubleshooting

  • CUDA out of memory: reduce the batch size or enable gradient checkpointing.
  • Driver/CUDA version mismatch: check the compatibility matrix on the NVIDIA website.
  • Slow training speed: enable mixed precision; check for a data-loading bottleneck.
  • GPU not detected: verify the driver with nvidia-smi; check PCIe seating.
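The "reduce batch size" fix for out-of-memory errors can be automated: retry the failing step with a halved batch until it fits. A framework-agnostic sketch, where fit_batch_size and fake_step are hypothetical names and MemoryError stands in for the framework's OOM exception (in PyTorch, torch.cuda.OutOfMemoryError):

```python
def fit_batch_size(run_training_step, batch_size, min_batch=1):
    """Halve the batch size until one training step succeeds."""
    while batch_size >= min_batch:
        try:
            run_training_step(batch_size)
            return batch_size  # this batch size fits in GPU memory
        except MemoryError:
            batch_size //= 2   # OOM: halve and retry
    raise RuntimeError("even the minimum batch size does not fit")

# simulated step: pretend anything above 16 samples triggers an OOM
def fake_step(bs):
    if bs > 16:
        raise MemoryError(f"simulated CUDA OOM at batch size {bs}")

print(fit_batch_size(fake_step, 128))  # → 16
```

With a real framework, remember to free cached memory between retries (e.g. torch.cuda.empty_cache() in PyTorch), or the retry may fail for stale allocations rather than the batch size.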

See Also