Running Stable Diffusion on GPU Server

This guide provides a comprehensive walkthrough for setting up and running Stable Diffusion on a Linux-based GPU server. We will cover installation, VRAM requirements, and optimization techniques to ensure a smooth and efficient experience.

Prerequisites

Before you begin, ensure you have the following:

  • A Linux Server with a Compatible GPU: NVIDIA GPUs are highly recommended due to CUDA support. Ensure your GPU has sufficient VRAM (see VRAM Requirements section). You can find powerful GPU servers at Immers Cloud, with options starting from $0.23/hr for inference.
  • SSH Access: You'll need to connect to your server via SSH.
  • Basic Linux Command-Line Proficiency: Familiarity with commands like `cd`, `ls`, `sudo`, `apt`, and `git`.
  • NVIDIA Drivers Installed: Ensure the correct NVIDIA drivers are installed and functioning. You can check this by running:
nvidia-smi

If this command doesn't show your GPU information, you'll need to install the drivers first. Refer to your Linux distribution's documentation or NVIDIA's official website for installation instructions.

  • NVIDIA Container Toolkit (Recommended): For easier management and isolation, especially if you plan to run multiple AI models, installing the NVIDIA Container Toolkit is highly recommended. This allows Docker containers to access your GPU.
    • Installation instructions can be found on the official NVIDIA Container Toolkit documentation.

Understanding VRAM Requirements

Stable Diffusion's VRAM (Video Random Access Memory) requirements depend largely on the model size, resolution of generated images, and batch size.

  • Minimum (1024x1024, small batch size): 6GB - 8GB VRAM is generally enough for basic image generation at this resolution, though performance may be slow.
  • Recommended (1024x1024, moderate batch size): 10GB - 12GB VRAM offers a good balance of performance and capability for most users.
  • High-End (Larger resolutions, larger batch sizes, advanced features): 16GB VRAM or more is ideal for generating higher resolution images, using larger models, or running multiple processes concurrently.

For demanding tasks, consider renting a dedicated GPU server from providers like Immers Cloud, which offers a range of GPUs from consumer-grade to enterprise-level H200s.
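The tiers above map fairly directly onto the Web UI's memory flags covered in Step 4. As a rough sketch (the MiB thresholds are an assumption based on the tiers listed here, not values from the Web UI itself):

```shell
# Suggest a Web UI memory flag from total VRAM in MiB.
# Thresholds mirror the tiers above: <8 GiB -> --lowvram, <12 GiB -> --medvram.
suggest_vram_flag() {
    local vram_mib=$1
    if [ "$vram_mib" -lt 8192 ]; then
        echo "--lowvram"
    elif [ "$vram_mib" -lt 12288 ]; then
        echo "--medvram"
    else
        echo ""   # 12 GiB or more: no memory flag needed
    fi
}

# With a working driver, you can feed it the real value:
#   vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
#   suggest_vram_flag "$vram"
```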

Installation Steps

This guide will focus on installing Stable Diffusion via AUTOMATIC1111's Stable Diffusion Web UI, a popular and feature-rich interface.

Step 1: Install Python and Git

Ensure you have Python (the Web UI recommends 3.10.6) and Git installed. If not, you can install them using your distribution's package manager; note that `apt` installs your distribution's default `python3`, which may be a different minor version. For Debian/Ubuntu-based systems:

sudo apt update
sudo apt install python3 python3-venv git -y
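Since `apt` may not give you exactly 3.10.6, it is worth confirming the interpreter before proceeding. A small sketch (the "at least 3.10" threshold is an assumption based on the recommendation above):

```shell
# Sketch: check that the default python3 is at least 3.10
# (the Web UI recommends 3.10.6; the exact cutoff here is an assumption).
py_ok() {
    local ver=$1                 # version string, e.g. "3.10.6"
    local major=${ver%%.*}
    local rest=${ver#*.}
    local minor=${rest%%.*}
    [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]
}

if command -v python3 >/dev/null 2>&1; then
    ver=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
    if py_ok "$ver"; then
        echo "python $ver looks fine"
    else
        echo "python $ver may be too old for the Web UI"
    fi
fi
```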

Step 2: Clone the Stable Diffusion Web UI Repository

Clone the repository (`git clone` creates the `stable-diffusion-webui` directory for you) and change into it:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

Step 3: Download Stable Diffusion Models

You need to download the Stable Diffusion model checkpoints (files ending in `.ckpt` or `.safetensors`). The most common ones are SD v1.5 and SDXL.

  • SD v1.5: You can download it from Hugging Face. A common source is the official Stable Diffusion v1-5 checkpoint.
  • SDXL: This is a more powerful model. Download the base and refiner models.

Place the downloaded model files into the `stable-diffusion-webui/models/Stable-diffusion` directory. If the directory doesn't exist, create it:

mkdir -p models/Stable-diffusion

You can typically find download links for these models on Hugging Face (e.g., search for "runwayml/stable-diffusion-v1-5" or "stabilityai/stable-diffusion-xl-base-1.0"). After downloading, move the `.ckpt` or `.safetensors` files into the `models/Stable-diffusion` folder.
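If you fetch several checkpoints, a small helper can move whatever landed in a download directory into the model folder in one go. A sketch (the helper name and the source path are illustrative; the destination follows the layout above):

```shell
# Move downloaded .safetensors/.ckpt files into the Web UI's model directory.
install_models() {
    local src=$1     # where the downloaded files landed
    local dest=$2    # e.g. stable-diffusion-webui/models/Stable-diffusion
    mkdir -p "$dest"
    local moved=0
    for f in "$src"/*.safetensors "$src"/*.ckpt; do
        [ -e "$f" ] || continue          # glob matched nothing
        mv "$f" "$dest/" && moved=$((moved + 1))
    done
    echo "moved $moved file(s) to $dest"
}

# Example:
#   install_models ~/downloads ~/stable-diffusion-webui/models/Stable-diffusion
```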

Step 4: Configure the Web UI Script

The `webui-user.sh` script is used to launch the Stable Diffusion Web UI. You can modify it to set specific environment variables or command-line arguments.

Edit the `webui-user.sh` file:

nano webui-user.sh

Find the line that starts with `export COMMANDLINE_ARGS=` and uncomment it. You can add arguments here to control performance and features. Some useful arguments include:

  • `--xformers`: Enables xformers memory-efficient attention, which can significantly speed up generation and reduce VRAM usage. Highly recommended if supported by your GPU.
  • `--medvram`: Optimizes for systems with medium VRAM (e.g., 8-12GB).
  • `--lowvram`: Optimizes for systems with low VRAM (e.g., 4-6GB), but will be significantly slower.
  • `--no-half`: Disables half-precision (fp16) computation. Might be necessary on some older GPUs or if you encounter errors, but will increase VRAM usage.
  • `--listen`: Allows access from other machines on your network. Be cautious with this on public networks.

A good starting configuration for a server with 10GB+ VRAM would be:

export COMMANDLINE_ARGS="--xformers --listen"

For servers with less VRAM (e.g., 8GB), you might try:

export COMMANDLINE_ARGS="--xformers --medvram --listen"

Save and exit the editor (Ctrl+X, Y, Enter in nano).
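If you provision servers with scripts rather than an editor, the same change can be made non-interactively. A sketch, assuming the stock file contains the commented-out `#export COMMANDLINE_ARGS=""` line:

```shell
# Replace the (possibly commented-out) COMMANDLINE_ARGS line in webui-user.sh.
# Uses GNU sed's in-place editing; the leading "#*" also matches the
# uncommented form, so the edit is idempotent.
set_args() {
    local file=$1 args=$2
    sed -i "s|^#*export COMMANDLINE_ARGS=.*|export COMMANDLINE_ARGS=\"$args\"|" "$file"
}

# Example:
#   set_args webui-user.sh "--xformers --medvram --listen"
```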

Step 5: Launch Stable Diffusion

Now, run the launch script. Note that `webui-user.sh` only holds your settings; the actual launcher is `webui.sh`, which reads them:

./webui.sh

The first time you run this, it will download and install all necessary Python dependencies. This can take a considerable amount of time. Subsequent launches will be much faster.

Once the dependencies are installed and the server is ready, you will see a message indicating that the web UI is running and providing a URL, typically:

Running on local URL: http://127.0.0.1:7860

If you used the `--listen` argument, you can access it from your local machine's browser by navigating to `<your_server_ip>:7860`. For example: `http://192.168.1.x:7860`.
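If you would rather not expose port 7860 at all, an SSH tunnel reaches the UI without `--listen`: forward a local port to the server's loopback interface. A sketch that builds the forwarding command (the user and host names are placeholders):

```shell
# Build an SSH local-forward command: local port -> server's 127.0.0.1:port.
# Run the printed command on your *local* machine, then browse to
# http://127.0.0.1:7860 as if the UI were running locally.
tunnel_cmd() {
    local user=$1 host=$2 port=${3:-7860}
    echo "ssh -N -L ${port}:127.0.0.1:${port} ${user}@${host}"
}

tunnel_cmd alice gpu.example.com
# prints: ssh -N -L 7860:127.0.0.1:7860 alice@gpu.example.com
```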

Optimization Tips

  • Use `--xformers`: As mentioned, this is one of the most effective ways to improve speed and reduce VRAM usage.
  • Select Smaller Models: If you're struggling with VRAM, try using smaller or older Stable Diffusion checkpoints.
  • Reduce Resolution and Batch Size: Lowering the output image resolution and the number of images generated per batch (batch size) will significantly decrease VRAM demands.
  • Close Unnecessary Applications: Ensure no other GPU-intensive applications are running on the server.
  • Monitor VRAM Usage: Use `nvidia-smi` periodically to check your VRAM consumption.
  • Consider SDXL Turbo or LCM Models: For faster inference, explore specialized models like SDXL Turbo or Latent Consistency Models (LCMs), which are designed for rapid image generation.
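The monitoring tip above can be scripted: `nvidia-smi` emits machine-readable CSV via `--query-gpu`, which makes a simple usage-percentage check easy (the percentage helper below is a sketch; any alert threshold is up to you):

```shell
# Compute VRAM usage as an integer percentage from "used total" MiB values,
# as produced by:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
vram_pct() {
    local used=$1 total=$2
    echo $(( used * 100 / total ))
}

# Example live loop on the server (requires a working driver):
#   while sleep 2; do
#       nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits |
#           while IFS=', ' read -r used total; do
#               echo "VRAM: $(vram_pct "$used" "$total")%"
#           done
#   done
```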

Troubleshooting

  • `CUDA out of memory` Error:
    • Solution: Try using the `--medvram` or `--lowvram` arguments in `webui-user.sh`. Reduce image resolution and batch size. Close other GPU-intensive applications. If using SDXL, try switching to SD v1.5.
  • Slow Generation Speed:
    • Solution: Ensure `--xformers` is enabled. Check that your GPU drivers are up to date. Consider upgrading your server's GPU if performance is consistently poor.
  • Web UI Not Accessible:
    • Solution: Verify that the `--listen` argument is present in `webui-user.sh` if you're trying to access it from another machine. Check your server's firewall to ensure port 7860 is open.
  • Dependencies Not Installing:
    • Solution: Ensure you have a stable internet connection. Check for any specific error messages during the installation process and search online for solutions related to those errors.
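For the "Web UI Not Accessible" case, it helps to confirm something is actually listening on 7860 before blaming the firewall. A sketch (the `ufw` command applies to Ubuntu's default firewall; adapt it for firewalld or other tools):

```shell
# Return success if the given ss(8) listening table shows the TCP port.
# Takes the table as an argument so it can be checked against any snapshot.
port_listening() {
    local port=$1 table=$2
    printf '%s\n' "$table" | grep -Eq ":${port}\b"
}

# On the server:
#   port_listening 7860 "$(ss -tln)" && echo "UI is listening"
# If it is listening but still unreachable, open the port
# (ufw shown as an example -- adjust for your firewall):
#   sudo ufw allow 7860/tcp
```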

Related Articles