# AI in Swindon: Server Configuration

This article details the server configuration powering the "AI in Swindon" project, which runs a cluster of servers in our Swindon data center for various machine learning models, primarily local traffic analysis and predictive maintenance of railway infrastructure. It is intended to give new system administrators and developers joining the project a comprehensive overview of the infrastructure.

## Overview

The "AI in Swindon" project relies on a distributed system architecture: a mix of bare-metal servers and virtual machines (VMs) orchestrated with Kubernetes. GPU-equipped servers handle the core processing, while supporting services such as data storage, monitoring, and web interfaces run on dedicated VMs.

The network is a key component: a high-bandwidth, low-latency Ethernet fabric enables efficient communication between servers and minimizes data-transfer bottlenecks. We have chosen a hybrid cloud approach, keeping sensitive data on-premise and drawing on cloud services for scalability during peak demand.

Security is paramount, with multiple layers of protection including firewalls, intrusion detection systems, and regular security audits. Another key component is the data pipeline, which ensures a consistent flow of information from the sensors to the AI models.
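As a quick sanity check on this layout, the Kubernetes API exposes each node's advertised GPU capacity. A minimal sketch, assuming the NVIDIA device plugin is deployed so the GPU servers report the `nvidia.com/gpu` extended resource:

```shell
# List cluster nodes with their advertised NVIDIA GPU capacity.
# Assumes the NVIDIA device plugin is installed; the dot in the
# resource name is backslash-escaped for kubectl's custom-columns parser.
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```

Nodes without the device plugin (the storage and control plane machines) show `<none>` in the GPUS column, which makes misregistered GPU servers easy to spot.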

## Hardware Specifications

The project uses three primary server types: GPU Servers, Storage Servers, and Control Plane Servers.

### GPU Servers

These servers are responsible for the computationally intensive tasks of training and inference of machine learning models.

| Specification | Value |
| --- | --- |
| CPU | Dual Intel Xeon Gold 6338 (32 cores / 64 threads per CPU) |
| RAM | 512 GB DDR4 ECC REG 3200 MHz |
| GPU | 4× NVIDIA A100 80 GB |
| Storage | 1× 1.92 TB NVMe SSD (OS & applications), 4× 18 TB SAS HDD (data storage) |
| Network | Dual 100 GbE NICs (Mellanox ConnectX-6) |
| Power supply | 2× 2000 W redundant PSUs |

These servers run Red Hat Enterprise Linux with the NVIDIA driver stack for optimal GPU performance. Workloads are isolated in Docker containers to ensure reproducibility.
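A minimal sketch of launching such an isolated GPU workload, assuming the NVIDIA Container Toolkit is installed on the host; the image, mount path, and script name are illustrative:

```shell
# Run a training job in a throwaway container with all four A100s visible.
# --gpus requires the NVIDIA Container Toolkit (Docker 19.03+).
docker run --rm \
  --gpus all \
  --shm-size=16g \
  -v /data/datasets:/workspace/data:ro \
  nvcr.io/nvidia/pytorch:24.02-py3 \
  python train.py
```

Mounting the dataset read-only (`:ro`) keeps the container from mutating shared data, which helps make runs reproducible.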

### Storage Servers

These servers provide persistent storage for the project's datasets and model artifacts.

| Specification | Value |
| --- | --- |
| CPU | Intel Xeon Silver 4310 (12 cores / 24 threads) |
| RAM | 256 GB DDR4 ECC REG 3200 MHz |
| Storage | 12× 18 TB SAS HDD (RAID 6) |
| Network | Dual 25 GbE NICs |
| Filesystem | ZFS |
| Power supply | 2× 1200 W redundant PSUs |

Data is backed up regularly to an offsite location using rsync. The ZFS filesystem offers built-in data integrity features, protecting against data corruption. Access control is managed through LDAP integration.
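One common way to combine the two mechanisms is to rsync from a ZFS snapshot rather than the live filesystem, so the offsite copy is a consistent point-in-time view. A sketch under that assumption; the pool, dataset, and offsite host names are illustrative:

```shell
# Take a read-only ZFS snapshot of the dataset before backing up.
zfs snapshot tank/datasets@nightly-$(date +%F)

# Mirror the snapshot (exposed under the hidden .zfs directory) offsite.
# -a preserves permissions/timestamps, -H preserves hard links,
# --delete keeps the remote mirror an exact copy.
rsync -aH --delete \
  /tank/datasets/.zfs/snapshot/nightly-$(date +%F)/ \
  backup@offsite.example.com:/backups/datasets/
```

Old snapshots can be pruned with `zfs destroy` once the corresponding offsite copy is verified.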

### Control Plane Servers

These servers host the Kubernetes control plane and other essential system services.

| Specification | Value |
| --- | --- |
| CPU | Intel Xeon Gold 6248R (24 cores / 48 threads) |
| RAM | 128 GB DDR4 ECC REG 3200 MHz |
| Storage | 2× 960 GB NVMe SSD (RAID 1) |
| Network | Dual 10 GbE NICs |
| Operating system | Ubuntu Server 22.04 LTS |
| Power supply | 2× 850 W redundant PSUs |

These servers are monitored closely using Prometheus and Grafana. We use Ansible for automated configuration management.
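A sketch of a typical maintenance pass under these tools; the inventory path, playbook name, group name, and Prometheus hostname are all assumptions, not the project's actual layout:

```shell
# Dry-run the baseline configuration against the control plane group.
# --check reports what would change, --diff shows file-level changes.
ansible-playbook -i inventory/production site.yml \
  --limit control_plane --check --diff

# Quick health check: query Prometheus for all scrape targets that are up.
curl -s 'http://prometheus.internal:9090/api/v1/query?query=up'
```

Running with `--check --diff` first is a cheap way to confirm a playbook does what you expect before applying it for real.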

## Software Stack

The software stack is built around a core of open-source technologies.
