# AI in Swindon: Server Configuration

This article details the server configuration powering the "AI in Swindon" project, which runs a cluster of servers in our Swindon data center for various machine learning models, primarily local traffic analysis and predictive maintenance of railway infrastructure. It is intended to give new system administrators and developers joining the project a comprehensive overview of the infrastructure.

## Overview

The "AI in Swindon" project relies on a distributed system architecture: a mix of bare-metal servers and virtual machines (VMs) orchestrated with Kubernetes. GPU-equipped servers handle the core processing, while supporting services such as data storage, monitoring, and web interfaces run on dedicated VMs.

The network is a key component: a high-bandwidth, low-latency Ethernet fabric enables efficient communication between servers and minimizes data-transfer bottlenecks. We have chosen a hybrid cloud approach, keeping sensitive data on-premise and drawing on cloud services for scalability during peak demand.

Security is paramount, with multiple layers of protection including firewalls, intrusion detection systems, and regular security audits. Another key component is the data pipeline, which ensures a consistent flow of information from the sensors to the AI models.
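As a quick sanity check on this layout, the Kubernetes API exposes each node's advertised GPU capacity. A minimal sketch, assuming the NVIDIA device plugin is deployed so the GPU servers report the `nvidia.com/gpu` extended resource:

```shell
# List cluster nodes with their advertised NVIDIA GPU capacity.
# Assumes the NVIDIA device plugin is installed; the dot in the
# resource name is backslash-escaped for kubectl's custom-columns parser.
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```

Nodes without the device plugin (the storage and control plane machines) show `<none>` in the GPUS column, which makes misregistered GPU servers easy to spot.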

## Hardware Specifications

The project uses three primary server types: GPU Servers, Storage Servers, and Control Plane Servers.

### GPU Servers

These servers are responsible for the computationally intensive tasks of training and inference of machine learning models.

| Specification | Value |
| --- | --- |
| CPU | Dual Intel Xeon Gold 6338 (32 cores / 64 threads per CPU) |
| RAM | 512 GB DDR4 ECC REG 3200 MHz |
| GPU | 4× NVIDIA A100 80 GB |
| Storage | 1× 1.92 TB NVMe SSD (OS & applications), 4× 18 TB SAS HDD (data storage) |
| Network | Dual 100 GbE NICs (Mellanox ConnectX-6) |
| Power supply | 2× 2000 W redundant PSUs |

These servers run Red Hat Enterprise Linux with the NVIDIA driver stack for optimal GPU performance. Workloads are isolated in Docker containers to ensure reproducibility.
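A minimal sketch of launching such an isolated GPU workload, assuming the NVIDIA Container Toolkit is installed on the host; the image, mount path, and script name are illustrative:

```shell
# Run a training job in a throwaway container with all four A100s visible.
# --gpus requires the NVIDIA Container Toolkit (Docker 19.03+).
docker run --rm \
  --gpus all \
  --shm-size=16g \
  -v /data/datasets:/workspace/data:ro \
  nvcr.io/nvidia/pytorch:24.02-py3 \
  python train.py
```

Mounting the dataset read-only (`:ro`) keeps the container from mutating shared data, which helps make runs reproducible.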

### Storage Servers

These servers provide persistent storage for the project's datasets and model artifacts.

| Specification | Value |
| --- | --- |
| CPU | Intel Xeon Silver 4310 (12 cores / 24 threads) |
| RAM | 256 GB DDR4 ECC REG 3200 MHz |
| Storage | 12× 18 TB SAS HDD (RAID 6) |
| Network | Dual 25 GbE NICs |
| Filesystem | ZFS |
| Power supply | 2× 1200 W redundant PSUs |

Data is backed up regularly to an offsite location using rsync. The ZFS filesystem offers built-in data integrity features, protecting against data corruption. Access control is managed through LDAP integration.
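One common way to combine the two mechanisms is to rsync from a ZFS snapshot rather than the live filesystem, so the offsite copy is a consistent point-in-time view. A sketch under that assumption; the pool, dataset, and offsite host names are illustrative:

```shell
# Take a read-only ZFS snapshot of the dataset before backing up.
zfs snapshot tank/datasets@nightly-$(date +%F)

# Mirror the snapshot (exposed under the hidden .zfs directory) offsite.
# -a preserves permissions/timestamps, -H preserves hard links,
# --delete keeps the remote mirror an exact copy.
rsync -aH --delete \
  /tank/datasets/.zfs/snapshot/nightly-$(date +%F)/ \
  backup@offsite.example.com:/backups/datasets/
```

Old snapshots can be pruned with `zfs destroy` once the corresponding offsite copy is verified.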

### Control Plane Servers

These servers host the Kubernetes control plane and other essential system services.

| Specification | Value |
| --- | --- |
| CPU | Intel Xeon Gold 6248R (24 cores / 48 threads) |
| RAM | 128 GB DDR4 ECC REG 3200 MHz |
| Storage | 2× 960 GB NVMe SSD (RAID 1) |
| Network | Dual 10 GbE NICs |
| Operating system | Ubuntu Server 22.04 LTS |
| Power supply | 2× 850 W redundant PSUs |

These servers are monitored closely using Prometheus and Grafana. We use Ansible for automated configuration management.
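A sketch of a typical maintenance pass under these tools; the inventory path, playbook name, group name, and Prometheus hostname are all assumptions, not the project's actual layout:

```shell
# Dry-run the baseline configuration against the control plane group.
# --check reports what would change, --diff shows file-level changes.
ansible-playbook -i inventory/production site.yml \
  --limit control_plane --check --diff

# Quick health check: query Prometheus for all scrape targets that are up.
curl -s 'http://prometheus.internal:9090/api/v1/query?query=up'
```

Running with `--check --diff` first is a cheap way to confirm a playbook does what you expect before applying it for real.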

## Software Stack

The software stack is built around a core of open-source technologies.
