AI in Swindon: Server Configuration
This article describes the server configuration behind the "AI in Swindon" project, which runs machine learning models on a cluster of servers in our Swindon data center. The models focus mainly on local traffic analysis and predictive maintenance of railway infrastructure. The guide gives system administrators and developers who are new to the project an overview of the infrastructure.
Overview
The "AI in Swindon" project uses a distributed architecture that combines bare-metal servers and virtual machines (VMs), with workloads orchestrated by Kubernetes. GPU-equipped servers handle the core processing, while supporting services such as data storage, monitoring, and web interfaces run on dedicated VMs. A high-bandwidth, low-latency Ethernet fabric connects the servers and minimizes data-transfer bottlenecks. The project follows a hybrid cloud model: sensitive data stays on on-premises hardware, and cloud services supply extra capacity during peak demand. Security is layered, combining firewalls, intrusion detection systems, and regular security audits. A data pipeline carries a consistent flow of information from the sensors to the AI models.
Hardware Specifications
The project utilizes three primary server types: GPU Servers, Storage Servers, and Control Plane Servers.
GPU Servers
These servers handle the computationally intensive work: training machine learning models and running inference with them.
Specification | Value |
---|---|
CPU | Dual Intel Xeon Gold 6338 (32 cores/64 threads per CPU) |
RAM | 512 GB DDR4 ECC REG 3200MHz |
GPU | 4x NVIDIA A100 80GB |
Storage | 1x 1.92TB NVMe SSD (OS & applications); 4x 18TB SAS HDD (data storage) |
Network | Dual 100GbE NICs (Mellanox ConnectX-6) |
Power Supply | 2x 2000W Redundant PSU |
These servers run Red Hat Enterprise Linux with the NVIDIA driver stack installed for GPU support. Workloads run in Docker containers, which isolates them from one another and keeps runs reproducible.
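As a quick sanity check that a GPU node matches the specification table and that the driver stack is working, the node can be queried with `nvidia-smi`. The Python sketch below parses the CSV output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` and compares it with the documented inventory of four 80 GB A100s. The function names and the memory threshold are illustrative assumptions, not existing project tooling.

```python
import subprocess

# Expected per-node inventory from the GPU server table: 4x NVIDIA A100 80GB.
EXPECTED_GPU_COUNT = 4
# 80 GB is at least ~76,000 MiB whether the figure is decimal or binary;
# this tolerance is a hypothetical choice, not a project setting.
MIN_MEMORY_MIB = 76_000


def parse_gpu_csv(text: str) -> list[tuple[str, int]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory = (field.strip() for field in line.split(","))
        gpus.append((name, int(memory)))  # memory.total is reported in MiB
    return gpus


def check_inventory(gpus: list[tuple[str, int]]) -> list[str]:
    """Return a list of problems; an empty list means the node matches the spec."""
    problems = []
    if len(gpus) != EXPECTED_GPU_COUNT:
        problems.append(f"expected {EXPECTED_GPU_COUNT} GPUs, found {len(gpus)}")
    for index, (name, memory) in enumerate(gpus):
        if "A100" not in name:
            problems.append(f"GPU {index}: unexpected model {name!r}")
        if memory < MIN_MEMORY_MIB:
            problems.append(f"GPU {index}: only {memory} MiB of memory")
    return problems


def query_local_gpus() -> list[tuple[str, int]]:
    """Run nvidia-smi on the current host (requires the NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(output)
```

On a GPU node, `check_inventory(query_local_gpus())` should return an empty list; any strings it returns describe mismatches. Keeping the parsing separate from the `subprocess` call makes the check testable on machines without GPUs.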
Storage Servers
These servers provide persistent storage for the project's datasets and model artifacts.
Specification | Value |
---|---|
CPU | Intel Xeon Silver 4310 (12 cores/24 threads) |
RAM | 256 GB DDR4 ECC REG 3200MHz |
Storage | 12x 18TB SAS HDD (RAID 6) |
Network | Dual 25GbE NICs |
Filesystem | ZFS |
Power Supply | 2x 1200W Redundant PSU |
Data is backed up regularly to an offsite location using rsync. ZFS checksums every block, so it detects silent data corruption and can repair it automatically when the pool has redundancy. Access control is managed through LDAP integration.
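The RAID 6 layout in the table gives up two drives' worth of capacity in exchange for surviving two simultaneous drive failures. The short Python calculation below shows how much space that leaves for data. It ignores ZFS metadata and reserved space, so real usable capacity will be somewhat lower.

```python
# RAID 6 (like ZFS RAID-Z2) spends two drives' worth of capacity on parity,
# so usable space is (n - 2) * drive_size before filesystem overhead.
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # operating systems usually report binary tebibytes


def raid6_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_bytes


# Storage server layout from the table: 12x 18TB SAS HDD.
usable = raid6_usable_bytes(12, 18 * TB)
print(f"{usable / TB:.0f} TB usable, about {usable / TIB:.1f} TiB")
```

This prints `180 TB usable, about 163.7 TiB`. The gap between the two figures is only a units difference, but it explains why tools such as `zfs list` report less space than the drive labels suggest.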
Control Plane Servers
These servers host the Kubernetes control plane and other essential system services.
Specification | Value |
---|---|
CPU | Intel Xeon Gold 6248R (24 cores/48 threads) |
RAM | 128 GB DDR4 ECC REG 3200MHz |
Storage | 2x 960GB NVMe SSD (RAID 1) |
Network | Dual 10GbE NICs |
Operating System | Ubuntu Server 22.04 LTS |
Power Supply | 2x 850W Redundant PSU |
These servers are monitored with Prometheus and Grafana, and Ansible handles automated configuration management.
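Besides Grafana dashboards, Prometheus exposes an HTTP API that scripts can query directly. The sketch below calls the instant-query endpoint `/api/v1/query` with the PromQL expression `up` and lists scrape targets that report 0 (unreachable). The server address is hypothetical and the helper names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address; substitute the project's actual Prometheus endpoint.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def down_instances(response: dict) -> list[str]:
    """Return the `instance` label of every target whose `up` sample is 0."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return sorted(
        sample["metric"].get("instance", "<unknown>")
        for sample in response["data"]["result"]
        # Each vector sample's value is [unix_timestamp, "value as a string"].
        if sample["value"][1] == "0"
    )


def fetch(url: str) -> dict:
    """GET the URL and decode the JSON body."""
    with urlopen(url, timeout=10) as reply:
        return json.load(reply)
```

A monitoring script would call `down_instances(fetch(instant_query_url(PROMETHEUS_URL, "up")))`. Separating URL building and response parsing from the network call keeps both pieces easy to test.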
Software Stack
The software stack is built around a core of open-source technologies.
- Operating System: Red Hat Enterprise Linux and Ubuntu Server
- Containerization: Docker
- Orchestration: Kubernetes
- Machine Learning Frameworks: TensorFlow, PyTorch, Scikit-learn
- Data Storage: ZFS, PostgreSQL
- Monitoring: Prometheus, Grafana, ELK Stack
- Networking: Calico (CNI for Kubernetes)
- Configuration Management: Ansible
- Version Control: Git
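On Kubernetes, GPU workloads are usually scheduled by requesting the `nvidia.com/gpu` extended resource, which the NVIDIA device plugin advertises on GPU nodes. This article does not say how GPUs are exposed to the cluster, so the device plugin is an assumption here, as are the workload name and image. The sketch builds the Pod manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts as well as YAML.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that asks the scheduler for `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                # Extended resource advertised by the NVIDIA device plugin
                # (assumed to be installed); GPUs are requested in whole units.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


# Hypothetical workload name and image.
manifest = gpu_pod_manifest(
    "traffic-inference", "registry.example.internal/traffic-inference:latest", gpus=2
)
print(json.dumps(manifest, indent=2))
```

Saved as a script (for example a hypothetical `gpu_pod.py`), its output could be submitted with `python gpu_pod.py | kubectl apply -f -`. Because each GPU server carries four A100s, a single Pod requesting more than four GPUs can never be placed and would stay `Pending`.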
Network Configuration
The network is divided into three zones: a public zone for external access, a private zone for internal communication, and a management zone for administrative access. Firewalls restrict traffic between zones. Inter-server communication within the private zone runs over a high-speed InfiniBand network. Internal DNS servers handle name resolution to improve reliability.
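The zone model amounts to a default-deny policy with explicit exceptions. The Python sketch below makes that concrete: it maps addresses to zones by subnet and permits cross-zone traffic only for listed (source zone, destination zone, port) tuples. The subnets (taken from documentation and private ranges) and the allowed flows are hypothetical; the actual firewall rules are not described in this article.

```python
import ipaddress
from typing import Optional

# Hypothetical address plan; the real subnets are not documented here.
ZONES = {
    "public": ipaddress.ip_network("203.0.113.0/24"),
    "private": ipaddress.ip_network("10.10.0.0/16"),
    "management": ipaddress.ip_network("10.99.0.0/24"),
}

# Hypothetical allow-list of (source zone, destination zone, TCP port).
# Any cross-zone flow not listed here is denied.
ALLOWED = {
    ("public", "private", 443),       # HTTPS to the web interfaces
    ("management", "private", 22),    # SSH from administrative hosts
    ("management", "private", 6443),  # Kubernetes API server default port
}


def zone_of(address: str) -> Optional[str]:
    """Return the zone whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for zone, network in ZONES.items():
        if ip in network:
            return zone
    return None


def is_allowed(src: str, dst: str, port: int) -> bool:
    src_zone, dst_zone = zone_of(src), zone_of(dst)
    if src_zone is None or dst_zone is None:
        return False  # addresses outside every zone are denied
    if src_zone == dst_zone:
        return True   # intra-zone traffic is not filtered in this sketch
    return (src_zone, dst_zone, port) in ALLOWED
```

Note that the policy is directional: the management zone may open SSH sessions into the private zone, but the reverse flow is denied unless it is listed explicitly.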
Security Considerations
Security is a top priority. All servers are hardened according to industry best practices, and regular security audits identify and address vulnerabilities. Two-factor authentication is required for all administrative access, and data is encrypted both in transit and at rest. The project adheres to all relevant data privacy regulations.
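For encryption in transit, Python services that talk to internal endpoints can use a TLS context that verifies certificates and refuses older protocol versions. The TLS 1.2 minimum and the optional internal CA bundle are assumptions; this article does not state the project's TLS policy.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context for internal services.

    ca_file is a hypothetical path to an internal CA bundle; when None,
    the system trust store is used.
    """
    # create_default_context() already enables certificate verification
    # and hostname checking; this only raises the protocol floor.
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=make_client_context())` or `http.client.HTTPSConnection(host, context=make_client_context())`.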
Future Expansion
We anticipate expanding the cluster to accommodate growing data volumes and increasing computational demands. We are evaluating the use of NVMe over Fabrics to further improve storage performance. We are also exploring the integration of federated learning techniques to enable collaborative model training without sharing sensitive data.
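Federated averaging (FedAvg) is one widely used federated learning technique, and only one of the options the project might evaluate. Each site trains on its own data, and a coordinator combines the resulting parameters weighted by how many samples each site used, so only model parameters leave a site, never raw data. The sketch below shows just the aggregation step on plain lists; a real deployment would operate on framework tensors such as PyTorch state dicts.

```python
from typing import Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]], sample_counts: Sequence[int]
) -> list[float]:
    """FedAvg aggregation: average client parameter vectors, weighted by sample count."""
    if not client_weights or len(client_weights) != len(sample_counts):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    size = len(client_weights[0])
    if any(len(w) != size for w in client_weights):
        raise ValueError("all clients must send parameter vectors of the same length")
    # Each global parameter is the sample-weighted mean of the clients' values.
    return [
        sum(w[i] * n for w, n in zip(client_weights, sample_counts)) / total
        for i in range(size)
    ]
```

For two sites with parameters `[1.0, 2.0]` (100 samples) and `[3.0, 6.0]` (300 samples), `federated_average` returns `[2.5, 5.0]`: the site with more data pulls the global model further toward its own parameters.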