AI in Milton Keynes: Server Configuration
This article describes the server configuration behind the “AI in Milton Keynes” project and is intended for those contributing to or maintaining the infrastructure. The project deploys and tests artificial intelligence models for applications in the Milton Keynes area, including traffic management, resource allocation, and predictive maintenance. The guide assumes a basic understanding of Linux server administration and networking concepts. See Help:Contents for general wiki assistance and Manual:Configuration for MediaWiki configuration details.
Overview
The AI in Milton Keynes project uses a distributed server architecture, hosted primarily in a dedicated data center in Bletchley, Milton Keynes. The core infrastructure is a cluster of servers dedicated to model training, inference, and data storage. A hybrid cloud approach combines this on-premise hardware with cloud resources from Amazon Web Services for scalability and redundancy, which lets the project absorb fluctuating workloads and maintain high availability. Familiarize yourself with the Networking guidelines before making any network changes.
Hardware Specifications
The on-premise servers are built around a consistent hardware profile to simplify maintenance and deployment. The following table outlines the specifications for the primary training servers:
Component | Specification |
---|---|
CPU | Dual Intel Xeon Gold 6248R (24 cores per CPU) |
RAM | 512GB DDR4 ECC Registered RAM |
Storage (OS) | 1TB NVMe SSD |
Storage (Data) | 16TB RAID 6 SAS HDD Array |
GPU | 4 x NVIDIA A100 (80GB VRAM) |
Network Interface | Dual 100GbE QSFP28 |
The inference servers use a lighter configuration that prioritizes cost-effectiveness while maintaining sufficient performance. See Hardware Maintenance for detailed procedures.
Component | Specification |
---|---|
CPU | Intel Xeon Silver 4210 (10 cores) |
RAM | 128GB DDR4 ECC Registered RAM |
Storage (OS) | 512GB NVMe SSD |
Storage (Data) | 8TB RAID 1 SAS HDD Array |
GPU | 2 x NVIDIA RTX 3090 (24GB VRAM) |
Network Interface | Dual 10GbE SFP+ |
Finally, the data storage servers are optimized for capacity and reliability.
Component | Specification |
---|---|
CPU | Intel Xeon E-2224 (4 cores) |
RAM | 64GB DDR4 ECC Registered RAM |
Storage | 64TB RAID 6 SAS HDD Array |
Network Interface | Quad 10GbE SFP+ |
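
The per-node GPU figures in the tables above can be turned into a quick capacity summary. The following is a minimal sketch only: the `ServerProfile` class and the `PROFILES` mapping are hypothetical names introduced for illustration, and the numbers are copied from the tables rather than read from any inventory system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical record of one hardware profile from the tables above."""
    role: str
    ram_gb: int
    gpus: int
    vram_per_gpu_gb: int
    data_storage_tb: int

    @property
    def total_vram_gb(self) -> int:
        # Aggregate GPU memory per node; this is a simple sum and says
        # nothing about whether a single model can span several GPUs.
        return self.gpus * self.vram_per_gpu_gb


# Values copied from the specification tables (storage nodes have no GPUs).
PROFILES = {
    "training": ServerProfile("training", 512, 4, 80, 16),
    "inference": ServerProfile("inference", 128, 2, 24, 8),
    "storage": ServerProfile("storage", 64, 0, 0, 64),
}

if __name__ == "__main__":
    for profile in PROFILES.values():
        print(f"{profile.role}: {profile.total_vram_gb} GB VRAM, "
              f"{profile.ram_gb} GB RAM, {profile.data_storage_tb} TB data")
```

A summary like this is mainly useful when deciding which node class a given model should be scheduled on.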
Software Stack
The servers run a customized distribution of Ubuntu Server 22.04 LTS. The core software stack includes:
- Operating System: Ubuntu Server 22.04 LTS
- Containerization: Docker and Kubernetes for application deployment and orchestration.
- Programming Languages: Python 3.9, R 4.2.
- Machine Learning Frameworks: TensorFlow 2.10, PyTorch 1.12.
- Data Storage: PostgreSQL 14 for relational data, MinIO for object storage.
- Monitoring: Prometheus and Grafana for system monitoring and alerting.
- Version Control: Git with GitHub for code management.
See Software Installation Guide for detailed installation instructions.
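
Because the stack pins specific framework versions, it can help to check a node against those pins before deploying to it. The sketch below is an illustration, not the project's actual tooling: the `PINNED` mapping, the helper names, and the choice of PyPI distribution names (`tensorflow`, `torch`) are assumptions, and only the major.minor versions from the list above are compared.

```python
import sys
from importlib import metadata
from typing import Dict, List, Optional

# Assumed mapping of PyPI distribution names to the versions listed above.
PINNED = {"tensorflow": "2.10", "torch": "1.12"}
PINNED_PYTHON = (3, 9)


def matches_pin(installed: str, pinned: str) -> bool:
    """True when the installed version starts with the pinned components."""
    want = pinned.split(".")
    return installed.split(".")[: len(want)] == want


def installed_versions(packages) -> Dict[str, Optional[str]]:
    """Look up installed versions; None marks a missing package."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed: Dict[str, Optional[str]],
                    pinned: Dict[str, str] = PINNED) -> List[str]:
    """Describe every package that is missing or off its pinned version."""
    problems = []
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (expected {want}.x)")
        elif not matches_pin(have, want):
            problems.append(f"{name}: found {have}, expected {want}.x")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != PINNED_PYTHON:
        print(f"python: found {sys.version_info[0]}.{sys.version_info[1]}, "
              f"expected {PINNED_PYTHON[0]}.{PINNED_PYTHON[1]}")
    for problem in find_mismatches(installed_versions(PINNED)):
        print(problem)
```

Comparing dot-separated components rather than string prefixes keeps `2.1.0` from being mistaken for `2.10`.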
Networking Configuration
The server cluster is connected via a dedicated VLAN. The following IP address ranges are used:
- Training Servers: 192.168.10.100 - 192.168.10.110
- Inference Servers: 192.168.10.120 - 192.168.10.130
- Data Storage Servers: 192.168.10.140 - 192.168.10.150
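
The address plan above can be checked programmatically, for example when validating an inventory file. This is a minimal sketch using Python's standard `ipaddress` module; the `ROLE_RANGES` mapping and the `role_for` function are hypothetical names, and the ranges are treated as inclusive at both ends, which is an assumption about how the list is meant to be read.

```python
import ipaddress
from typing import Optional

# Inclusive first/last addresses per role, copied from the list above.
ROLE_RANGES = {
    "training": ("192.168.10.100", "192.168.10.110"),
    "inference": ("192.168.10.120", "192.168.10.130"),
    "storage": ("192.168.10.140", "192.168.10.150"),
}


def role_for(address: str) -> Optional[str]:
    """Return the server role whose range contains address, else None."""
    ip = ipaddress.IPv4Address(address)
    for role, (first, last) in ROLE_RANGES.items():
        if ipaddress.IPv4Address(first) <= ip <= ipaddress.IPv4Address(last):
            return role
    return None


if __name__ == "__main__":
    for candidate in ("192.168.10.105", "192.168.10.115", "192.168.10.150"):
        print(candidate, "->", role_for(candidate))
```

Addresses that fall in the gaps between ranges, such as 192.168.10.115, return `None`, which makes unassigned or mistyped addresses easy to spot.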
A dedicated firewall (pfSense) protects the cluster from external threats. All traffic is inspected and filtered according to predefined rules. Regular security audits are performed. Refer to the Security Policy document for more information. Access to the servers is restricted to authorized personnel via SSH using key-based authentication.
Data Backup and Recovery
Regular data backups are crucial for disaster recovery. We employ a multi-tiered backup strategy:
- **Full Backups:** Weekly full backups of all data storage servers.
- **Incremental Backups:** Daily incremental backups of all data storage servers.
- **Offsite Replication:** Replication of backups to a geographically diverse location (AWS S3).
The backup process is automated using Rsync and Duplicati. Regular restoration tests are performed to ensure the integrity of the backups. See Data Recovery Procedures for detailed instructions.
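
The weekly-full, daily-incremental schedule can be sketched in a few lines to show which backups a restore depends on. This illustrates the schedule logic only and is not how Rsync or Duplicati are configured here. Two assumptions are made: full backups run on Sundays (the article only says "weekly"), and each incremental builds on the previous day's backup. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Assumption: the weekly full backup runs on Sunday (Monday is 0, Sunday is 6).
FULL_BACKUP_WEEKDAY = 6


def backup_kind(day: date) -> str:
    """Return which kind of backup runs on the given day."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def restore_chain(target: date) -> List[date]:
    """Backups needed to restore the state as of target, oldest first.

    Walks back from target to the most recent full backup, then returns
    that full backup followed by every incremental up to target.
    """
    chain = [target]
    while backup_kind(chain[0]) != "full":
        chain.insert(0, chain[0] - timedelta(days=1))
    return chain


if __name__ == "__main__":
    for day in restore_chain(date(2024, 1, 10)):
        print(day.isoformat(), backup_kind(day))
```

Under these assumptions a restore never needs more than seven backups, which is one reason to keep the full-backup interval short.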
Future Considerations
We plan to upgrade the GPUs in the training servers to NVIDIA H100 in the next quarter. We are also exploring Apache Kafka for real-time data streaming and investigating Federated Learning techniques to strengthen data privacy and security.