AI in London
This article details the server configuration supporting Artificial Intelligence (AI) workloads in our London data center. It is aimed at newcomers to this wiki and provides a technical overview of the hardware and software employed. Please refer to the Server Room Access Policy before accessing any of these systems.
Overview
The "AI in London" project utilizes a cluster of high-performance servers dedicated to machine learning, deep learning, and natural language processing tasks. This infrastructure is designed for scalability, redundancy, and efficient resource utilization. The primary goal is to provide a robust platform for our Data Science Team to develop and deploy AI models. We maintain detailed Server Inventory records.
Hardware Specifications
The core of the AI infrastructure consists of the following server configurations. The servers are housed in Rack 7, Bays 1-12. See the Data Center Map for precise location details.
Server Role | Model | CPU | RAM | Storage | GPU |
---|---|---|---|---|---|
Master Node (Data Processing) | Dell PowerEdge R750xa | 2 x Intel Xeon Gold 6348 (28 cores each) | 512GB DDR4 ECC REG | 4 x 4TB NVMe SSD (RAID 10) | NVIDIA RTX A6000 (48GB) |
Worker Nodes 1-4 (Training) | Dell PowerEdge R750xa | 2 x Intel Xeon Gold 6338 (32 cores each) | 256GB DDR4 ECC REG | 2 x 4TB NVMe SSD (RAID 1) | NVIDIA A100 (80GB) |
Worker Nodes 5-8 (Inference) | Supermicro SYS-2029U-TR4 | 2 x AMD EPYC 7763 (64 cores each) | 128GB DDR4 ECC REG | 2 x 2TB NVMe SSD (RAID 1) | NVIDIA Tesla T4 (16GB) |
Storage Node (Data Repository) | Dell PowerEdge R740xd | 2 x Intel Xeon Gold 6248R (24 cores each) | 128GB DDR4 ECC REG | 16 x 16TB SAS HDD (RAID 6) | None |
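The RAID levels in the table above trade raw capacity for redundancy. As a rough sketch (the `raid_usable` helper is illustrative, not a tool used on the cluster), usable capacity can be estimated like this:

```python
# Sketch: estimate usable capacity for the RAID levels in the hardware table.
# Drive counts and sizes are taken from the table; real-world usable space
# will be slightly lower after filesystem and controller overhead.

def raid_usable(drives: int, size_tb: float, level: str) -> float:
    """Return approximate usable capacity in TB for common RAID levels."""
    if level in ("raid1", "raid10"):   # mirroring: half the raw capacity
        return drives * size_tb / 2
    if level == "raid6":               # double parity: two drives reserved
        return (drives - 2) * size_tb
    raise ValueError(f"unsupported level: {level}")

print(raid_usable(4, 4, "raid10"))   # master node, 4 x 4TB RAID 10 -> 8.0 TB
print(raid_usable(16, 16, "raid6"))  # storage node, 16 x 16TB RAID 6 -> 224 TB
```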
All servers are connected via a 100Gbps InfiniBand network. Please review the Network Topology Diagram. Power redundancy is provided by dual power supplies and UPS systems. Check the UPS Status Page for current status.
Software Stack
The servers run a customized version of Ubuntu Server 22.04 LTS. The core software components are detailed below. See the Software Licensing Information for compliance details.
Component | Version | Purpose |
---|---|---|
Operating System | Ubuntu Server 22.04 LTS | Base operating system |
CUDA Toolkit | 12.2 | GPU programming toolkit |
cuDNN | 8.9.2 | Deep neural network library |
TensorFlow | 2.13 | Machine learning framework |
PyTorch | 2.0.1 | Machine learning framework |
Docker | 24.0.5 | Containerization platform |
Kubernetes | 1.27 | Container orchestration |
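As a sketch of how this stack fits together, a Kubernetes workload can request a GPU through the NVIDIA device plugin. The manifest below is illustrative only; the pod name and image tag are placeholders, not the cluster's actual manifests.

```yaml
# Illustrative pod spec: requests one GPU via the NVIDIA device plugin
# and runs nvidia-smi as a smoke test. Names and image are examples.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```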
All code is managed using Git and stored in our internal Code Repository. We utilize Jenkins for continuous integration and continuous deployment (CI/CD).
Networking Configuration
The AI cluster utilizes a dedicated VLAN (192.168.100.0/24) for internal communication. Access to the cluster from external networks is restricted to authorized personnel via a secure VPN connection. Refer to the VPN Configuration Guide for instructions.
Parameter | Value |
---|---|
VLAN ID | 100 |
Subnet Mask | 255.255.255.0 |
Gateway | 192.168.100.1 |
DNS Servers | 8.8.8.8, 8.8.4.4 |
Firewall Rules | See Firewall Configuration |
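The subnet parameters above can be checked programmatically. A minimal sketch using Python's standard-library `ipaddress` module (the `in_cluster_vlan` helper is hypothetical, for illustration only):

```python
# Sketch: validate that a host address belongs to the cluster VLAN
# (192.168.100.0/24, per the networking table above).
import ipaddress

CLUSTER_NET = ipaddress.ip_network("192.168.100.0/24")

def in_cluster_vlan(addr: str) -> bool:
    """Return True if addr falls inside the dedicated AI cluster VLAN."""
    return ipaddress.ip_address(addr) in CLUSTER_NET

print(in_cluster_vlan("192.168.100.42"))  # True: inside the /24
print(in_cluster_vlan("192.168.101.5"))   # False: outside the VLAN
```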
Monitoring & Alerting
The entire infrastructure is monitored using Prometheus and Grafana. Alerts are configured for critical metrics such as CPU utilization, memory usage, disk space, and GPU temperature. See the Monitoring Dashboard for a live view of the system status. The Incident Response Plan outlines procedures for handling alerts and outages.
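As an example of the alerting described above, a Prometheus rule for GPU temperature might look like the following. This is a sketch, not the cluster's actual rules: the metric name assumes the NVIDIA DCGM exporter is deployed, and the threshold is an example value.

```yaml
# Illustrative Prometheus alerting rule for GPU temperature.
# DCGM_FI_DEV_GPU_TEMP is exported by NVIDIA's dcgm-exporter.
groups:
  - name: ai-cluster-gpu
    rules:
      - alert: GPUTemperatureHigh
        expr: DCGM_FI_DEV_GPU_TEMP > 85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GPU temperature above 85C on {{ $labels.instance }}"
```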
Future Expansion
Plans are underway to expand the AI cluster with additional GPUs and storage capacity. We are evaluating the use of NVMe over Fabrics to further improve I/O performance. The Capacity Planning Document details the projected growth and resource requirements.
See Also
- Server Maintenance Schedule
- Backup and Recovery Procedures
- Security Audit Reports
Intel-Based Server Configurations
Configuration | Specifications | Benchmark |
---|---|---|
Core i7-6700K/7700 Server | 64 GB DDR4, 2 x 512 GB NVMe SSD | CPU Benchmark: 8046 |
Core i7-8700 Server | 64 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 13124 |
Core i9-9900K Server | 128 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 49969 |
Core i9-13900 Server (64GB) | 64 GB RAM, 2 x 2 TB NVMe SSD | |
Core i9-13900 Server (128GB) | 128 GB RAM, 2 x 2 TB NVMe SSD | |
Core i5-13500 Server (64GB) | 64 GB RAM, 2 x 500 GB NVMe SSD | |
Core i5-13500 Server (128GB) | 128 GB RAM, 2 x 500 GB NVMe SSD | |
Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000 | |
AMD-Based Server Configurations
Configuration | Specifications | Benchmark |
---|---|---|
Ryzen 5 3600 Server | 64 GB RAM, 2 x 480 GB NVMe | CPU Benchmark: 17849 |
Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2 x 1 TB NVMe | CPU Benchmark: 35224 |
Ryzen 9 5950X Server | 128 GB RAM, 2 x 4 TB NVMe | CPU Benchmark: 46045 |
Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2 x 2 TB NVMe | CPU Benchmark: 63561 |
EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2 x 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2 x 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 9454P Server | 256 GB RAM, 2 x 2 TB NVMe | |
*Note: all benchmark scores are approximate and may vary based on configuration.*