AI in Leeds: Server Configuration
Welcome to the documentation for the "AI in Leeds" server cluster. This article details the hardware and software configuration powering our Artificial Intelligence initiatives within the Leeds data centre. This guide is aimed at newcomers to the wiki and those needing detailed information about the server infrastructure. Understanding this configuration is crucial for System Administrators and Developers working with AI models hosted on this cluster.
Overview
The "AI in Leeds" cluster is a dedicated environment designed to handle the computational demands of machine learning, deep learning, and natural language processing tasks. It comprises a network of high-performance servers interconnected via a low-latency network. The primary goal of this infrastructure is to provide a scalable and reliable platform for research and development in AI. We leverage Red Hat Enterprise Linux as our primary operating system due to its stability and security features. Network configuration is handled centrally, ensuring consistent performance.
Hardware Specifications
The cluster consists of three primary node types: Master Nodes, Compute Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role within the cluster.
Node Type | CPU | Memory | Storage | Network Interface |
---|---|---|---|---|
Master Nodes (2) | 2 x Intel Xeon Gold 6338 | 256 GB DDR4 ECC | 2 x 1 TB NVMe SSD (RAID 1) | 100 Gbps Ethernet |
Compute Nodes (10) | 2 x AMD EPYC 7763 | 512 GB DDR4 ECC | 4 x 4 TB NVMe SSD (RAID 0) | 200 Gbps InfiniBand |
Storage Nodes (3) | 2 x Intel Xeon Silver 4310 | 128 GB DDR4 ECC | 16 x 16 TB SAS HDD (RAID 6) | 100 Gbps Ethernet |
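The usable capacity implied by each RAID level in the table can be sanity-checked with a short calculation. This is a sketch using the drive counts and sizes listed above; the function and its name are illustrative, not part of any cluster tooling:

```python
def usable_tb(drives: int, drive_tb: float, raid: str) -> float:
    """Usable capacity in TB for the RAID levels used in this cluster."""
    if raid == "RAID 1":  # mirrored pair: half the raw capacity survives
        return drives * drive_tb / 2
    if raid == "RAID 0":  # striped: full raw capacity, no redundancy
        return drives * drive_tb
    if raid == "RAID 6":  # double parity: two drives' worth reserved
        return (drives - 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {raid}")

# Per-node usable storage, from the hardware table above
master  = usable_tb(2, 1, "RAID 1")    # 2 x 1 TB mirrored  -> 1.0 TB
compute = usable_tb(4, 4, "RAID 0")    # 4 x 4 TB striped   -> 16.0 TB
storage = usable_tb(16, 16, "RAID 6")  # 16 x 16 TB, 2 parity -> 224.0 TB
```

Note the trade-off visible here: Compute Nodes use RAID 0 for maximum throughput and capacity at the cost of redundancy, while Storage Nodes sacrifice two drives of capacity for double-parity fault tolerance.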
Software Stack
The software stack is designed to provide a robust and flexible environment for AI development. We utilize a combination of open-source tools and proprietary software. Containerization with Docker and Kubernetes is central to our deployment strategy.
Component | Version | Purpose |
---|---|---|
Operating System | Red Hat Enterprise Linux 8.6 | Server Base |
Kubernetes | v1.24.3 | Container Orchestration |
Docker | 20.10.12 | Containerization (via cri-dockerd, since Kubernetes 1.24 removed the built-in dockershim) |
NVIDIA CUDA Toolkit | 11.7 | GPU Programming |
TensorFlow | 2.9.1 | Machine Learning Framework |
PyTorch | 1.12.1 | Deep Learning Framework |
JupyterHub | 3.0.0 | Interactive Computing Environment |
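The pinned versions above can be encoded as a manifest and compared against what a node actually reports, to catch version drift across the cluster. This is a minimal sketch; the `PINNED` dictionary mirrors the table, but the reported-version example is hypothetical:

```python
# Version pins taken from the software stack table above
PINNED = {
    "kubernetes": "1.24.3",
    "docker": "20.10.12",
    "cuda": "11.7",
    "tensorflow": "2.9.1",
    "pytorch": "1.12.1",
}

def check_versions(reported: dict) -> list:
    """Return a human-readable entry for every component that drifts from its pin."""
    return [
        f"{name}: expected {pin}, found {reported.get(name, 'missing')}"
        for name, pin in PINNED.items()
        if reported.get(name) != pin
    ]

# Hypothetical example: a node running an older PyTorch
drift = check_versions({**PINNED, "pytorch": "1.11.0"})
# drift == ["pytorch: expected 1.12.1, found 1.11.0"]
```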
Network Topology
The network is a critical component of the cluster, providing high-bandwidth, low-latency communication between nodes. The network is segmented into three subnets: one for the Master Nodes, one for the Compute Nodes, and one for the Storage Nodes. Firewall configuration is managed centrally to ensure security.
Subnet | IP Range | Nodes |
---|---|---|
Master | 192.168.1.0/24 | Master Node 1, Master Node 2 |
Compute | 192.168.2.0/24 | Compute Node 1 - 10 |
Storage | 192.168.3.0/24 | Storage Node 1 - 3 |
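The subnet layout above can be expressed directly with the standard-library `ipaddress` module, for example to classify which segment a given address belongs to. A sketch; the sample addresses are illustrative:

```python
import ipaddress

# Subnets from the network topology table above
SUBNETS = {
    "master":  ipaddress.ip_network("192.168.1.0/24"),
    "compute": ipaddress.ip_network("192.168.2.0/24"),
    "storage": ipaddress.ip_network("192.168.3.0/24"),
}

def classify(ip):
    """Return the name of the cluster subnet an address belongs to, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in SUBNETS.items():
        if addr in net:
            return name
    return None

# classify("192.168.2.7") == "compute"
# classify("10.0.0.1") is None  (outside all three cluster subnets)
```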
Security Considerations
Security is paramount. We employ multiple layers of security, including:
- Firewall rules to restrict network access.
- Regular security audits and vulnerability scans.
- Strong authentication and authorization mechanisms.
- Data encryption at rest and in transit.
- Data backup procedures.
- Intrusion detection systems monitor for malicious activity.
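As an illustration of the "strong authentication" point, salted password hashing with a memory-hard KDF can be sketched using only the standard library. This is not the cluster's actual authentication stack, just a minimal example of the mechanism:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted key from a password using scrypt (a memory-hard KDF)."""
    salt = salt or os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, key

def verify_password(password, salt, expected):
    """Re-derive the key and compare in constant time to resist timing attacks."""
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(key, expected)

salt, key = hash_password("correct horse battery staple")
# verify_password("correct horse battery staple", salt, key) is True
# verify_password("wrong guess", salt, key) is False
```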
Monitoring and Alerting
The cluster is continuously monitored using Prometheus and Grafana. Alerts notify administrators of issues such as high CPU usage, memory exhaustion, or disk failures. Log analysis is performed with the ELK stack (Elasticsearch, Logstash, Kibana), and Nagios provides basic host-level monitoring.
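The alert conditions described above reduce to threshold checks over sampled metrics. The following sketch shows the shape of that logic; the threshold values and metric names are illustrative, not the cluster's actual Prometheus rules:

```python
# Illustrative thresholds (not the production Prometheus rules)
THRESHOLDS = {
    "cpu_percent": 90.0,        # alert when CPU usage exceeds this
    "memory_percent": 95.0,     # alert when memory usage exceeds this
    "disk_free_percent": 10.0,  # alert when free disk space drops BELOW this
}

def active_alerts(sample):
    """Return the alert names triggered by a single metrics sample."""
    alerts = []
    if sample["cpu_percent"] > THRESHOLDS["cpu_percent"]:
        alerts.append("HighCPU")
    if sample["memory_percent"] > THRESHOLDS["memory_percent"]:
        alerts.append("MemoryExhaustion")
    if sample["disk_free_percent"] < THRESHOLDS["disk_free_percent"]:
        alerts.append("LowDisk")
    return alerts

# active_alerts({"cpu_percent": 97.0, "memory_percent": 40.0,
#                "disk_free_percent": 55.0}) == ["HighCPU"]
```

In practice each rule would also carry a sustain duration (e.g. "for 5 minutes") so transient spikes do not page anyone.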
Future Enhancements
Planned upgrades include:
- Adding more GPU-accelerated Compute Nodes.
- Implementing a more advanced storage solution with NVMe-oF.
- Integrating with a cloud-based object storage service.
- Exploring the use of serverless computing for certain AI workloads.

Scalability testing will be performed following any hardware changes.
Cluster maintenance is scheduled monthly to ensure the ongoing stability and performance of the system. Please refer to the troubleshooting guide for common issues.
Intel-Based Server Configurations
Configuration | Specifications | Benchmark |
---|---|---|
Core i7-6700K/7700 Server | 64 GB DDR4, 2 x 512 GB NVMe SSD | CPU Benchmark: 8046 |
Core i7-8700 Server | 64 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 13124 |
Core i9-9900K Server | 128 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 49969 |
Core i9-13900 Server (64GB) | 64 GB RAM, 2 x 2 TB NVMe SSD | |
Core i9-13900 Server (128GB) | 128 GB RAM, 2 x 2 TB NVMe SSD | |
Core i5-13500 Server (64GB) | 64 GB RAM, 2 x 500 GB NVMe SSD | |
Core i5-13500 Server (128GB) | 128 GB RAM, 2 x 500 GB NVMe SSD | |
Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 NVMe SSD, NVIDIA RTX 4000 | |
AMD-Based Server Configurations
Configuration | Specifications | Benchmark |
---|---|---|
Ryzen 5 3600 Server | 64 GB RAM, 2 x 480 GB NVMe | CPU Benchmark: 17849 |
Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2 x 1 TB NVMe | CPU Benchmark: 35224 |
Ryzen 9 5950X Server | 128 GB RAM, 2 x 4 TB NVMe | CPU Benchmark: 46045 |
Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2 x 2 TB NVMe | CPU Benchmark: 63561 |
EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2 x 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2 x 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 9454P Server | 256 GB RAM, 2 x 2 TB NVMe | |
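To compare the listed configurations, the benchmark column can be ranked directly, or normalized per GB of RAM as a rough density metric. A sketch over a subset of rows from the tables above, with scores copied verbatim:

```python
# (name, ram_gb, cpu_benchmark) -- rows taken from the configuration tables above
CONFIGS = [
    ("Core i7-8700 Server",   64, 13124),
    ("Ryzen 5 3600 Server",   64, 17849),
    ("Ryzen 7 7700 Server",   64, 35224),
    ("Ryzen 9 7950X Server", 128, 63561),
    ("EPYC 7502P Server",    128, 48021),
]

# Rank by raw benchmark score, highest first
by_score = sorted(CONFIGS, key=lambda c: c[2], reverse=True)
# by_score[0][0] == "Ryzen 9 7950X Server"

# Benchmark points per GB of RAM, a crude compute-density metric
density = {name: round(score / ram, 1) for name, ram, score in CONFIGS}
```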
⚠️ *Note: All benchmark scores are approximate and may vary based on configuration. Server availability is subject to stock.*