AI in Leeds: Server Configuration

Welcome to the documentation for the "AI in Leeds" server cluster. This article details the hardware and software configuration powering our Artificial Intelligence initiatives within the Leeds data centre. It is aimed at newcomers to the wiki and at anyone needing detailed information about the server infrastructure; understanding this configuration is crucial for System Administrators and Developers working with AI models hosted on the cluster.

Overview

The "AI in Leeds" cluster is a dedicated environment designed to handle the computational demands of machine learning, deep learning, and natural language processing tasks. It comprises a network of high-performance servers interconnected via a low-latency network. The primary goal of this infrastructure is to provide a scalable and reliable platform for research and development in AI. We leverage Red Hat Enterprise Linux as our primary operating system due to its stability and security features. Network configuration is handled centrally, ensuring consistent performance.

Hardware Specifications

The cluster consists of three primary node types: Master Nodes, Compute Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role within the cluster.

| Node Type | CPU | Memory | Storage | Network Interface |
|---|---|---|---|---|
| Master Nodes (2) | 2 x Intel Xeon Gold 6338 | 256 GB DDR4 ECC | 2 x 1 TB NVMe SSD (RAID 1) | 100 Gbps Ethernet |
| Compute Nodes (10) | 2 x AMD EPYC 7763 | 512 GB DDR4 ECC | 4 x 4 TB NVMe SSD (RAID 0) | 200 Gbps InfiniBand |
| Storage Nodes (3) | 2 x Intel Xeon Silver 4310 | 128 GB DDR4 ECC | 16 x 16 TB SAS HDD (RAID 6) | 100 Gbps Ethernet |
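As a rough illustration of what the RAID levels in the table imply for usable capacity, the sketch below applies the standard rules of thumb (RAID 0 stripes across all disks, RAID 1 mirrors, RAID 6 loses two disks to parity). The figures are estimates only and ignore filesystem and formatting overhead.

```python
# Illustrative usable-capacity estimates for the node storage arrays above.
# Standard RAID rules of thumb; real-world usable space will be lower.

def usable_tb(level: str, disks: int, size_tb: float) -> float:
    if level == "RAID 0":
        return disks * size_tb          # striping: all disks contribute
    if level == "RAID 1":
        return disks * size_tb / 2      # mirroring: half the raw capacity
    if level == "RAID 6":
        return (disks - 2) * size_tb    # double parity: two disks lost
    raise ValueError(f"unhandled RAID level: {level}")

print(usable_tb("RAID 1", 2, 1))    # Master nodes:  1.0 TB usable
print(usable_tb("RAID 0", 4, 4))    # Compute nodes: 16.0 TB usable
print(usable_tb("RAID 6", 16, 16))  # Storage nodes: 224.0 TB usable
```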

Software Stack

The software stack is designed to provide a robust and flexible environment for AI development. We utilize a combination of open-source tools and proprietary software. Containerization with Docker and Kubernetes is central to our deployment strategy.

| Component | Version | Purpose |
|---|---|---|
| Operating System | Red Hat Enterprise Linux 8.6 | Server Base |
| Kubernetes | v1.24.3 | Container Orchestration |
| Docker | 20.10.12 | Containerization |
| NVIDIA CUDA Toolkit | 11.7 | GPU Programming |
| TensorFlow | 2.9.1 | Machine Learning Framework |
| PyTorch | 1.12.1 | Deep Learning Framework |
| JupyterHub | 3.0.0 | Interactive Computing Environment |
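For scripted sanity checks, the pinned stack above can be recorded as plain data. The sketch below is a hypothetical helper (not part of our actual tooling) that compares a reported version string against the pin, tolerating a leading "v" prefix:

```python
# Hypothetical sketch: record the pinned software stack as data and
# compare reported component versions against it. Version strings are
# normalised by stripping a leading "v" and comparing numeric parts.

PINNED_STACK = {
    "kubernetes": "v1.24.3",
    "docker": "20.10.12",
    "cuda": "11.7",
    "tensorflow": "2.9.1",
    "pytorch": "1.12.1",
    "jupyterhub": "3.0.0",
}

def parse(version: str) -> tuple:
    """Turn '1.24.3' or 'v1.24.3' into a comparable tuple (1, 24, 3)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))

def matches_pin(component: str, reported: str) -> bool:
    return parse(reported) == parse(PINNED_STACK[component])

print(matches_pin("kubernetes", "1.24.3"))  # True: "v" prefix ignored
print(matches_pin("docker", "20.10.17"))    # False: patch level drifted
```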

Network Topology

The network is a critical component of the cluster, providing high-bandwidth, low-latency communication between nodes. The network is segmented into three subnets: one for the Master Nodes, one for the Compute Nodes, and one for the Storage Nodes. Firewall configuration is managed centrally to ensure security.

| Subnet | IP Range | Nodes |
|---|---|---|
| Master | 192.168.1.0/24 | Master Node 1, Master Node 2 |
| Compute | 192.168.2.0/24 | Compute Nodes 1-10 |
| Storage | 192.168.3.0/24 | Storage Nodes 1-3 |
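The subnet layout above can be queried programmatically with Python's standard-library `ipaddress` module; the sketch below (an illustrative helper, not part of our tooling) determines which network segment a given address belongs to:

```python
# Map each node role to its subnet (values taken from the table above)
# and look up which segment a given address falls in.
import ipaddress

SUBNETS = {
    "master": ipaddress.ip_network("192.168.1.0/24"),
    "compute": ipaddress.ip_network("192.168.2.0/24"),
    "storage": ipaddress.ip_network("192.168.3.0/24"),
}

def segment_of(addr: str):
    """Return the segment name for addr, or None if outside the cluster."""
    ip = ipaddress.ip_address(addr)
    for name, net in SUBNETS.items():
        if ip in net:
            return name
    return None

print(segment_of("192.168.2.7"))  # compute
print(segment_of("10.0.0.1"))     # None
```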

Security Considerations

Security is paramount. We employ multiple layers of security, including:

- Network segmentation: Master, Compute, and Storage nodes sit on separate subnets (see Network Topology above).
- Centrally managed firewall configuration across all nodes.
- Operating-system hardening and security features provided by Red Hat Enterprise Linux.
