AI in Wales: Server Configuration Documentation

Welcome to the documentation for the “AI in Wales” server infrastructure. This article details the server configuration used to support our artificial intelligence initiatives within the Wales-based research network. This guide is intended for new system administrators and developers joining the project. Please read carefully to understand the system architecture and required configurations.

Overview

The "AI in Wales" project utilizes a distributed server cluster to handle the computational demands of machine learning model training, inference, and data storage. The core infrastructure is built around a combination of high-performance compute nodes and a robust storage system. The system is designed for scalability and redundancy, ensuring high availability and data integrity. This document will detail the hardware, software, and network configuration of these servers. We utilize a hybrid approach, leveraging both on-premise servers and cloud resources through AWS.

Hardware Specifications

The server cluster consists of three main types of nodes: Compute Nodes, Storage Nodes, and a Management Node. Detailed specifications for each are provided below.

Compute Node

Specification | Value
CPU | 2 x Intel Xeon Gold 6338
RAM | 512 GB DDR4 ECC Registered
GPU | 4 x NVIDIA A100 80GB
Storage (Local) | 2 x 1.92 TB NVMe SSD (RAID 0)
Network Interface | 2 x 100 Gbps InfiniBand, 1 x 10 Gbps Ethernet
Operating System | Ubuntu 22.04 LTS
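When bringing a compute node online, it is worth confirming that the OS actually sees all four A100s. The sketch below parses the output of `nvidia-smi --query-gpu=name --format=csv,noheader` (a standard nvidia-smi query mode); the expected count of 4 comes from the table above, and the helper names are our own illustration, not part of the project tooling.

```python
import subprocess

EXPECTED_GPU_COUNT = 4  # per the compute node specification above


def count_gpus(nvidia_smi_output: str) -> int:
    """Count GPUs from `nvidia-smi --query-gpu=name --format=csv,noheader` output.

    Each non-empty line of that output is one GPU.
    """
    return len([line for line in nvidia_smi_output.splitlines() if line.strip()])


def check_node() -> bool:
    """Query the local node and compare against the expected spec."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_gpus(out) == EXPECTED_GPU_COUNT
```

A node that passes `check_node()` reports the full complement of GPUs; anything less usually indicates a driver or hardware issue worth investigating before scheduling workloads.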

Storage Node

Specification | Value
CPU | 2 x Intel Xeon Silver 4310
RAM | 256 GB DDR4 ECC Registered
Storage (Total) | 1.2 PB Raw Capacity (Distributed across multiple drives)
Storage Type | 7,200 RPM SAS HDD
RAID Configuration | RAID 6
Network Interface | 2 x 40 Gbps Ethernet
Operating System | CentOS Stream 8
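Note that the 1.2 PB figure above is raw capacity: RAID 6 reserves two drives' worth of space for parity, so usable capacity is lower. The quick calculation below illustrates this; the drive count and drive size are hypothetical examples (the table only states the raw total), not the actual array layout.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 6 array: two drives' worth is reserved for parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    return (drive_count - 2) * drive_tb


# Hypothetical layout: 75 x 16 TB drives = 1200 TB (1.2 PB) raw.
usable = raid6_usable_tb(75, 16)  # 73 x 16 = 1168 TB usable
```

In practice the filesystem overhead of the distributed storage layer reduces the figure further, so always plan capacity against the usable number, not the raw one.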

Management Node

Specification | Value
CPU | 2 x Intel Xeon E-2388G
RAM | 64 GB DDR4 ECC Registered
Storage | 2 x 1 TB SATA SSD (RAID 1)
Network Interface | 1 x 10 Gbps Ethernet
Operating System | Debian 11

Software Configuration

The software stack is what enables the AI workloads. All applications run in a containerized environment orchestrated by Kubernetes, which handles deployment and scaling across the cluster.
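As a concrete illustration, a GPU training job on a compute node could be deployed with a manifest along these lines. The deployment name, image, and memory limit here are placeholders rather than our actual configuration; the `nvidia.com/gpu` resource name is the one exposed by the standard NVIDIA Kubernetes device plugin.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: training-job            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: training-job
  template:
    metadata:
      labels:
        app: training-job
    spec:
      containers:
        - name: trainer
          image: registry.example/trainer:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 4   # all four A100s on one compute node
              memory: "256Gi"     # placeholder; well under the node's 512 GB
```

Requesting all four GPUs in a single pod pins the job to one compute node; multi-node training would instead run one pod per node and coordinate over the InfiniBand fabric.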
