AI in Margate: Server Configuration Documentation

This article documents the hardware and software configuration of the "AI in Margate" servers, our artificial intelligence research and development environment located in Margate. It is intended for newcomers to the wiki and for those responsible for maintaining the system. These configurations matter for troubleshooting, upgrades, and performance tuning, so please read this document carefully before making any changes to the server environment.

Overview

The “AI in Margate” project requires significant computational resources. The server infrastructure is designed for high-throughput processing of large datasets, model training, and real-time inference, and is built around a cluster of dedicated servers interconnected via a high-speed network. This document covers the key components (hardware specifications, software stack, and network configuration) as well as basic system administration procedures. Refer to the System Administration Guide for more in-depth information on general server maintenance, and see Network Topology for the network layout.

Hardware Specifications

The core of our AI infrastructure consists of three primary server types: Compute Nodes, Storage Nodes, and a Management Node. Each node type has specific hardware requirements to optimise its function.

Compute Nodes (AI-CN01, AI-CN02, AI-CN03, AI-CN04): These nodes are responsible for the heavy lifting of model training and inference.

| Component | Specification |
|---|---|
| CPU | 2 x Intel Xeon Gold 6338 (32 cores/64 threads per CPU) |
| RAM | 512 GB DDR4 ECC Registered @ 3200 MHz |
| GPU | 4 x NVIDIA A100 80 GB PCIe 4.0 |
| Storage (Local) | 2 x 1.92 TB NVMe PCIe 4.0 SSD (RAID 0) |
| Network Interface | 2 x 100GbE Mellanox ConnectX-6 |
| Power Supply | 2 x 2000 W Redundant Power Supplies |
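
For capacity planning it can help to see what the four compute nodes add up to. The following is a minimal sketch, not part of the project's tooling: the class `ComputeNodeSpec` and the function `cluster_totals` are hypothetical names, and the figures are copied from the table above on the assumption that all four nodes are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNodeSpec:
    """Per-node figures copied from the Compute Nodes table (assumed identical nodes)."""
    cpus: int = 2
    cores_per_cpu: int = 32
    threads_per_cpu: int = 64
    ram_gb: int = 512
    gpus: int = 4
    gpu_mem_gb: int = 80
    nvme_drives: int = 2
    nvme_drive_tb: float = 1.92


def cluster_totals(hostnames, spec=ComputeNodeSpec()):
    """Sum per-node resources over a list of identical compute nodes."""
    n = len(hostnames)
    return {
        "nodes": n,
        "cpu_cores": n * spec.cpus * spec.cores_per_cpu,
        "cpu_threads": n * spec.cpus * spec.threads_per_cpu,
        "ram_gb": n * spec.ram_gb,
        "gpus": n * spec.gpus,
        "gpu_mem_gb": n * spec.gpus * spec.gpu_mem_gb,
        # RAID 0 stripes both drives, so the full raw capacity is usable.
        "local_nvme_tb": round(n * spec.nvme_drives * spec.nvme_drive_tb, 2),
    }


COMPUTE_NODES = ["AI-CN01", "AI-CN02", "AI-CN03", "AI-CN04"]

if __name__ == "__main__":
    for name, value in cluster_totals(COMPUTE_NODES).items():
        print(f"{name}: {value}")
```

With the table's figures this gives 256 CPU cores, 16 GPUs, and 1280 GB of GPU memory across the cluster. Note that RAID 0 provides no redundancy, so a single NVMe failure loses the whole local volume on that node.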

Storage Nodes (AI-SN01, AI-SN02): These nodes provide the persistent storage for datasets and model checkpoints.

| Component | Specification |
|---|---|
| CPU | 2 x Intel Xeon Silver 4310 (12 cores/24 threads per CPU) |
| RAM | 256 GB DDR4 ECC Registered @ 3200 MHz |
| Storage (RAID) | 16 x 16 TB SAS 7.2K RPM HDD (RAID 6, providing approximately 192 TB usable storage) |
| Network Interface | 2 x 40GbE Mellanox ConnectX-5 |
| Power Supply | 2 x 1600 W Redundant Power Supplies |
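
The usable-capacity figure depends on how the 16 drives are grouped, which this page does not specify. RAID 6 reserves two drives' worth of capacity per group for parity, so a single 16-drive group would yield 224 TB, not 192 TB. The sketch below works through the arithmetic for a few layouts. The function names and the `groups` and `hot_spares` parameters are hypothetical, and the two layouts that reach 192 TB are assumptions, not confirmed details of these nodes.

```python
def raid6_usable_tb(drives, drive_tb, groups=1, hot_spares=0):
    """Usable capacity in TB for equally sized RAID 6 groups.

    Each RAID 6 group loses two drives' worth of capacity to parity.
    Hot spares hold no data and are excluded before grouping.
    """
    active = drives - hot_spares
    if groups < 1 or active % groups != 0:
        raise ValueError("active drives must divide evenly into groups")
    per_group = active // groups
    if per_group < 4:
        raise ValueError("RAID 6 needs at least 4 drives per group")
    return groups * (per_group - 2) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes)."""
    return tb * 1e12 / 2**40


if __name__ == "__main__":
    layouts = {
        "one 16-drive group": raid6_usable_tb(16, 16),
        "14 drives + 2 hot spares (assumed)": raid6_usable_tb(16, 16, hot_spares=2),
        "two 8-drive groups (assumed)": raid6_usable_tb(16, 16, groups=2),
    }
    for label, tb in layouts.items():
        print(f"{label}: {tb} TB ({tb_to_tib(tb):.1f} TiB)")
```

Both assumed layouts give 192 TB before filesystem overhead, which matches the figure in the table. Checking the actual RAID controller configuration against the Hardware Inventory would settle which layout is in use.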

Management Node (AI-MN01): This node handles system monitoring, user authentication, and job scheduling.

| Component | Specification |
|---|---|
| CPU | 2 x Intel Xeon E-2336 (8 cores/16 threads per CPU) |
| RAM | 64 GB DDR4 ECC Registered @ 3200 MHz |
| Storage | 2 x 480 GB SATA SSD (RAID 1) |
| Network Interface | 2 x 1GbE Intel Ethernet |
| Power Supply | 1 x 850 W Power Supply |

Refer to the Hardware Inventory for a complete list of all serial numbers and asset tags.
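
The hostnames in this section appear to follow the pattern `AI-<type><number>`, where `CN`, `SN`, and `MN` stand for compute, storage, and management nodes. Assuming that convention holds, a small helper can classify a host by name in scripts. The function `parse_hostname` is a hypothetical illustration, not an existing tool on these servers.

```python
import re

# Role codes inferred from the hostnames listed in this section.
ROLE_BY_CODE = {"CN": "compute", "SN": "storage", "MN": "management"}

# Assumed pattern: "AI-", a two-letter role code, then a two-digit index.
HOSTNAME_RE = re.compile(r"^AI-(CN|SN|MN)(\d{2})$")


def parse_hostname(hostname):
    """Return (role, index) for a hostname such as 'AI-CN03'."""
    match = HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"unrecognised hostname: {hostname!r}")
    code, index = match.groups()
    return ROLE_BY_CODE[code], int(index)


if __name__ == "__main__":
    for host in ["AI-CN01", "AI-SN02", "AI-MN01"]:
        role, index = parse_hostname(host)
        print(f"{host}: {role} node #{index}")
```

Rejecting unknown names with an exception, instead of guessing a role, keeps a typo in a hostname from silently targeting the wrong class of machine.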

Software Stack

All nodes run Ubuntu Server 22.04 LTS. The rest of the software stack is selected to support our AI workflows. See the Software List for licensing details.
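
One way to confirm that a node runs the expected release is to read `/etc/os-release`, which Ubuntu provides as a list of `KEY=value` lines. The sketch below is illustrative only: `parse_os_release`, `is_expected_os`, and the sample text are hypothetical names and data introduced for this example.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_expected_os(info, os_id="ubuntu", version_id="22.04"):
    """True if the parsed fields match the release named in this document."""
    return info.get("ID") == os_id and info.get("VERSION_ID") == version_id


# Abbreviated sample for illustration; a real file has more fields.
SAMPLE = 'NAME="Ubuntu"\nVERSION_ID="22.04"\nID=ubuntu\n'

if __name__ == "__main__":
    path = Path("/etc/os-release")
    text = path.read_text() if path.exists() else SAMPLE
    print("expected OS:", is_expected_os(parse_os_release(text)))
```

Run on each node (for example over SSH), this reports whether the host matches the documented release. It checks only the release identifier, not the patch level of installed packages.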
