AI in Cardiff

AI in Cardiff: Server Configuration

This document details the server configuration supporting the "AI in Cardiff" project. It is intended for new system administrators and developers working with this infrastructure. The project leverages a cluster of servers to deliver machine learning services and data analysis capabilities. Understanding the specifics of these servers is crucial for effective maintenance, troubleshooting, and future scaling. Please review the System Administration Guide before making any changes.

Overview

The "AI in Cardiff" project relies on a hybrid server infrastructure, utilizing both physical servers and virtual machines. The physical servers handle computationally intensive tasks, while virtual machines provide flexibility for development, testing, and less demanding workloads. All servers are located within the secure data center at Cardiff University. See the Data Center Access Policy for details. The network topology is documented in the Network Diagram.

Physical Server Specifications

The core of the AI processing power comes from three dedicated physical servers, named 'Alys', 'Rhys', and 'Idris'. These servers are optimized for GPU-accelerated computing.

| Server Name | CPU | RAM | GPU | Storage |
|---|---|---|---|---|
| Alys | 2 x Intel Xeon Gold 6248R (24 cores/48 threads per CPU) | 512 GB DDR4 ECC REG | 4 x NVIDIA A100 (80 GB) | 8 x 4 TB NVMe SSD (RAID 0) |
| Rhys | 2 x Intel Xeon Gold 6248R (24 cores/48 threads per CPU) | 512 GB DDR4 ECC REG | 4 x NVIDIA A100 (80 GB) | 8 x 4 TB NVMe SSD (RAID 0) |
| Idris | 2 x AMD EPYC 7763 (64 cores/128 threads per CPU) | 1 TB DDR4 ECC REG | 8 x NVIDIA A100 (80 GB) | 8 x 8 TB NVMe SSD (RAID 0) |
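
For capacity planning it can help to total the specifications listed above. The following is a minimal sketch, not a project tool: the `PhysicalServer` class and `cluster_totals` function are illustrative names, and Idris's "1TB" of RAM is assumed to mean 1024 GB. Because RAID 0 stripes data with no redundancy, usable capacity per server is taken as the sum of its drives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalServer:
    """One row of the physical server table (illustrative structure)."""
    name: str
    cpu_sockets: int
    cores_per_cpu: int
    ram_gb: int
    gpus: int
    gpu_mem_gb: int
    drives: int
    drive_tb: int

    @property
    def cores(self) -> int:
        return self.cpu_sockets * self.cores_per_cpu

    @property
    def raid0_capacity_tb(self) -> int:
        # RAID 0 has no parity or mirroring, so usable space is the raw sum.
        return self.drives * self.drive_tb


# Values copied from the table; 1 TB RAM is assumed to be 1024 GB.
SERVERS = [
    PhysicalServer("Alys", 2, 24, 512, 4, 80, 8, 4),
    PhysicalServer("Rhys", 2, 24, 512, 4, 80, 8, 4),
    PhysicalServer("Idris", 2, 64, 1024, 8, 80, 8, 8),
]


def cluster_totals(servers):
    """Sum physical cores, RAM, GPUs, GPU memory and RAID 0 capacity."""
    return {
        "cores": sum(s.cores for s in servers),
        "ram_gb": sum(s.ram_gb for s in servers),
        "gpus": sum(s.gpus for s in servers),
        "gpu_mem_gb": sum(s.gpus * s.gpu_mem_gb for s in servers),
        "raid0_tb": sum(s.raid0_capacity_tb for s in servers),
    }


if __name__ == "__main__":
    print(cluster_totals(SERVERS))
```

Under these assumptions the three servers together provide 224 physical cores, 16 GPUs with 1280 GB of GPU memory, and 128 TB of striped NVMe storage. Since RAID 0 loses the whole array if any single drive fails, that storage is best treated as scratch space covered by the backup process.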

These servers operate on a custom-built Linux distribution based on Ubuntu Server 22.04. The servers are connected via a 100Gbps InfiniBand network. Refer to the InfiniBand Configuration Guide for network details. Monitoring is handled by Prometheus and Grafana.
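
Since monitoring runs on Prometheus, a quick health check can be scripted against its HTTP query API. The sketch below is an illustration only: the `PROMETHEUS_URL` address and the helper names are hypothetical, and it assumes the standard `up` metric that Prometheus records for every scrape target (1 when the last scrape succeeded, 0 when it failed).

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address; replace with the project's actual Prometheus endpoint.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_instances(payload: dict) -> List[str]:
    """Return the instance labels whose `up` sample is 0 in a query response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    down = []
    for sample in payload["data"]["result"]:
        _timestamp, value = sample["value"]  # value arrives as a string
        if float(value) == 0.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


def fetch_down_instances(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> List[str]:
    """Query the live server for `up` and list targets that are down."""
    with urlopen(build_query_url(base_url, "up"), timeout=timeout) as resp:
        return down_instances(json.load(resp))
```

Calling `fetch_down_instances()` from a host that can reach the Prometheus server returns a sorted list of targets whose last scrape failed. Parsing is kept separate from the network call so it can be checked against a saved response.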

Virtual Machine Configuration

A cluster of virtual machines (VMs) is managed using Proxmox VE. These VMs are used for a variety of tasks, including development, testing, and data pre-processing.

| VM Name | CPU | RAM | Storage | Operating System |
|---|---|---|---|---|
| dev-1 | 8 vCPUs | 32 GB | 500 GB SSD | Ubuntu 22.04 |
| test-1 | 4 vCPUs | 16 GB | 250 GB SSD | Ubuntu 22.04 |
| data-prep-1 | 16 vCPUs | 64 GB | 1 TB SSD | CentOS 7 |
| model-serving-1 | 8 vCPUs | 32 GB | 500 GB SSD | Ubuntu 22.04 |

Each VM has access to the same InfiniBand network as the physical servers, allowing for high-speed data transfer. VMs are backed up daily, as described in the Backup and Recovery Plan.
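
Before adding another VM, it is useful to know how much is already allocated. The sketch below totals the allocations listed above and checks a proposed VM against a resource budget. The function names and the budget figures in the usage note are hypothetical, "1TB" is taken as 1000 GB, and the check assumes no vCPU or memory overcommit, which may not match how the Proxmox hosts are actually configured.

```python
# Allocations copied from the VM table; 1 TB disk is assumed to be 1000 GB.
VMS = {
    "dev-1": {"vcpus": 8, "ram_gb": 32, "disk_gb": 500},
    "test-1": {"vcpus": 4, "ram_gb": 16, "disk_gb": 250},
    "data-prep-1": {"vcpus": 16, "ram_gb": 64, "disk_gb": 1000},
    "model-serving-1": {"vcpus": 8, "ram_gb": 32, "disk_gb": 500},
}


def vm_totals(vms):
    """Sum vCPUs, RAM and disk across all VMs."""
    totals = {"vcpus": 0, "ram_gb": 0, "disk_gb": 0}
    for spec in vms.values():
        for key in totals:
            totals[key] += spec[key]
    return totals


def fits(vms, new_vm, capacity):
    """Return True if adding new_vm keeps every resource within capacity.

    Assumes a strict budget with no overcommit (illustrative simplification).
    """
    totals = vm_totals(vms)
    return all(totals[key] + new_vm[key] <= capacity[key] for key in totals)


if __name__ == "__main__":
    print(vm_totals(VMS))
```

The four VMs currently account for 36 vCPUs, 144 GB of RAM, and 2250 GB of disk. With a hypothetical budget of `{"vcpus": 40, "ram_gb": 160, "disk_gb": 2500}`, a second VM sized like test-1 would just fit, while one sized like dev-1 would not.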

Software Stack

The software stack deployed on these servers is critical to the project's success. Key components include:

Security Considerations

Security is paramount. All servers are protected by a firewall and intrusion detection system. Access is restricted to authorized personnel only. Regular security audits are conducted, as outlined in the Security Policy. All data is encrypted at rest and in transit. See the Encryption Protocol document for details.

Future Expansion

The "AI in Cardiff" project is expected to grow significantly in the coming years. Plans are underway to add additional physical servers and virtual machines. The next phase of expansion will focus on increasing GPU capacity and storage capacity. The Capacity Planning Document outlines the details of this expansion.

See also: Server Monitoring, Troubleshooting Guide, Software Updates, User Accounts, Incident Response Plan

