AI in Preston


AI in Preston: Server Configuration Documentation

Welcome to the documentation for the "AI in Preston" server cluster. This document details the hardware and software configuration of the servers supporting our Artificial Intelligence initiatives within the Preston data centre. This guide is aimed at newcomers to the wiki and server administration tasks. Please read carefully before attempting any modifications.

Overview

The "AI in Preston" project utilizes a distributed server architecture to handle the intensive computational demands of machine learning model training and inference. The cluster is designed for scalability and redundancy, employing a combination of high-performance compute nodes and dedicated storage servers. This documentation covers the core components and their configurations. We will cover the network topology, compute nodes, storage infrastructure, and software stack. Be sure to read the Server Access Policy before attempting to connect to any of these servers. Familiarize yourself with the Data Backup Procedures as well.

Network Topology

The server cluster is deployed within a dedicated VLAN at the Preston data centre. The network is segmented to isolate AI traffic from other services. Key network components include:

  • A core switch providing high-bandwidth connectivity between all servers.
  • A dedicated management network for out-of-band server administration.
  • A separate storage network for communication with the network-attached storage (NAS) devices.

Below is a summary of the network configuration. Refer to the Network Diagram for a visual representation.

Component                    IP Address     Subnet Mask      Gateway
Core Switch                  192.168.10.1   255.255.255.0    192.168.10.254
Management Network Gateway   10.0.0.1       255.255.255.0    N/A
Storage Network Gateway      172.16.0.1     255.255.255.0    N/A

Compute Nodes

The compute nodes are responsible for performing the majority of the AI workload. They are equipped with high-end GPUs and large amounts of RAM. Each node runs a lightweight Linux distribution optimized for machine learning. See the Operating System Standard for more details. Currently, we have 8 compute nodes, designated `ai-preston-compute-01` through `ai-preston-compute-08`. Before running any jobs, please consult the Job Scheduling Policy.

Here's a detailed breakdown of the compute node specifications:

Specification       Value
CPU                 Intel Xeon Gold 6338
RAM                 256 GB DDR4 ECC
GPU                 4 x NVIDIA A100 (80 GB)
Storage (Local)     1 TB NVMe SSD
Network Interface   100 Gbps Ethernet
Operating System    Ubuntu 22.04 LTS (Custom Kernel)
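When scripting against the cluster, the node hostnames (`ai-preston-compute-01` through `ai-preston-compute-08`) can be generated rather than hard-coded. A minimal sketch, assuming the two-digit zero-padded suffix shown above:

```python
def node_names(count: int = 8, prefix: str = "ai-preston-compute") -> list[str]:
    """Hostnames for the compute nodes, using the cluster's
    two-digit zero-padded numbering (01..count)."""
    return [f"{prefix}-{i:02d}" for i in range(1, count + 1)]
```

Iterating over `node_names()` keeps health-check or deployment scripts correct if the node count changes, as anticipated under Future Enhancements.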

Storage Infrastructure

The storage infrastructure provides persistent storage for datasets, model checkpoints, and other AI-related data. We utilize a Network Attached Storage (NAS) solution with high availability and redundancy. The NAS is managed by the Storage Administration Team. All data is backed up daily according to the Data Backup Procedures.

The following table details the NAS configuration:

Specification      Value
NAS Model          NetApp FAS8200
Total Capacity     1 PB
RAID Level         RAID 6
File System        XFS
Network Protocol   NFSv4
Access Control     ACLs
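Before writing large datasets or model checkpoints to the NAS, it is worth verifying free space on the mount. A standard-library sketch; the mount path in the usage example is a hypothetical placeholder, not the cluster's actual mount point:

```python
import shutil

def has_free_space(path: str, required_gib: float) -> bool:
    """True if the filesystem containing `path` has at least
    `required_gib` GiB free (e.g. before saving a model checkpoint)."""
    return shutil.disk_usage(path).free >= required_gib * 1024**3
```

Usage: `has_free_space("/mnt/nas", 100)` before a checkpoint save, where `/mnt/nas` stands in for the real NFSv4 mount point.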

Software Stack

The software stack includes the core machine learning frameworks, libraries, and tools used by the AI team. All software is managed via Software Package Management and is regularly updated for security and stability. Python is the primary programming language, used alongside the following libraries:

  • TensorFlow
  • PyTorch
  • scikit-learn
  • pandas
  • numpy
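A quick way to confirm that an environment on a compute node actually provides these libraries is to probe for their import specs. Note that the import names differ from the package names for PyTorch (`torch`) and scikit-learn (`sklearn`). This checker is an illustrative sketch, not part of the managed software stack:

```python
from importlib.util import find_spec

# Import names for the stack listed above (torch and sklearn are the
# import names for PyTorch and scikit-learn respectively).
STACK = ["tensorflow", "torch", "sklearn", "pandas", "numpy"]

def missing_packages(names=STACK) -> list[str]:
    """Return the subset of `names` that cannot be imported here."""
    return [n for n in names if find_spec(n) is None]
```

An empty return value means the environment is ready; otherwise the list names what still needs installing.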

The servers also include a containerization platform (Docker) for managing dependencies and ensuring reproducibility. Please refer to the Docker Usage Guidelines for details.

Security Considerations

Security is paramount. All servers are protected by a firewall and intrusion detection system. Access to the servers is restricted to authorized personnel only. Regular security audits are conducted by the Security Team. Please report any security vulnerabilities immediately. Review the Security Incident Response Plan.

Future Enhancements

Planned future enhancements include:

  • Upgrading the network infrastructure to 200 Gbps Ethernet.
  • Adding more GPU-powered compute nodes.
  • Implementing a distributed file system for improved scalability.

Related Documentation


Intel-Based Server Configurations

Configuration                    Specifications                                   CPU Benchmark
Core i7-6700K/7700 Server        64 GB DDR4, 2 x 512 GB NVMe SSD                  8046
Core i7-8700 Server              64 GB DDR4, 2 x 1 TB NVMe SSD                    13124
Core i9-9900K Server             128 GB DDR4, 2 x 1 TB NVMe SSD                   49969
Core i9-13900 Server (64 GB)     64 GB RAM, 2 x 2 TB NVMe SSD                     N/A
Core i9-13900 Server (128 GB)    128 GB RAM, 2 x 2 TB NVMe SSD                    N/A
Core i5-13500 Server (64 GB)     64 GB RAM, 2 x 500 GB NVMe SSD                   N/A
Core i5-13500 Server (128 GB)    128 GB RAM, 2 x 500 GB NVMe SSD                  N/A
Core i5-13500 Workstation        64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000    N/A

AMD-Based Server Configurations

Configuration                     Specifications                     CPU Benchmark
Ryzen 5 3600 Server               64 GB RAM, 2 x 480 GB NVMe         17849
Ryzen 7 7700 Server               64 GB DDR5 RAM, 2 x 1 TB NVMe      35224
Ryzen 9 5950X Server              128 GB RAM, 2 x 4 TB NVMe          46045
Ryzen 9 7950X Server              128 GB DDR5 ECC, 2 x 2 TB NVMe     63561
EPYC 7502P Server (128GB/1TB)     128 GB RAM, 1 TB NVMe              48021
EPYC 7502P Server (128GB/2TB)     128 GB RAM, 2 TB NVMe              48021
EPYC 7502P Server (128GB/4TB)     128 GB RAM, 2 x 2 TB NVMe          48021
EPYC 7502P Server (256GB/1TB)     256 GB RAM, 1 TB NVMe              48021
EPYC 7502P Server (256GB/4TB)     256 GB RAM, 2 x 2 TB NVMe          48021
EPYC 9454P Server                 256 GB RAM, 2 x 2 TB NVMe          N/A


Note: All benchmark scores are approximate and may vary based on configuration. Server availability is subject to stock.