AI in Preston: Server Configuration Documentation
Welcome to the documentation for the "AI in Preston" server cluster. This document details the hardware and software configuration of the servers supporting our Artificial Intelligence initiatives within the Preston data centre. It is aimed at newcomers to the wiki and to server administration; please read it carefully before attempting any modifications.
Overview
The "AI in Preston" project utilizes a distributed server architecture to handle the intensive computational demands of machine learning model training and inference. The cluster is designed for scalability and redundancy, employing a combination of high-performance compute nodes and dedicated storage servers. This documentation covers the core components and their configurations. We will cover the network topology, compute nodes, storage infrastructure, and software stack. Be sure to read the Server Access Policy before attempting to connect to any of these servers. Familiarize yourself with the Data Backup Procedures as well.
Network Topology
The server cluster is deployed within a dedicated VLAN at the Preston data centre. The network is segmented to isolate AI traffic from other services. Key network components include:
- A core switch providing high-bandwidth connectivity between all servers.
- A dedicated management network for out-of-band server administration.
- A separate storage network for communication with the network-attached storage (NAS) devices.
Below is a summary of the network configuration. Refer to the Network Diagram for a visual representation.
| Component | IP Address | Subnet Mask | Gateway |
|---|---|---|---|
| Core Switch | 192.168.10.1 | 255.255.255.0 | 192.168.10.254 |
| Management Network Gateway | 10.0.0.1 | 255.255.255.0 | N/A |
| Storage Network Gateway | 172.16.0.1 | 255.255.255.0 | N/A |
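When writing tooling or firewall rules against this topology, it helps to check which of the three networks an address falls into. The sketch below derives the /24 prefixes from the table above (255.255.255.0 masks) using Python's standard `ipaddress` module; the network names are informal labels, not official identifiers.

```python
import ipaddress

# Subnets taken from the network configuration table above.
# The /24 prefixes correspond to the 255.255.255.0 subnet masks.
NETWORKS = {
    "core": ipaddress.ip_network("192.168.10.0/24"),
    "management": ipaddress.ip_network("10.0.0.0/24"),
    "storage": ipaddress.ip_network("172.16.0.0/24"),
}

def classify(ip: str) -> str:
    """Return which cluster network an address belongs to, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for name, net in NETWORKS.items():
        if addr in net:
            return name
    return "unknown"
```

For example, `classify("172.16.0.1")` identifies the storage network gateway, while an address outside all three VLANs returns `"unknown"`.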
Compute Nodes
The compute nodes are responsible for performing the majority of the AI workload. They are equipped with high-end GPUs and large amounts of RAM. Each node runs a lightweight Linux distribution optimized for machine learning. See the Operating System Standard for more details. Currently, we have 8 compute nodes, designated `ai-preston-compute-01` through `ai-preston-compute-08`. Before running any jobs, please consult the Job Scheduling Policy.
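Scripts that iterate over the cluster (health checks, job dispatch, inventory) can generate the node names rather than hard-coding them. A minimal sketch, assuming only the naming scheme stated above (`ai-preston-compute-01` through `ai-preston-compute-08`):

```python
# Generate the hostnames of the 8 compute nodes described above.
# The two-digit zero-padding matches the documented naming scheme.
NODE_COUNT = 8
nodes = [f"ai-preston-compute-{i:02d}" for i in range(1, NODE_COUNT + 1)]
```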
Here's a detailed breakdown of the compute node specifications:
| Specification | Value |
|---|---|
| CPU | Intel Xeon Gold 6338 |
| RAM | 256 GB DDR4 ECC |
| GPU | NVIDIA A100 (80 GB) x 4 |
| Storage (Local) | 1 TB NVMe SSD |
| Network Interface | 100 Gbps Ethernet |
| Operating System | Ubuntu 22.04 LTS (Custom Kernel) |
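When sizing training jobs it is useful to know the aggregate GPU memory available. From the table above (4 x A100 80 GB per node, 8 nodes), the totals work out as follows:

```python
# Aggregate GPU memory, derived from the compute node table above:
# 4 x NVIDIA A100 (80 GB) per node, 8 compute nodes in the cluster.
GPUS_PER_NODE = 4
GPU_MEM_GB = 80
NODES = 8

per_node_gb = GPUS_PER_NODE * GPU_MEM_GB  # 320 GB of GPU memory per node
cluster_gb = per_node_gb * NODES          # 2560 GB (2.56 TB) cluster-wide
```

Actual usable memory per job depends on the Job Scheduling Policy and on how many GPUs the scheduler allocates.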
Storage Infrastructure
The storage infrastructure provides persistent storage for datasets, model checkpoints, and other AI-related data. We use a highly available, redundant NAS solution, managed by the Storage Administration Team. All data is backed up daily according to the Data Backup Procedures.
The following table details the NAS configuration:
| Specification | Value |
|---|---|
| NAS Model | NetApp FAS8200 |
| Total Capacity | 1 PB |
| RAID Level | RAID 6 |
| File System | XFS |
| Network Protocol | NFSv4 |
| Access Control | ACLs |
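Compute nodes reach the NAS over the storage network (172.16.0.0/24) using NFSv4, per the table above. A sketch of a persistent mount entry follows; the NAS address and export path shown are illustrative assumptions, so confirm both with the Storage Administration Team before use.

```
# /etc/fstab entry (sketch) mounting the NAS share over NFSv4.
# 172.16.0.10 and /export/ai-data are placeholders, not confirmed values.
172.16.0.10:/export/ai-data  /mnt/ai-data  nfs4  rw,hard,noatime  0  0
```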
Software Stack
The software stack includes the core machine learning frameworks, libraries, and tools used by the AI team. All software is managed via Software Package Management and is regularly updated to ensure security and stability. We primarily use Python as the programming language, along with the following libraries:
- TensorFlow
- PyTorch
- scikit-learn
- pandas
- numpy
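For reproducible environments, the libraries listed above can be captured in a requirements file. This is an unpinned sketch only; exact versions are set through Software Package Management, and note that PyTorch installs under the package name `torch`.

```
# requirements.txt (sketch) covering the standard library set above.
# Pin exact versions per Software Package Management before deploying.
tensorflow
torch
scikit-learn
pandas
numpy
```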
The servers also include a containerization platform (Docker) for managing dependencies and ensuring reproducibility. Please refer to the Docker Usage Guidelines for details.
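A minimal Dockerfile sketch for a reproducible training image is shown below. The base image tag and the `train.py` entry point are illustrative assumptions; running GPU workloads in containers on these nodes additionally requires the NVIDIA Container Toolkit on the host. See the Docker Usage Guidelines for the approved workflow.

```dockerfile
# Sketch of a training image; base tag and entry point are placeholders.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "train.py"]
```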
Security Considerations
Security is paramount. All servers are protected by a firewall and intrusion detection system. Access to the servers is restricted to authorized personnel only. Regular security audits are conducted by the Security Team. Please report any security vulnerabilities immediately. Review the Security Incident Response Plan.
Future Enhancements
Planned future enhancements include:
- Upgrading the network infrastructure to 200 Gbps Ethernet.
- Adding more GPU-powered compute nodes.
- Implementing a distributed file system for improved scalability.
Related Documentation
- Server Access Policy
- Data Backup Procedures
- Operating System Standard
- Job Scheduling Policy
- Network Diagram
- Software Package Management
- Docker Usage Guidelines
- Storage Administration Team Contact Information
- Security Team Contact Information
- Security Incident Response Plan
- Troubleshooting Guide
- Monitoring Dashboard Link
- AI Project Documentation Hub
- Change Management Process