AI in Manchester
AI in Manchester: Server Configuration
Welcome to the documentation for the "AI in Manchester" server cluster. This article describes the hardware, software stack, and networking that power our artificial intelligence initiatives in the Manchester region. It is intended for new system administrators and developers joining the project. Please review it carefully before making any changes to the system.
Overview
The "AI in Manchester" project uses a distributed server cluster to handle the computational demands of machine learning model training and inference. The cluster is housed in a secure data centre in central Manchester and comprises high-performance compute nodes, storage servers, and network infrastructure, which lets us process large datasets and deploy AI models at scale. We take a hybrid cloud approach: sensitive data stays on on-premise resources, and cloud bursting covers peak demand. This setup is detailed in Data Security Protocols.
Hardware Configuration
The server cluster consists of the following primary hardware components. Detailed specifications for each node type are provided in the tables below. All servers are rack-mounted and utilize redundant power supplies and cooling systems for high availability. See also Power Redundancy.
Compute Nodes
These nodes are responsible for the core AI processing tasks. They are equipped with powerful GPUs and large amounts of RAM.
Component | Specification |
---|---|
CPU | Dual Intel Xeon Gold 6338 (32 Cores/64 Threads per CPU) |
RAM | 512GB DDR4 ECC Registered 3200MHz |
GPU | 4x NVIDIA A100 80GB PCIe 4.0 |
Storage (Local) | 2TB NVMe PCIe 4.0 SSD (OS & Temp Data) |
Network Interface | Dual 200Gbps InfiniBand |
We currently operate 24 compute nodes, scheduled through the Slurm Workload Manager. Regular hardware health checks are performed as outlined in Server Maintenance Schedule.
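For capacity planning it can help to see what the per-node figures add up to across all 24 nodes. The sketch below simply multiplies the values from the compute node table by the node count. The class and function names are illustrative and not part of any cluster tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    """Per-node figures copied from the compute node table."""
    cpus: int = 2               # dual-socket Xeon Gold 6338
    cores_per_cpu: int = 32
    threads_per_cpu: int = 64
    ram_gb: int = 512
    gpus: int = 4               # NVIDIA A100 80GB
    gpu_memory_gb: int = 80


def cluster_totals(node: ComputeNode, node_count: int) -> dict:
    """Multiply each per-node resource by the number of nodes."""
    return {
        "cpu_cores": node.cpus * node.cores_per_cpu * node_count,
        "cpu_threads": node.cpus * node.threads_per_cpu * node_count,
        "ram_gb": node.ram_gb * node_count,
        "gpus": node.gpus * node_count,
        "gpu_memory_gb": node.gpus * node.gpu_memory_gb * node_count,
    }


totals = cluster_totals(ComputeNode(), node_count=24)
for name, value in totals.items():
    print(f"{name}: {value}")
```

With the figures above this works out to 96 GPUs, 1,536 CPU cores, and 12,288 GB of RAM in aggregate. These are nominal totals; what Slurm can actually schedule depends on partition and reservation settings that are not covered here.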
Storage Servers
These servers provide persistent storage for datasets, model checkpoints, and other critical data.
Component | Specification |
---|---|
CPU | Dual Intel Xeon Silver 4310 (12 Cores/24 Threads per CPU) |
RAM | 256GB DDR4 ECC Registered 3200MHz |
Storage (Drives) | 16 x 18TB SAS 7.2K RPM HDDs (RAID 6), 200TB usable |
Network Interface | Dual 100Gbps Ethernet |
File System | Ceph |
The storage servers utilize a Ceph distributed file system for scalability and resilience. See Ceph Configuration Guide for more information. A dedicated backup system is detailed in Backup and Disaster Recovery.
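As a sanity check on the capacity figure in the table, the following sketch computes the nominal capacity of a 16-drive RAID 6 array of 18TB disks. RAID 6 reserves two drives' worth of space for parity, so the nominal result is 252 TB (about 229 TiB). The table states 200TB usable; the difference is presumably file system and Ceph overhead or reserved space, but the source does not itemise it, so treat that explanation as an assumption.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Nominal RAID 6 capacity: two drives' worth of space holds parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_tb


STATED_USABLE_TB = 200  # figure given in the storage server table

nominal_tb = raid6_usable_tb(drive_count=16, drive_tb=18)
nominal_tib = nominal_tb * 1e12 / 2**40  # decimal terabytes to binary tebibytes

print(f"Nominal RAID 6 capacity: {nominal_tb:.0f} TB ({nominal_tib:.1f} TiB)")
print(f"Unaccounted overhead vs. stated figure: {nominal_tb - STATED_USABLE_TB:.0f} TB")
```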
Network Infrastructure
The network infrastructure provides high-bandwidth, low-latency connectivity between the servers.
Component | Specification |
---|---|
Core Switches | Arista 7050X Series |
Interconnect | 400Gbps Fiber Optic |
Network Topology | Clos Network |
Firewall | Palo Alto Networks PA-820 |
Network security is paramount. Refer to Network Security Policy for detailed information.
Software Configuration
The "AI in Manchester" cluster runs a customized Linux distribution based on Ubuntu 22.04 LTS. The following software components are installed on each node.
- Operating System: Ubuntu 22.04 LTS
- Containerization: Docker and Kubernetes are used for deploying and managing AI applications. See Kubernetes Deployment Guide.
- Machine Learning Frameworks: TensorFlow, PyTorch, and scikit-learn are pre-installed and optimized for the GPU hardware. Specific versioning is tracked in Software Version Control.
- Programming Languages: Python 3.9 is the primary programming language.
- Monitoring: Prometheus and Grafana are used for system monitoring and alerting. Detailed monitoring dashboards are available at Monitoring Dashboard Link.
- Version Control: Git is used for all code management, with repositories hosted on GitLab Instance.
- Data Processing: Apache Spark is used for large-scale data processing and ETL tasks.
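A new developer may want to confirm that a node matches the list above before starting work. The following is a minimal sketch of such a check using only the Python standard library. It assumes the frameworks are installed under their usual import names (`tensorflow`, `torch`, `sklearn`) and that Spark is used from Python through PySpark (`pyspark`); neither assumption is confirmed by this article. It only tests whether the packages can be found, not whether they can see the GPUs.

```python
import importlib.util
import sys

# Usual import names for the listed software (assumed, not confirmed here).
EXPECTED_MODULES = {
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "scikit-learn": "sklearn",
    "Apache Spark (PySpark)": "pyspark",
}


def check_environment(expected_python=(3, 9)) -> dict:
    """Report whether the interpreter version and each package look as expected."""
    report = {"python_ok": sys.version_info[:2] == expected_python}
    for label, module in EXPECTED_MODULES.items():
        # find_spec returns None when a top-level module is not installed.
        report[label] = importlib.util.find_spec(module) is not None
    return report


report = check_environment()
for item, ok in report.items():
    print(f"{item}: {'OK' if ok else 'MISSING'}")
```

Exact package versions are tracked in Software Version Control, so a version comparison would belong there rather than in a quick check like this one.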
Security Considerations
Security is a top priority for the "AI in Manchester" project. Access to the server cluster is strictly controlled through SSH key-based authentication and multi-factor authentication. Regular security audits are conducted as described in Security Audit Reports. All data is encrypted at rest and in transit. Please familiarize yourself with the Data Governance Policy.
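To illustrate what key-based login combined with a second factor typically looks like on an OpenSSH server, the sketch below audits the text of an `sshd_config` file for three standard directives: `PasswordAuthentication`, `PubkeyAuthentication`, and `AuthenticationMethods`. The sample configuration is hypothetical and is not the cluster's actual configuration. The parser handles only whitespace-separated `Keyword value` lines and ignores `Match` blocks and `Include` files.

```python
def parse_sshd_config(text: str) -> dict:
    """Collect keyword/value pairs, keeping the first value seen per keyword."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts
        # Keywords are case-insensitive; sshd uses the first value it obtains.
        settings.setdefault(keyword.lower(), value.strip())
    return settings


def audit(settings: dict) -> list:
    """Return a list of findings; an empty list means the checks passed."""
    problems = []
    if settings.get("passwordauthentication", "yes").lower() != "no":
        problems.append("PasswordAuthentication should be 'no'")
    if settings.get("pubkeyauthentication", "yes").lower() != "yes":
        problems.append("PubkeyAuthentication should be 'yes'")
    # A comma-separated list means every method in that list must succeed.
    method_lists = settings.get("authenticationmethods", "any").split()
    has_mfa = any(
        "publickey" in methods.split(",") and len(methods.split(",")) >= 2
        for methods in method_lists
    )
    if not has_mfa:
        problems.append("AuthenticationMethods should pair publickey with a second method")
    return problems


# Hypothetical example configuration, for illustration only.
SAMPLE_CONFIG = """
PasswordAuthentication no
PubkeyAuthentication yes
AuthenticationMethods publickey,keyboard-interactive
"""

findings = audit(parse_sshd_config(SAMPLE_CONFIG))
print(findings or "No findings")
```

A check of this kind complements the audits described in Security Audit Reports but does not replace them.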
Future Expansion
We anticipate expanding the cluster in the next quarter to include additional compute nodes with the latest generation of GPUs. This expansion will be documented in a separate article. See Future Expansion Plans.