AI in Manchester

AI in Manchester: Server Configuration

This article documents the hardware and software configuration of the "AI in Manchester" server cluster, which powers our Artificial Intelligence initiatives in the Manchester region. It is intended for new system administrators and developers joining the project, and covers the hardware specifications, software stack, and networking details. Please review it carefully before making any changes to the system.

Overview

The "AI in Manchester" project uses a distributed server cluster to handle the computational demands of machine learning model training and inference. The cluster is located in a secure data centre in central Manchester and comprises high-performance compute nodes, storage servers, and network infrastructure, which lets us process large datasets and deploy AI models at scale. We use a hybrid cloud approach: on-premise resources handle sensitive data, and cloud bursting absorbs peak demand. This setup is detailed in Data Security Protocols.
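The hybrid approach described above can be read as a placement rule. The following is a minimal sketch of one such rule, not the project's actual scheduler logic: the `Job` class, the `choose_target` function, and the three target names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A unit of work to place (hypothetical structure, for illustration only)."""
    name: str
    gpus_needed: int
    sensitive: bool  # True if the job touches data that must stay on-premise


def choose_target(job: Job, free_on_prem_gpus: int) -> str:
    """Return 'on_prem', 'queue', or 'cloud' under an assumed placement policy."""
    if job.gpus_needed <= free_on_prem_gpus:
        # Local capacity is available, so any job can run on-premise.
        return "on_prem"
    if job.sensitive:
        # Sensitive work waits for local capacity instead of leaving the site.
        return "queue"
    # Non-sensitive overflow is the only work that bursts to the cloud.
    return "cloud"


if __name__ == "__main__":
    free = 4
    for job in (
        Job("train-small", gpus_needed=2, sensitive=True),
        Job("train-patient-data", gpus_needed=8, sensitive=True),
        Job("public-benchmark", gpus_needed=8, sensitive=False),
    ):
        print(job.name, "->", choose_target(job, free))
```

The ordering of the checks is the design choice worth noting: sensitivity is only consulted once local capacity has run out, so sensitive jobs are never candidates for bursting.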

Hardware Configuration

The server cluster consists of the following primary hardware components. Detailed specifications for each node type are provided in the tables below. All servers are rack-mounted and utilize redundant power supplies and cooling systems for high availability. See also Power Redundancy.

Compute Nodes

These nodes are responsible for the core AI processing tasks. They are equipped with powerful GPUs and large amounts of RAM.

| Component | Specification |
|---|---|
| CPU | Dual Intel Xeon Gold 6338 (32 Cores/64 Threads per CPU) |
| RAM | 512GB DDR4 ECC Registered 3200MHz |
| GPU | 4x NVIDIA A100 80GB PCIe 4.0 |
| Storage (Local) | 2TB NVMe PCIe 4.0 SSD (OS & Temp Data) |
| Network Interface | Dual 200Gbps InfiniBand |

We currently have 24 compute nodes, managed through Slurm Workload Manager. Regular hardware health checks are performed as outlined in Server Maintenance Schedule.
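For capacity planning, the per-node figures above can be multiplied out to cluster totals. The sketch below uses only the numbers given in this section (24 nodes, two 32-core CPUs, 512GB RAM, and four 80GB GPUs per node); the `ComputeNode` and `cluster_totals` names are illustrative and not part of any cluster tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    """Per-node figures taken from the compute node specification above."""
    cpus: int = 2
    cores_per_cpu: int = 32
    ram_gb: int = 512
    gpus: int = 4
    gpu_mem_gb: int = 80


def cluster_totals(node: ComputeNode, node_count: int) -> dict:
    """Multiply the per-node resources by the number of nodes."""
    return {
        "cpu_cores": node.cpus * node.cores_per_cpu * node_count,
        "ram_gb": node.ram_gb * node_count,
        "gpus": node.gpus * node_count,
        "gpu_mem_gb": node.gpus * node.gpu_mem_gb * node_count,
    }


if __name__ == "__main__":
    for resource, total in cluster_totals(ComputeNode(), node_count=24).items():
        print(f"{resource}: {total}")
```

With 24 nodes this gives 1,536 physical CPU cores, 12,288GB of RAM, 96 GPUs, and 7,680GB of GPU memory. These are nominal totals; what Slurm actually exposes to jobs depends on how the nodes are configured.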

Storage Servers

These servers provide persistent storage for datasets, model checkpoints, and other critical data.

| Component | Specification |
|---|---|
| CPU | Dual Intel Xeon Silver 4310 (12 Cores/24 Threads per CPU) |
| RAM | 256GB DDR4 ECC Registered 3200MHz |
| Storage | 16 x 18TB SAS 7.2K RPM HDDs in RAID 6 (288TB raw, approx. 200TB usable) |
| Network Interface | Dual 100Gbps Ethernet |
| File System | Ceph |

The storage servers run Ceph, a distributed storage system, for scalability and resilience. See Ceph Configuration Guide for more information. A dedicated backup system is detailed in Backup and Disaster Recovery.
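The capacity figures in the storage table can be sanity-checked with simple arithmetic. The sketch below assumes a single RAID 6 group of 16 drives, where two drives' worth of space goes to parity; the function names are illustrative. It gives the ceiling imposed by parity alone, before any file system, Ceph, or reserve overhead.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Capacity left after RAID 6 parity: two drives' worth is given up."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


if __name__ == "__main__":
    raw_tb = 16 * 18
    usable_tb = raid6_usable_tb(16, 18)
    print(f"raw: {raw_tb} TB")
    print(f"after RAID 6 parity: {usable_tb} TB ({tb_to_tib(usable_tb):.2f} TiB)")
```

This yields 288TB raw and 252TB after parity (roughly 229TiB). The documented figure of about 200TB usable sits below that ceiling; this article does not say what accounts for the difference, so treat 252TB as an upper bound only.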

Network Infrastructure

The network infrastructure provides high-bandwidth, low-latency connectivity between the servers.

| Component | Specification |
|---|---|
| Core Switches | Arista 7050X Series |
| Interconnect | 400Gbps Fiber Optic |
| Network Topology | Clos Network |
| Firewall | Palo Alto Networks PA-820 |
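To give a feel for what the link speeds quoted in this article mean in practice, the sketch below estimates a best-case transfer time at line rate. It ignores protocol overhead, disk throughput, and contention, so real transfers will be slower; the `transfer_seconds` function and the 1TB example size are illustrative assumptions.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, links: int = 1) -> float:
    """Best-case time to move size_bytes over `links` links of link_gbps each."""
    if link_gbps <= 0 or links <= 0:
        raise ValueError("link speed and link count must be positive")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * links)


if __name__ == "__main__":
    one_tb = 1e12  # a hypothetical 1TB dataset or checkpoint
    print("storage server, one 100Gbps link:", transfer_seconds(one_tb, 100), "s")
    print("storage server, both 100Gbps links:", transfer_seconds(one_tb, 100, links=2), "s")
    print("compute node, both 200Gbps links:", transfer_seconds(one_tb, 200, links=2), "s")
```

At line rate, 1TB takes about 80 seconds over a single 100Gbps link, 40 seconds over both storage server links, and 20 seconds over both InfiniBand links of a compute node.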

Network security is paramount. Refer to Network Security Policy for detailed information.

Software Configuration

The "AI in Manchester" cluster runs a customized Linux distribution based on Ubuntu 22.04 LTS. The following software components are installed on each node.

  • Operating System: Ubuntu 22.04 LTS
  • Containerization: Docker and Kubernetes are used for deploying and managing AI applications. See Kubernetes Deployment Guide.
  • Machine Learning Frameworks: TensorFlow, PyTorch, and scikit-learn are pre-installed and optimized for the GPU hardware. Specific versioning is tracked in Software Version Control.
  • Programming Languages: Python 3.9 is the primary programming language.
  • Monitoring: Prometheus and Grafana are used for system monitoring and alerting. Detailed monitoring dashboards are available at Monitoring Dashboard Link.
  • Version Control: Git is used for all code management, with repositories hosted on GitLab Instance.
  • Data Processing: Apache Spark is used for large-scale data processing and ETL tasks.
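New developers may want to confirm that a node matches the stack listed above before running workloads. The following is a minimal sketch using only the Python standard library; `check_environment` is an illustrative name, and the check assumes the frameworks are importable as `tensorflow`, `torch`, and `sklearn` in the active Python environment. It only tests that each package can be found, not its version or GPU support.

```python
import importlib.util
import sys

# Values taken from the software list above.
EXPECTED_PYTHON = (3, 9)
EXPECTED_MODULES = ("tensorflow", "torch", "sklearn")


def check_environment() -> dict:
    """Report whether the interpreter and ML frameworks match the documented stack."""
    report = {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
    }
    for name in EXPECTED_MODULES:
        # find_spec locates a top-level package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Exact framework versions are tracked in Software Version Control, so a `True` result here should be treated as a first check rather than proof of a correct installation.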

Security Considerations

Security is a top priority for the "AI in Manchester" project. Access to the server cluster is strictly controlled through SSH key-based authentication and multi-factor authentication. Regular security audits are conducted as described in Security Audit Reports. All data is encrypted at rest and in transit. Please familiarize yourself with the Data Governance Policy.

Future Expansion

We anticipate expanding the cluster in the next quarter to include additional compute nodes with the latest generation of GPUs. This expansion will be documented in a separate article. See Future Expansion Plans.
