AI in Dundee: Server Configuration
This article details the server configuration powering the "AI in Dundee" initiative, providing a technical overview for system administrators and anyone interested in the infrastructure supporting our research. It assumes a basic understanding of server administration and networking concepts, and covers the hardware specifications, software stack, and key configuration details.
Overview
The "AI in Dundee" project utilizes a cluster of servers dedicated to machine learning, deep learning, and data analysis tasks. The cluster is designed for scalability and high performance, employing a combination of CPU and GPU resources. The servers are located in the University of Dundee's data centre and are interconnected via a high-speed network. This infrastructure supports a variety of projects, including natural language processing, computer vision, and predictive modelling. Please refer to the Data Centre Access Policy for physical access information.
Hardware Specifications
The cluster consists of four primary server nodes, each with similar specifications. Detailed information is provided below.
| Server Node | CPU | RAM | Storage | GPU |
|---|---|---|---|---|
| Node 1 | 2 x Intel Xeon Gold 6248R @ 3.0GHz | 256GB DDR4 ECC REG | 2 x 4TB NVMe SSD (RAID 1) | 2 x NVIDIA GeForce RTX 3090 (24GB VRAM) |
| Node 2 | 2 x Intel Xeon Gold 6248R @ 3.0GHz | 256GB DDR4 ECC REG | 2 x 4TB NVMe SSD (RAID 1) | 2 x NVIDIA GeForce RTX 3090 (24GB VRAM) |
| Node 3 | 2 x Intel Xeon Gold 6248R @ 3.0GHz | 256GB DDR4 ECC REG | 2 x 4TB NVMe SSD (RAID 1) | 2 x NVIDIA GeForce RTX 3090 (24GB VRAM) |
| Node 4 | 2 x Intel Xeon Gold 6248R @ 3.0GHz | 256GB DDR4 ECC REG | 2 x 4TB NVMe SSD (RAID 1) | 2 x NVIDIA GeForce RTX 3090 (24GB VRAM) |
These servers are networked using 100GbE Mellanox ConnectX-6 adapters. See the Network Topology Diagram for further details. Power consumption is monitored via Power Distribution Units (PDUs).
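As an illustrative check of the GPU fit-out, the short sketch below uses PyTorch (part of the software stack described in the next section) to list the CUDA devices a node exposes; run on any of the four nodes it should report two devices with roughly 24GB of VRAM each. This is a minimal sketch, not part of the managed configuration.

```python
import torch

# Minimal sanity check that a node's two RTX 3090s are visible to CUDA.
if not torch.cuda.is_available():
    raise SystemExit("CUDA is not available - check the NVIDIA driver installation")

for idx in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(idx)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU {idx}: {props.name}, {vram_gb:.1f} GB VRAM")
```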
Software Stack
The servers run Ubuntu Server 22.04 LTS. The core software stack includes:
- Operating System: Ubuntu Server 22.04 LTS
- Containerization: Docker and Kubernetes
- Machine Learning Frameworks: TensorFlow, PyTorch, scikit-learn
- Programming Languages: Python 3.9, R 4.2
- Data Storage: Ceph (distributed file system)
- Monitoring: Prometheus and Grafana
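Python workloads can also publish their own metrics for Prometheus to scrape and Grafana to chart. The sketch below is an illustrative example only, using the third-party prometheus_client package (an assumption, it is not part of the listed stack); the metric name and port are arbitrary placeholders.

```python
import random
import time

# Assumes the third-party prometheus_client package is installed (pip install prometheus-client).
from prometheus_client import Gauge, start_http_server

# Hypothetical metric; any job-specific value could be exported the same way.
epoch_loss = Gauge("training_epoch_loss", "Loss recorded at the end of each training epoch")

if __name__ == "__main__":
    start_http_server(8000)        # exposes /metrics on port 8000 for Prometheus to scrape
    for epoch in range(100):
        loss = random.random()     # placeholder for a real training loop
        epoch_loss.set(loss)       # update the exported gauge
        time.sleep(60)
```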
The Ceph cluster provides a scalable and resilient storage solution for the project’s data. Configuration details for Ceph can be found in the Ceph Configuration Document. Docker and Kubernetes are used for deploying and managing machine learning applications. For information on deploying applications, consult the Kubernetes Deployment Guide.
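As an illustration of how a training workload might be submitted programmatically, the sketch below uses the official Kubernetes Python client (an additional dependency, not listed in the stack above) to create a single-GPU batch Job; the image name, namespace, and resource request are hypothetical placeholders, and the authoritative process remains the Kubernetes Deployment Guide.

```python
# Assumes the official Kubernetes Python client is installed (pip install kubernetes)
# and that ~/.kube/config points at this cluster.
from kubernetes import client, config


def submit_training_job() -> None:
    config.load_kube_config()

    container = client.V1Container(
        name="trainer",
        image="registry.example.org/ai-dundee/trainer:latest",  # hypothetical image
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            # Requires the NVIDIA device plugin; requests one of a node's RTX 3090s.
            limits={"nvidia.com/gpu": "1"},
        ),
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="example-training-job"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            ),
            backoff_limit=0,
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)


if __name__ == "__main__":
    submit_training_job()
```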
Network Configuration
The server nodes are connected to a dedicated VLAN for security and performance. Key network settings are outlined below.
| Parameter | Value |
|---|---|
| VLAN ID | 100 |
| Subnet Mask | 255.255.255.0 |
| Gateway | 192.168.100.1 |
| DNS Servers | 8.8.8.8, 8.8.4.4 |
| Network Interface | enp4s0 |
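As a quick sanity check on the values above, the snippet below uses Python's standard ipaddress module to derive the /24 network implied by the subnet mask and confirm that the gateway sits inside it; the example host address is a hypothetical node IP.

```python
import ipaddress

# Derive the VLAN 100 network from the gateway address and subnet mask above.
network = ipaddress.ip_network("192.168.100.1/255.255.255.0", strict=False)
gateway = ipaddress.ip_address("192.168.100.1")
example_host = ipaddress.ip_address("192.168.100.11")  # hypothetical node address

print(network)                  # 192.168.100.0/24
print(gateway in network)       # True
print(example_host in network)  # True
```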
Firewall rules are managed using `ufw` and are configured to allow only necessary traffic. The Firewall Ruleset document details the current configuration. Access to the servers is restricted to authorized personnel via SSH with key-based authentication. See the SSH Key Management Policy.
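For illustration, the sketch below shows a key-based SSH connection from Python using the third-party paramiko library (an assumption, it is not part of the listed stack); the hostname, account, and key path are placeholders, and access must of course follow the SSH Key Management Policy.

```python
import os

# Assumes the third-party paramiko library is installed (pip install paramiko).
import paramiko

HOST = "node1.ai-dundee.example"                    # placeholder hostname
USER = "ml-admin"                                   # placeholder account
KEY_PATH = os.path.expanduser("~/.ssh/id_ed25519")  # placeholder private key

client = paramiko.SSHClient()
client.load_system_host_keys()  # trust only hosts already present in known_hosts
client.connect(HOST, username=USER, key_filename=KEY_PATH)

# Example command: query GPU status on the remote node.
_, stdout, _ = client.exec_command("nvidia-smi --query-gpu=name,memory.total --format=csv")
print(stdout.read().decode())
client.close()
```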
Security Considerations
Security is paramount for the "AI in Dundee" infrastructure. The following measures are in place:
- Regular security audits
- Intrusion detection system (IDS)
- Firewall protection
- Data encryption at rest and in transit
- Multi-factor authentication for administrative access
All data is backed up regularly to an offsite location. The Backup and Recovery Plan provides detailed information on the backup process. Security incidents should be reported immediately to the IT Security Team.
Future Expansion
We plan to expand the cluster with additional servers in the coming months. These new servers will feature the latest generation of GPUs and CPUs to further enhance the cluster's performance. We are also evaluating the use of NVLink technology to improve communication between GPUs. Details of the planned expansion are documented in the Future Infrastructure Roadmap. The servers will be integrated into the existing Monitoring Dashboard. We are also considering the use of a more advanced scheduling system, like Slurm, described in the Slurm Workload Manager documentation.
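Should Slurm be adopted, work would typically be submitted as batch scripts rather than Kubernetes Jobs. The sketch below is purely hypothetical (no Slurm deployment exists on this cluster yet) and shows a two-GPU job script being generated and submitted from Python via the standard sbatch command.

```python
import subprocess
import textwrap

# Hypothetical Slurm batch script requesting both GPUs on a single node.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=example-train
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:2
    #SBATCH --time=04:00:00
    python train.py
    """)

with open("example_train.sbatch", "w") as handle:
    handle.write(script)

# sbatch prints the assigned job ID on success.
result = subprocess.run(
    ["sbatch", "example_train.sbatch"], capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```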