AI in Oxford: Server Configuration
This guide describes the server configuration behind the "AI in Oxford" project: the hardware and software that run the department's artificial intelligence applications and research initiatives. It is a technical reference for system administrators and developers working with the platform, and it covers the configuration as of October 26, 2023. Refer to the Change Log for updates.
Overview
The "AI in Oxford" infrastructure is designed for high-throughput computing, large dataset storage, and rapid model training. It uses a hybrid architecture that combines on-premise servers with cloud resources. The core on-premise cluster is a network of interconnected servers, each specialized for a specific task. Security is a primary design concern (see Security Considerations): access controls and data encryption are applied throughout the system. The system is monitored with Nagios (see Nagios Monitoring) and integrated with Incident Management.
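As an illustration of how a custom check could plug into the Nagios monitoring mentioned above, here is a minimal sketch of a disk-usage plugin. It relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The 80% and 90% thresholds and the function names are assumptions made for this example, not values taken from the deployed configuration.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_usage(used_fraction, warn=0.80, crit=0.90):
    """Map a used-space fraction (0.0 to 1.0) to a Nagios code and label.

    The default thresholds are illustrative assumptions.
    """
    if not 0.0 <= used_fraction <= 1.0:
        return UNKNOWN, "UNKNOWN"
    if used_fraction >= crit:
        return CRITICAL, "CRITICAL"
    if used_fraction >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/"):
    """Return (exit_code, status_line) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - zero-sized filesystem at {path}"
    used_fraction = usage.used / usage.total
    code, label = classify_usage(used_fraction)
    return code, f"DISK {label} - {used_fraction:.0%} used on {path}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    path = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(path)
    print(line)
    return code
```

When run as a plugin, the script would end with `raise SystemExit(main(sys.argv))` (after `import sys`) so that Nagios receives the exit code.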
Hardware Specifications
The primary compute nodes are based on the following specifications:
Component | Specification | Quantity |
---|---|---|
CPU | Intel Xeon Gold 6338 (32 cores, 64 threads) | 16 |
RAM | 256 GB DDR4 ECC Registered RAM | 16 |
GPU | NVIDIA A100 (80GB) | 8 |
Storage (Local) | 4 TB NVMe PCIe Gen4 SSD | 16 |
Network Interface | 2 x 100 GbE Mellanox ConnectX-6 | 16 |
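For capacity planning it can help to turn the table above into aggregate figures. The sketch below multiplies each specification by its quantity. It assumes that the Quantity column counts units across the whole cluster and that each RAM and storage entry describes one such unit; the table itself does not state this, so treat the totals as illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    quantity: int   # value from the Quantity column
    per_unit: int   # capacity of one unit (cores, threads, GB, or TB)
    unit: str


def total(component: Component) -> int:
    """Aggregate capacity, assuming Quantity is a cluster-wide count."""
    return component.quantity * component.per_unit


COMPUTE_NODES = [
    Component("CPU cores", 16, 32, "cores"),
    Component("CPU threads", 16, 64, "threads"),
    Component("RAM", 16, 256, "GB"),
    Component("GPU memory", 8, 80, "GB"),
    Component("Local NVMe", 16, 4, "TB"),
]

for item in COMPUTE_NODES:
    print(f"{item.name}: {total(item)} {item.unit}")
```

Under these assumptions the cluster offers 512 cores, 1024 threads, 4096 GB of RAM, 640 GB of GPU memory, and 64 TB of local NVMe storage.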
Storage is handled by a dedicated network-attached storage (NAS) cluster.
Component | Specification | Quantity |
---|---|---|
Storage Type | Seagate Exos X18 18TB SAS HDD | 64 |
RAID Level | RAID 6 | N/A |
File System | ZFS | N/A |
Network Interface | 4 x 40 GbE Mellanox ConnectX-5 | 2 |
Total Usable Capacity | ~800 TB | N/A |
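The usable capacity figure can be sanity-checked with some arithmetic. RAID 6 reserves two disks' worth of parity per group, so usable space depends on how the 64 drives are grouped. The table does not state the group layout; the 8-disk groups below are an assumption chosen for illustration, and the calculation ignores ZFS metadata and reserved space.

```python
def raid6_usable_tb(total_disks: int, disk_tb: float, disks_per_group: int) -> float:
    """Usable capacity in TB when disks are split into equal RAID 6 groups.

    Each RAID 6 group gives up two disks' worth of space to parity.
    """
    if disks_per_group < 4:
        raise ValueError("RAID 6 needs at least 4 disks per group")
    if total_disks % disks_per_group != 0:
        raise ValueError("disks must divide evenly into groups")
    groups = total_disks // disks_per_group
    return groups * (disks_per_group - 2) * disk_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


raw_tb = 64 * 18                            # 1152 TB raw
usable_tb = raid6_usable_tb(64, 18, 8)      # 864 TB with eight 8-disk groups (assumed layout)
usable_tib = round(tb_to_tib(usable_tb), 1) # about 785.8 TiB
print(raw_tb, usable_tb, usable_tib)
```

With this assumed layout the result is 864 TB (about 786 TiB) before file system overhead, which is in the same range as the ~800 TB quoted in the table. A different grouping would give a different figure.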
The front-end servers are less powerful than the compute nodes. They serve as access points and handle user authentication.
Component | Specification | Quantity |
---|---|---|
CPU | Intel Xeon E-2388G (8 cores, 16 threads) | 4 |
RAM | 64 GB DDR4 ECC RAM | 4 |
Storage (Local) | 1 TB NVMe PCIe Gen3 SSD | 4 |
Network Interface | 2 x 10 GbE Intel X710 | 4 |
Software Stack
The servers run a customized build of Ubuntu Server 22.04 LTS. The core software components are:
- Operating System: Ubuntu Server 22.04 LTS
- Containerization: Docker and Kubernetes for application deployment and orchestration.
- Programming Languages: Python 3.10 is the primary language, with support for R and Julia.
- Machine Learning Frameworks: TensorFlow, PyTorch, and Scikit-learn.
- Data Processing: Apache Spark and Hadoop for large-scale data processing.
- Database: PostgreSQL with PostGIS extension for geospatial data.
- Version Control: Git with GitHub for code management.
- API Gateway: Kong for managing API access.
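A quick way to confirm that a node or container image matches the stack listed above is to check the interpreter version and whether the machine learning frameworks can be imported. The following is a minimal sketch using only the standard library. The import names (`tensorflow`, `torch`, `sklearn`) are the usual ones for these packages; the function names are made up for this example.

```python
import importlib.util
import sys

EXPECTED_PYTHON = (3, 10)

# Display name -> import name for the frameworks listed in the software stack.
FRAMEWORK_MODULES = {
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "Scikit-learn": "sklearn",
}


def python_matches(version_info=sys.version_info, expected=EXPECTED_PYTHON):
    """True if the major.minor interpreter version equals the expected one."""
    return tuple(version_info[:2]) == expected


def installed_frameworks(modules=FRAMEWORK_MODULES):
    """Report which modules can be located, without importing them."""
    return {
        label: importlib.util.find_spec(module_name) is not None
        for label, module_name in modules.items()
    }


if not python_matches():
    print(f"Note: running Python {sys.version_info[0]}.{sys.version_info[1]}, expected 3.10")
for label, present in installed_frameworks().items():
    print(f"{label}: {'available' if present else 'missing'}")
```

Using `find_spec` avoids the cost of importing large frameworks just to test for their presence.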
Networking Configuration
The server cluster is connected by a dedicated 100 GbE network. Virtual local area networks (VLANs) isolate projects from one another. The network uses a spine-leaf topology, which provides high bandwidth and low latency between nodes. See Network Diagram for a visual representation. DNS is managed by an internal BIND9 server.
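To illustrate how a per-project VLAN scheme might be planned, the sketch below assigns each project a VLAN ID and a non-overlapping subnet using Python's `ipaddress` module. The address block `10.20.0.0/16`, the /24 subnet size, the starting VLAN ID of 100, and the project names are all hypothetical; the actual allocation is not documented here.

```python
import ipaddress


def plan_vlans(projects, supernet="10.20.0.0/16", prefix=24, first_vlan_id=100):
    """Assign one VLAN ID and one subnet per project (all values hypothetical)."""
    network = ipaddress.ip_network(supernet)
    subnets = network.subnets(new_prefix=prefix)
    plan = {}
    for offset, (project, subnet) in enumerate(zip(projects, subnets)):
        vlan_id = first_vlan_id + offset
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"VLAN ID {vlan_id} is outside the usable range 1-4094")
        plan[project] = {
            "vlan_id": vlan_id,
            "subnet": subnet,
            # By convention in this sketch, the first host address is the gateway.
            "gateway": next(iter(subnet.hosts())),
        }
    if len(plan) < len(projects):
        raise ValueError("supernet is too small for the number of projects")
    return plan


for name, entry in plan_vlans(["nlp", "vision"]).items():
    print(name, entry["vlan_id"], entry["subnet"], entry["gateway"])
```

Because the subnets are carved from one block with a fixed prefix length, they cannot overlap, which keeps firewall rules between projects simple to reason about.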
Security Measures
- Firewall: iptables filters network traffic, and fail2ban temporarily bans hosts that generate repeated failed login attempts.
- Intrusion Detection System: Snort monitors network traffic for malicious activity.
- Data Encryption: All sensitive data is encrypted at rest, and encrypted in transit using TLS.
- Access Control: Role-based access control (RBAC) is implemented using LDAP.
- Regular Security Audits: The system undergoes regular security audits performed by the IT Security Team.
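For client code that talks to services on the platform, encryption in transit can be enforced on the client side as well. The sketch below builds a TLS client context with Python's standard `ssl` module, with certificate and hostname verification enabled and a minimum protocol version of TLS 1.2. The minimum version and the optional CA file are assumptions for this example, not documented platform policy.

```python
import ssl


def make_client_context(ca_file=None):
    """Create a TLS client context that verifies the server certificate.

    ca_file is an optional path to a CA bundle (for example an internal CA);
    when omitted, the system trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy protocol versions; TLS 1.2 as the floor is an assumption.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version, context.verify_mode, context.check_hostname)
```

A socket can then be wrapped with `context.wrap_socket(sock, server_hostname=host)`, which fails the handshake if the certificate does not validate or does not match the hostname.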
Future Enhancements
Planned future enhancements include:
- Upgrading to the latest generation of GPUs (NVIDIA H100).
- Expanding the storage capacity with additional NAS units.
- Implementing a more sophisticated monitoring system with predictive analytics.
- Integrating with cloud providers for burst computing and disaster recovery. Cloud Integration documentation is in progress.
- Exploring the use of specialized hardware accelerators for specific AI workloads.