AI in Sheffield


This article details the server configuration powering the "AI in Sheffield" project, a local initiative focused on applying artificial intelligence to urban challenges. This document is intended for new system administrators and developers contributing to the project. It outlines the hardware, software, and networking components, offering a comprehensive overview of the infrastructure.

Overview

The "AI in Sheffield" project relies on a cluster of servers housed in a secure data center at the University of Sheffield. These servers handle data ingestion, model training, inference, and a web-based user interface. The system is designed for scalability, reliability, and security. We use a hybrid cloud approach: core components are hosted on-premises, while cloud services absorb peak demand. See System Architecture for a high-level diagram of the entire system.

Hardware Configuration

The core of the AI infrastructure consists of four primary server types: data storage servers, compute servers, database servers, and web servers. Each server type has a specific role and configuration optimized for its tasks.

| Server Type | Quantity | CPU | RAM | Storage | Accelerators | Network Interface |
|---|---|---|---|---|---|---|
| Data Storage Server | 2 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | 16 x 16 TB SAS HDD (RAID 6) | — | 10 GbE |
| Compute Server (GPU) | 4 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | — | 2 x NVIDIA A100 (80 GB) | 100 GbE |
| Database Server | 2 | Intel Xeon Silver 4210 (10 cores) | 128 GB DDR4 ECC | 2 x 1 TB NVMe SSD (RAID 1) | — | 1 GbE |
| Web Server | 3 | Intel Core i7-10700K (8 cores) | 64 GB DDR4 | 1 TB NVMe SSD | — | 1 GbE |

Detailed specifications for each server, including serial numbers and asset tags, are maintained in the Hardware Inventory. Power consumption is carefully monitored using a dedicated Power Distribution Unit (PDU).
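As a quick sanity check, the usable capacity of each data storage server can be derived from the RAID 6 layout above, since two drives' worth of capacity is consumed by parity. A minimal sketch:

```python
def raid6_usable_tb(drive_count: int, drive_tb: int) -> int:
    """Usable capacity of a RAID 6 array: two drives are lost to parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    return (drive_count - 2) * drive_tb

# Each data storage server: 16 x 16 TB SAS HDDs in RAID 6
usable = raid6_usable_tb(16, 16)
print(usable)  # 224 TB usable per server, before filesystem overhead
```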

Software Configuration

The software stack is built around a Linux foundation (Ubuntu Server 22.04 LTS). We employ containerization using Docker and orchestration with Kubernetes for managing application deployments and scaling.
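To illustrate how an application lands on the cluster, the shape of a minimal Kubernetes Deployment can be sketched in Python; the image name and replica count below are hypothetical placeholders, not the project's actual manifests.

```python
import json

def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal Kubernetes Deployment manifest as a plain dict."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # apps/v1 requires a selector matching the pod template labels
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

# Hypothetical inference service, one replica per compute server
manifest = deployment_manifest("inference-api", "registry.example/inference:1.0", 4)
print(json.dumps(manifest, indent=2))
```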

| Component | Version | Purpose |
|---|---|---|
| Operating System | Ubuntu Server 22.04 LTS | Base OS for all servers |
| Docker Engine | 20.10.21 | Containerization platform |
| Kubernetes | 1.24.0 | Container orchestration |
| PostgreSQL | 14.5 | Primary database for metadata and application data |
| TensorFlow | 2.9.1 | Machine learning framework |
| PyTorch | 1.12.1 | Machine learning framework |
| Nginx | 1.21.6 | Web server and reverse proxy |

The specific versions of software packages are tracked in the Software Bill of Materials (SBOM). All code is version controlled using Git and hosted on a private GitLab instance. Regular security updates are applied using Ansible for automated configuration management.
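The version pins in the table above can be checked mechanically against what a host reports. A hypothetical sketch: the `EXPECTED` mapping mirrors the table, and the reported-versions dict would in practice come from the SBOM tooling.

```python
# Pinned versions from the software configuration table
EXPECTED = {
    "docker": "20.10.21",
    "kubernetes": "1.24.0",
    "postgresql": "14.5",
    "nginx": "1.21.6",
}

def version_drift(reported: dict) -> dict:
    """Return components whose reported version differs from the pin."""
    return {
        name: (pinned, reported.get(name, "missing"))
        for name, pinned in EXPECTED.items()
        if reported.get(name) != pinned
    }

# Example: a host that has drifted on nginx
drift = version_drift({"docker": "20.10.21", "kubernetes": "1.24.0",
                       "postgresql": "14.5", "nginx": "1.23.1"})
print(drift)  # {'nginx': ('1.21.6', '1.23.1')}
```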

Networking Configuration

The server cluster is connected to the University of Sheffield’s network via a dedicated VLAN. A firewall protects the cluster from external threats. Internally, a software-defined network (SDN) manages traffic flow between servers.

| Parameter | Value |
|---|---|
| VLAN ID | 1001 |
| Subnet Mask | 255.255.255.0 |
| Gateway | 192.168.1.1 |
| DNS Servers | 8.8.8.8, 8.8.4.4 |
| Firewall | pfSense 2.5.2 |
| SDN Controller | ONOS |
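The parameters above describe a standard /24 network. The layout can be verified with Python's `ipaddress` module; the 192.168.1.0/24 subnet is inferred here from the gateway and mask listed in the table.

```python
import ipaddress

# Subnet inferred from the gateway (192.168.1.1) and mask (255.255.255.0)
cluster_net = ipaddress.ip_network("192.168.1.0/255.255.255.0")

print(cluster_net.prefixlen)      # 24
print(cluster_net.num_addresses)  # 256 addresses (254 usable hosts)
print(ipaddress.ip_address("192.168.1.1") in cluster_net)  # True
```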

Network diagrams and configuration details are available in the Network Documentation. SSH keys are required for secure remote access to the servers. All network traffic is logged for auditing and security analysis using the ELK Stack. A load balancer distributes incoming traffic across the web servers.
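To illustrate how a load balancer spreads traffic across the three web servers, here is a minimal round-robin sketch; the hostnames are placeholders, and the production balancer may use a different strategy.

```python
from itertools import cycle

# Placeholder hostnames for the three web servers
web_servers = ["web-01", "web-02", "web-03"]
next_server = cycle(web_servers).__next__

# Dispatch eight requests round-robin: each server takes every third request
assignments = [next_server() for _ in range(8)]
print(assignments)
# ['web-01', 'web-02', 'web-03', 'web-01', 'web-02', 'web-03', 'web-01', 'web-02']
```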


Security Considerations

Security is paramount. Access to the server cluster is restricted to authorized personnel, and multi-factor authentication is enforced for all access methods. Regular security audits identify and address vulnerabilities. All data is encrypted both in transit and at rest. The Security Policy outlines the complete set of security measures in place.

Future Expansion

We anticipate expanding the cluster to accommodate growing data volumes and increasing computational demands. Future plans include the addition of more GPU servers and the implementation of a distributed file system. Further details can be found in the Capacity Planning Document. We are also investigating the use of Federated Learning to leverage data from other institutions.

