AI in Merseyside: Server Configuration Guide
Welcome to the Merseyside AI Initiative's server configuration documentation! This guide details the hardware and software setup powering our artificial intelligence projects. It's aimed at newcomers to the wiki and those assisting with server maintenance. Understanding these configurations is vital for successful development and deployment.
Overview
The Merseyside AI Initiative leverages a hybrid server infrastructure, combining on-premise hardware with cloud-based resources. This allows us to balance cost, security, and scalability. This document primarily focuses on the on-premise server cluster located at the Liverpool Science Park. We utilize a distributed computing model, employing several dedicated servers for different tasks: data ingestion, model training, and inference. We also integrate with cloud services for burst capacity and specialized hardware, such as GPUs. See Cloud Integration Overview for details on that aspect.
Hardware Specifications
The core of our on-premise infrastructure consists of three primary server types. These servers are interconnected via a dedicated 10 Gigabit Ethernet network. Power redundancy is provided by a dual-UPS system, and the server room maintains a constant temperature of 22°C with humidity control. Refer to the Data Center Standards page for detailed environmental specifications.
| Server Type | Model | CPU | RAM | Storage | Network Interface |
|---|---|---|---|---|---|
| Data Ingestion Server | Dell PowerEdge R750 | 2 x Intel Xeon Gold 6338 | 256 GB DDR4 ECC | 2 x 4 TB NVMe SSD (RAID 1) + 16 TB HDD | 10 Gigabit Ethernet |
| Model Training Server | Supermicro SuperServer 2029U-TR4 | 2 x AMD EPYC 7763 | 512 GB DDR4 ECC | 4 x 8 TB NVMe SSD (RAID 0) | 10/40 Gigabit Ethernet |
| Inference Server | HP ProLiant DL380 Gen10 | 2 x Intel Xeon Silver 4310 | 128 GB DDR4 ECC | 1 x 1 TB NVMe SSD | 10 Gigabit Ethernet |
Software Stack
All servers run Ubuntu Server 22.04 LTS. We employ a containerized environment using Docker and Kubernetes for application deployment and management. This ensures portability and scalability. We’ve standardized on Python 3.9 for our AI development, alongside libraries like TensorFlow, PyTorch, and scikit-learn. The Software Version Control page documents the precise library versions. All code is hosted on our internal GitLab Instance.
Operating System
- Distribution: Ubuntu Server 22.04 LTS
- Kernel: 5.15.0-76-generic
- Desktop Environment: None (Server - CLI Only)
Containerization
- Docker Version: 20.10.17
- Kubernetes Version: v1.24.3
- Container Registry: Internal GitLab Container Registry (see Container Registry Access)
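Applications are deployed to the Kubernetes cluster as Deployment manifests. As an illustration only, the sketch below builds a minimal `apps/v1` Deployment for an inference service as a plain Python dict; the service name, image path, and port are hypothetical, not the Initiative's actual values.

```python
# Minimal sketch of a Kubernetes apps/v1 Deployment manifest, built as a
# Python dict. The service name, registry path, and port are illustrative
# assumptions; real manifests live in the Kubernetes Configuration Files page.

def inference_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Return a minimal Deployment manifest for an inference service."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 8080}],
                        }
                    ]
                },
            },
        },
    }

# Hypothetical service and image path for illustration
manifest = inference_deployment(
    "sentiment-api",
    "registry.gitlab.internal/ai/sentiment-api:1.0",
)
```

In practice such a dict would be serialized to YAML and applied with `kubectl apply`, or submitted through the internal GitLab CI pipeline.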
AI Frameworks
- TensorFlow: 2.9.1
- PyTorch: 1.12.1
- Scikit-learn: 1.1.3
- CUDA Toolkit: 11.6 (for GPU-accelerated training)
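Because the framework versions above are pinned, it can be useful to check an installed version against its pin before deploying. The sketch below implements a rough compatible-release check (same major.minor, patch at least the pin, similar in spirit to pip's `~=` operator); the pin set and the check semantics are assumptions for illustration, not the Initiative's actual policy.

```python
# Sketch: compare an installed package version against a pinned version.
# Roughly mirrors pip's "~=" compatible-release semantics: same major.minor,
# patch level at least the pin. Pins and semantics are illustrative.

PINNED = {
    "tensorflow": "2.9.1",
    "torch": "1.12.1",
    "scikit-learn": "1.1.3",
}

def version_tuple(v: str) -> tuple:
    """Parse a dotted version string like '2.9.1' into (2, 9, 1)."""
    return tuple(int(part) for part in v.split("."))

def compatible(installed: str, pinned: str) -> bool:
    """True if installed shares the pin's major.minor and is not older."""
    inst, pin = version_tuple(installed), version_tuple(pinned)
    return inst[:2] == pin[:2] and inst >= pin

print(compatible("2.9.2", PINNED["tensorflow"]))   # True
print(compatible("2.10.0", PINNED["tensorflow"]))  # False
```

In a real check the installed version would come from `importlib.metadata.version(...)` rather than a literal string.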
Network Configuration
The servers are organized into a private network with static IP addresses. The network is segmented using VLANs to isolate different services and enhance security. A dedicated firewall protects the network from external threats. See the Network Topology Diagram for a visual representation of the network layout.
| Server Role | IP Address | VLAN | Firewall Rules |
|---|---|---|---|
| Data Ingestion Server | 192.168.1.10 | 10 | Allow incoming SSH (restricted IPs), HTTP/HTTPS, database access |
| Model Training Server | 192.168.1.20 | 20 | Allow incoming SSH (restricted IPs), Kubernetes API access |
| Inference Server | 192.168.1.30 | 30 | Allow incoming HTTP/HTTPS, gRPC |
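Static addressing makes it easy to sanity-check the assignments above in code. The sketch below uses the standard-library `ipaddress` module to flag addresses outside the cluster subnet or assigned twice; the `192.168.1.0/24` cluster subnet is an assumption inferred from the table, which only lists individual addresses and VLAN IDs.

```python
# Sketch: sanity-check static IP assignments from the table above.
# The /24 cluster subnet is an assumption for illustration.
import ipaddress

SERVERS = {
    "ingestion": ("192.168.1.10", 10),
    "training": ("192.168.1.20", 20),
    "inference": ("192.168.1.30", 30),
}

CLUSTER_NET = ipaddress.ip_network("192.168.1.0/24")  # assumed subnet

def validate(servers: dict) -> list:
    """Return a list of problems: addresses outside the cluster subnet,
    or the same static address assigned to two roles."""
    problems = []
    seen = {}
    for role, (addr, vlan) in servers.items():
        ip = ipaddress.ip_address(addr)
        if ip not in CLUSTER_NET:
            problems.append(f"{role}: {addr} outside {CLUSTER_NET}")
        if addr in seen:
            problems.append(f"{role}: {addr} already assigned to {seen[addr]}")
        seen[addr] = role
    return problems

print(validate(SERVERS))  # [] -> no problems found
```

A check like this could run in CI whenever the address table is edited, catching typos before they reach the firewall configuration.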
Security Considerations
Security is paramount. All servers are hardened according to the Server Hardening Guide. We employ intrusion detection and prevention systems (IDS/IPS) to monitor network traffic for malicious activity. Regular security audits are conducted. Access to servers is restricted to authorized personnel only, utilizing SSH key-based authentication. Data at rest is encrypted using AES-256 encryption. Regular backups are performed and stored offsite. These backups are detailed on the Backup and Recovery Procedures page.
Monitoring and Logging
We utilize Prometheus and Grafana for real-time monitoring of server performance and resource utilization. All server logs are aggregated using the ELK stack (Elasticsearch, Logstash, Kibana) for centralized analysis and troubleshooting. Alerts are configured to notify administrators of critical events. See Monitoring Dashboard Access for details.
| Component | Version | Configuration Details |
|---|---|---|
| Prometheus | 2.37.2 | Configured to scrape metrics from all servers |
| Grafana | 8.5.1 | Dashboards for CPU usage, memory usage, disk I/O, network traffic |
| Elasticsearch | 7.17.6 | Centralized log storage and indexing |
| Logstash | 7.17.6 | Log parsing and filtering |
| Kibana | 7.17.6 | Log visualization and analysis |
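For services that expose custom metrics to Prometheus, the scrape endpoint returns plain text in the Prometheus exposition format. The sketch below hand-rolls that format for illustration; in production you would normally use the official `prometheus_client` library, and the metric name and labels here are hypothetical.

```python
# Sketch: render a metric in the Prometheus text exposition format that the
# Prometheus server scrapes. Metric name, label, and value are illustrative;
# real exporters should use the prometheus_client library instead.

def render_metric(name, value, labels=None, help_text="", metric_type="gauge"):
    """Return HELP/TYPE comment lines plus one sample line for a metric."""
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    if labels:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    else:
        lines.append(f"{name} {value}")
    return "\n".join(lines)

print(render_metric("node_cpu_usage_ratio", 0.42,
                    {"host": "training-01"},
                    "Fraction of CPU in use"))
```

Serving text like this from an HTTP endpoint (conventionally `/metrics`) is all Prometheus needs to start scraping a service.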
Future Expansion
We anticipate expanding the cluster with additional GPU-powered servers for more demanding model training tasks. We are also evaluating the use of a high-performance storage system to improve data access speeds. Further details on the planned upgrades can be found on the Future Infrastructure Roadmap page. Consider also reviewing the Hardware Procurement Process if you plan to suggest new equipment.
Related Pages
- AI Model Deployment
- Data Pipeline Architecture
- Kubernetes Configuration Files
- Troubleshooting Guide
- Security Incident Response Plan
- Network Troubleshooting
- Server Maintenance Schedule
- Disaster Recovery Plan
- Contact Information for Server Support
- Change Management Procedure
- Software Licensing Information
- Data Privacy Policy
- User Account Management
- Server Inventory