AI in Merseyside

From Server rental store
Revision as of 07:02, 16 April 2025 by Admin (talk | contribs) (Automated server configuration article)

AI in Merseyside: Server Configuration Guide

Welcome to the Merseyside AI Initiative's server configuration documentation! This guide details the hardware and software setup powering our artificial intelligence projects. It's aimed at newcomers to the wiki and those assisting with server maintenance. Understanding these configurations is vital for successful development and deployment.

Overview

The Merseyside AI Initiative leverages a hybrid server infrastructure, combining on-premise hardware with cloud-based resources. This allows us to balance cost, security, and scalability. This document primarily focuses on the on-premise server cluster located at the Liverpool Science Park. We utilize a distributed computing model, employing several dedicated servers for different tasks: data ingestion, model training, and inference. We also integrate with cloud services for burst capacity and specialized hardware, such as GPUs. See Cloud Integration Overview for details on that aspect.

Hardware Specifications

The core of our on-premise infrastructure consists of three primary server types. These servers are interconnected via a dedicated 10 Gigabit Ethernet network. Power redundancy is provided by a dual-UPS system, and the server room maintains a constant temperature of 22°C with humidity control. Refer to the Data Center Standards page for detailed environmental specifications.

Server Type | Model | CPU | RAM | Storage | Network Interface
Data Ingestion Server | Dell PowerEdge R750 | 2 x Intel Xeon Gold 6338 | 256 GB DDR4 ECC | 2 x 4 TB NVMe SSD (RAID 1) + 16 TB HDD | 10 Gigabit Ethernet
Model Training Server | Supermicro SuperServer 2029U-TR4 | 2 x AMD EPYC 7763 | 512 GB DDR4 ECC | 4 x 8 TB NVMe SSD (RAID 0) | 10/40 Gigabit Ethernet
Inference Server | HP ProLiant DL380 Gen10 | 2 x Intel Xeon Silver 4310 | 128 GB DDR4 ECC | 1 x 1 TB NVMe SSD | 10 Gigabit Ethernet

Software Stack

All servers run Ubuntu Server 22.04 LTS. We employ a containerized environment using Docker and Kubernetes for application deployment and management. This ensures portability and scalability. We’ve standardized on Python 3.9 for our AI development, alongside libraries like TensorFlow, PyTorch, and scikit-learn. The Software Version Control page documents the precise library versions. All code is hosted on our internal GitLab Instance.

Operating System

  • Distribution: Ubuntu Server 22.04 LTS
  • Kernel: 5.15.0-76-generic
  • Desktop Environment: None (Server - CLI Only)

Containerization

  • Docker Version: 20.10.17
  • Kubernetes Version: v1.24.3
  • Container Registry: Internal GitLab Container Registry (see Container Registry Access)

AI Frameworks

  • TensorFlow: 2.9.1
  • PyTorch: 1.12.1
  • Scikit-learn: 1.1.3
  • CUDA Toolkit: 11.6 (for GPU-accelerated training)
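Because training and inference jobs assume these exact versions, a quick check at container start-up can catch version drift before it causes subtle failures. A minimal sketch using only the standard library (the `REQUIRED` pins mirror the list above; the helper name and the injectable `get_version` parameter are our own, not part of the initiative's tooling):

```python
from importlib import metadata

# Pinned versions from the list above (PyPI distribution names).
REQUIRED = {
    "tensorflow": "2.9.1",
    "torch": "1.12.1",
    "scikit-learn": "1.1.3",
}

def version_drift(required, get_version=metadata.version):
    """Return {package: (required, installed)} for every mismatch.

    Missing packages are reported with installed == None.
    """
    drift = {}
    for pkg, want in required.items():
        try:
            have = get_version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            drift[pkg] = (want, have)
    return drift

if __name__ == "__main__":
    for pkg, (want, have) in version_drift(REQUIRED).items():
        print(f"{pkg}: expected {want}, found {have}")
```

Running this in each container image makes a bad base image fail loudly instead of producing irreproducible results.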

Network Configuration

The servers are organized into a private network with static IP addresses. The network is segmented using VLANs to isolate different services and enhance security. A dedicated firewall protects the network from external threats. See the Network Topology Diagram for a visual representation of the network layout.

Server Role | IP Address | VLAN | Firewall Rules
Data Ingestion Server | 192.168.1.10 | 10 | Allow incoming SSH (restricted IPs), HTTP/HTTPS, database access
Model Training Server | 192.168.1.20 | 20 | Allow incoming SSH (restricted IPs), Kubernetes API access
Inference Server | 192.168.1.30 | 30 | Allow incoming HTTP/HTTPS, gRPC
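Static address plans like this are easy to validate programmatically before changes are rolled out. A small sketch using Python's stdlib `ipaddress` module (the /24 subnet and the specific checks are our illustrative assumptions, not taken from the official network plan):

```python
import ipaddress

# Role -> (address, VLAN), as listed in the table above.
SERVERS = {
    "data-ingestion": ("192.168.1.10", 10),
    "model-training": ("192.168.1.20", 20),
    "inference": ("192.168.1.30", 30),
}

SUBNET = ipaddress.ip_network("192.168.1.0/24")  # assumed private subnet

def validate(servers, subnet):
    """Check every address parses, sits inside the subnet, and is unique."""
    seen = set()
    for role, (addr, vlan) in servers.items():
        ip = ipaddress.ip_address(addr)
        if ip not in subnet:
            raise ValueError(f"{role}: {addr} outside {subnet}")
        if ip in seen:
            raise ValueError(f"{role}: duplicate address {addr}")
        seen.add(ip)
    return sorted(seen)

print(validate(SERVERS, SUBNET))
```

A check like this can run in CI against the network inventory so typos are caught before they reach the firewall.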

Security Considerations

Security is paramount. All servers are hardened according to the Server Hardening Guide. We employ intrusion detection and prevention systems (IDS/IPS) to monitor network traffic for malicious activity. Regular security audits are conducted. Access to servers is restricted to authorized personnel only, utilizing SSH key-based authentication. Data at rest is encrypted using AES-256 encryption. Regular backups are performed and stored offsite. These backups are detailed on the Backup and Recovery Procedures page.
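SSH key-based authentication only helps if the key material stays private; OpenSSH itself rejects keys whose files are group- or world-readable. A stdlib sketch that audits file modes the same way (the path and the 0o600 policy are illustrative, not quoted from the Server Hardening Guide):

```python
import os
import stat

def has_strict_mode(path, allowed=0o600):
    """True if no permission bits beyond `allowed` are set.

    0o600 permits owner read/write only; 0o400 (owner read-only)
    also passes, since it sets no extra bits.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & ~allowed == 0

# Example: audit a private key before deployment (path is hypothetical).
# if not has_strict_mode("/home/deploy/.ssh/id_ed25519"):
#     raise SystemExit("key file is group/world accessible")
```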

Monitoring and Logging

We utilize Prometheus and Grafana for real-time monitoring of server performance and resource utilization. All server logs are aggregated using the ELK stack (Elasticsearch, Logstash, Kibana) for centralized analysis and troubleshooting. Alerts are configured to notify administrators of critical events. See Monitoring Dashboard Access for details.

Component | Version | Configuration Details
Prometheus | 2.37.2 | Scrapes metrics from all servers
Grafana | 8.5.1 | Dashboards for CPU usage, memory usage, disk I/O, network traffic
Elasticsearch | 7.17.6 | Centralized log storage and indexing
Logstash | 7.17.6 | Log parsing and filtering
Kibana | 7.17.6 | Log visualization and analysis
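Prometheus collects metrics from each server over its plain-text exposition format, which is simple enough to inspect by hand when debugging a scrape target. A minimal parser sketch for that format (deliberately simplified: it ignores HELP/TYPE metadata, label-value escaping, and timestamps; the sample metrics are typical node_exporter names used for illustration):

```python
def parse_exposition(text):
    """Parse simple Prometheus text-format lines into (name, labels, value).

    Handles 'metric{k="v",...} value'; skips comments and blank lines.
    Not a full parser: no escape sequences, timestamps, or exemplars.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, value = line.rsplit(" ", 1)
        labels = {}
        if "{" in name_part:
            name, rest = name_part.split("{", 1)
            for pair in rest.rstrip("}").split(","):
                if pair:
                    key, val = pair.split("=", 1)
                    labels[key] = val.strip('"')
        else:
            name = name_part
        samples.append((name, labels, float(value)))
    return samples

scrape = """# HELP node_load1 1m load average.
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/nvme0n1",mountpoint="/"} 1.2e+11
"""
print(parse_exposition(scrape))
```

For production use the official client libraries are the right tool; a hand-rolled parser like this is only for quick spot checks against `curl http://host:9100/metrics` style output.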

Future Expansion

We anticipate expanding the cluster with additional GPU-powered servers for more demanding model training tasks. We are also evaluating the use of a high-performance storage system to improve data access speeds. Further details on the planned upgrades can be found on the Future Infrastructure Roadmap page. Consider also reviewing the Hardware Procurement Process if you plan to suggest new equipment.

Related pages:

  • AI Model Deployment
  • Data Pipeline Architecture
  • Kubernetes Configuration Files
  • Troubleshooting Guide
  • Security Incident Response Plan
  • Network Troubleshooting
  • Server Maintenance Schedule
  • Disaster Recovery Plan
  • Contact Information for Server Support
  • Change Management Procedure
  • Software Licensing Information
  • Data Privacy Policy
  • User Account Management
  • Server Inventory


Intel-Based Server Configurations

Configuration | Specifications | CPU Benchmark
Core i7-6700K/7700 Server | 64 GB DDR4, 2 x 512 GB NVMe SSD | 8046
Core i7-8700 Server | 64 GB DDR4, 2 x 1 TB NVMe SSD | 13124
Core i9-9900K Server | 128 GB DDR4, 2 x 1 TB NVMe SSD | 49969
Core i9-13900 Server (64 GB) | 64 GB RAM, 2 x 2 TB NVMe SSD |
Core i9-13900 Server (128 GB) | 128 GB RAM, 2 x 2 TB NVMe SSD |
Core i5-13500 Server (64 GB) | 64 GB RAM, 2 x 500 GB NVMe SSD |
Core i5-13500 Server (128 GB) | 128 GB RAM, 2 x 500 GB NVMe SSD |
Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000 |

AMD-Based Server Configurations

Configuration | Specifications | CPU Benchmark
Ryzen 5 3600 Server | 64 GB RAM, 2 x 480 GB NVMe | 17849
Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2 x 1 TB NVMe | 35224
Ryzen 9 5950X Server | 128 GB RAM, 2 x 4 TB NVMe | 46045
Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2 x 2 TB NVMe | 63561
EPYC 7502P Server (128 GB / 1 TB) | 128 GB RAM, 1 TB NVMe | 48021
EPYC 7502P Server (128 GB / 2 TB) | 128 GB RAM, 2 TB NVMe | 48021
EPYC 7502P Server (128 GB / 4 TB) | 128 GB RAM, 2 x 2 TB NVMe | 48021
EPYC 7502P Server (256 GB / 1 TB) | 256 GB RAM, 1 TB NVMe | 48021
EPYC 7502P Server (256 GB / 4 TB) | 256 GB RAM, 2 x 2 TB NVMe | 48021
EPYC 9454P Server | 256 GB RAM, 2 x 2 TB NVMe |
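When shortlisting from the tables above, raw CPU benchmark is one convenient axis to sort on. A throwaway sketch ranking the benchmarked AMD entries (scores copied from the table; entries without a published score are excluded):

```python
# (configuration, CPU benchmark) pairs copied from the AMD table above.
AMD_CONFIGS = [
    ("Ryzen 5 3600 Server", 17849),
    ("Ryzen 7 7700 Server", 35224),
    ("Ryzen 9 5950X Server", 46045),
    ("Ryzen 9 7950X Server", 63561),
    ("EPYC 7502P Server", 48021),
]

def rank_by_benchmark(configs):
    """Return configurations sorted from highest to lowest CPU benchmark."""
    return sorted(configs, key=lambda c: c[1], reverse=True)

for name, score in rank_by_benchmark(AMD_CONFIGS):
    print(f"{score:>6}  {name}")
```

Benchmark alone is a crude criterion; for memory-bound training workloads the EPYC configurations' larger RAM and ECC support may matter more than the headline score.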

Order Your Dedicated Server

Configure and order the server that best fits your workload.


Note: All benchmark scores are approximate and may vary by configuration. Server availability is subject to stock.