AI in Maidstone

AI in Maidstone: Server Configuration Documentation

This document details the server configuration for the "AI in Maidstone" project, a research initiative focused on deploying and testing various artificial intelligence models for local government applications. This guide is designed for newcomers to the MediaWiki site and provides a comprehensive overview of the hardware, software, and networking components.

Overview

The "AI in Maidstone" project requires substantial computational resources for model training, inference, and data storage. The server infrastructure is designed for scalability and redundancy. We use a hybrid cloud approach: core processing runs on-premises for data security and latency reasons, while cloud resources provide burst capacity and specific services. This document covers the on-premises server configuration only. Refer to Data Security Protocols for details on data handling.

Hardware Configuration

The primary server cluster consists of three identical nodes, designated as `ai-node-1`, `ai-node-2`, and `ai-node-3`. These nodes are housed within a secure, climate-controlled server room at the Maidstone Borough Council headquarters. A separate storage server, `ai-storage-1`, handles all persistent data.

Here's a detailed breakdown of the hardware for each node:

Component | Specification | Quantity per Node
CPU | Intel Xeon Gold 6338 (32 cores, 64 threads) | 1
RAM | 256GB DDR4 ECC Registered 3200MHz | 1
GPU | NVIDIA A100 80GB PCIe 4.0 | 4
Primary Storage (OS) | 512GB NVMe PCIe 4.0 SSD | 1
Secondary Storage (Data) | 4TB NVMe PCIe 4.0 SSD (RAID 0) | 1
Network Interface | 100GbE Mellanox ConnectX-6 | 2
Power Supply | 1600W 80+ Platinum Redundant | 2
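
To make the aggregate capacity of the cluster concrete, the per-node figures can be multiplied across the three identical nodes. The following is a minimal sketch; the `NodeSpec` class and `cluster_totals` function are illustrative names, not part of the project's tooling, and the numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures copied from the hardware table."""
    cpu_cores: int = 32
    cpu_threads: int = 64
    ram_gb: int = 256
    gpus: int = 4
    gpu_memory_gb: int = 80


def cluster_totals(node: NodeSpec, node_count: int) -> dict:
    """Multiply per-node resources by the number of identical nodes."""
    return {
        "cpu_cores": node.cpu_cores * node_count,
        "cpu_threads": node.cpu_threads * node_count,
        "ram_gb": node.ram_gb * node_count,
        "gpus": node.gpus * node_count,
        "gpu_memory_gb": node.gpus * node.gpu_memory_gb * node_count,
    }


if __name__ == "__main__":
    totals = cluster_totals(NodeSpec(), node_count=3)
    for name, value in totals.items():
        print(f"{name}: {value}")
```

With three nodes this gives 96 cores, 192 threads, 768GB of RAM, 12 GPUs, and 960GB of GPU memory. Changing `node_count` shows the effect of adding compute nodes.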

The `ai-storage-1` server utilizes the following specifications:

Component | Specification | Quantity
CPU | Intel Xeon Silver 4310 (12 cores, 24 threads) | 1
RAM | 128GB DDR4 ECC Registered 3200MHz | 1
Storage | 64TB SAS 12Gbps 7.2K RPM (RAID 6) | 1 array
Network Interface | 40GbE Mellanox ConnectX-5 | 2
Power Supply | 1200W 80+ Gold Redundant | 2

For network infrastructure details, see Network Topology. Power redundancy is crucial; see Power Management Procedures.
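
Because each compute node relies on redundant power supplies, it can help to check whether a single 1600W unit could carry the node alone. The sketch below is a rough estimate under stated assumptions: the GPU and CPU figures are vendor TDP values that do not appear in this document and should be verified against the spec sheets, and `OTHER_W` is a hypothetical allowance for RAM, drives, NICs, and fans.

```python
# Assumed component power figures in watts. The TDP values come from vendor
# spec sheets, not from this document; OTHER_W is a hypothetical placeholder.
GPU_TDP_W = 300   # NVIDIA A100 80GB PCIe (assumed)
CPU_TDP_W = 205   # Intel Xeon Gold 6338 (assumed)
OTHER_W = 150     # RAM, NVMe drives, NICs, fans (hypothetical estimate)


def node_peak_draw(gpus: int = 4, cpus: int = 1) -> int:
    """Estimate peak draw of one compute node from the assumed figures."""
    return gpus * GPU_TDP_W + cpus * CPU_TDP_W + OTHER_W


def headroom(psu_watts: int, draw_watts: int) -> int:
    """Watts left on a single PSU if its redundant partner has failed."""
    return psu_watts - draw_watts


if __name__ == "__main__":
    draw = node_peak_draw()
    print(f"estimated peak draw: {draw} W")
    print(f"headroom on one 1600 W PSU: {headroom(1600, draw)} W")
```

Under these assumptions the margin on a single supply is small, so measured draw under load is a better basis for planning than this estimate.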

Software Configuration

All nodes run Ubuntu Server 22.04 LTS. The primary software stack includes:

  • Operating System: Ubuntu Server 22.04 LTS (Kernel 5.15.0-76-generic)
  • Containerization: Docker 20.10.12
  • Container Orchestration: Kubernetes 1.24.0
  • GPU Drivers: NVIDIA Driver 525.85.12
  • CUDA Toolkit: CUDA 11.8
  • Machine Learning Frameworks: TensorFlow 2.12.0, PyTorch 1.13.1, scikit-learn 1.2.2
  • Programming Languages: Python 3.10, R 4.2.2
  • Monitoring: Prometheus and Grafana (see Monitoring Dashboard Configuration)
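
One way to confirm that a node or container image matches the pinned framework versions is to compare them against what Python reports as installed. This is a minimal sketch, not the project's actual tooling: the `PINNED` mapping and helper names are illustrative, and the keys are the PyPI distribution names (`torch` for PyTorch).

```python
from importlib import metadata

# Pinned versions copied from the stack list (PyPI distribution names).
PINNED = {
    "tensorflow": "2.12.0",
    "torch": "1.13.1",
    "scikit-learn": "1.2.2",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Compare only the public version; local build tags such as
        # "+cu117" on some wheels follow a "+" and are ignored here.
        public = found.split("+")[0] if found else None
        if public != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

The `lookup` parameter exists so the comparison can be exercised without the frameworks installed, for example by passing the `get` method of a dictionary of fake versions.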

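The storage server uses ZFS for data integrity, with the pool laid out as RAID-Z2 (double parity, the ZFS counterpart to the RAID 6 level listed in the hardware table) for redundancy. Access to the storage server is managed via NFS. Details on the NFS configuration are available in NFS Configuration Guide.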
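
RAID-Z2 reserves two drives' worth of space for parity, so usable capacity depends on how many drives make up the array. The hardware table gives 64TB without stating whether that is raw or usable capacity, and the drive count is not documented here, so the layouts below are hypothetical. The function is a simplified sketch that ignores ZFS metadata, padding, and reserved space.

```python
def raidz2_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Approximate usable capacity of a single RAID-Z2 vdev.

    Two drives' worth of space goes to parity. ZFS metadata, padding and
    reserved space are ignored, so real usable capacity will be lower.
    """
    if drive_count < 3:
        raise ValueError("RAID-Z2 needs at least 3 drives")
    return (drive_count - 2) * drive_tb


if __name__ == "__main__":
    # Hypothetical layouts: the real drive count and size are not documented.
    for count, size in [(8, 8), (10, 8)]:
        raw = count * size
        usable = raidz2_usable_tb(count, size)
        print(f"{count} x {size} TB: raw {raw} TB, usable about {usable} TB")
```

If 64TB is the raw figure, eight 8TB drives would leave roughly 48TB usable. If it is the usable figure, ten 8TB drives would be one layout that reaches it.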

Networking Configuration

The compute nodes are connected via a dedicated 100GbE network, and `ai-storage-1` attaches through its 40GbE interfaces. The network is segmented into three VLANs:

VLAN ID | Subnet | Description
100 | 192.168.100.0/24 | Management Network
200 | 192.168.200.0/24 | Storage Network (NFS)
300 | 192.168.300.0/24 | Compute Network (Kubernetes)
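
A subnet plan like this can be sanity-checked with Python's standard `ipaddress` module before it is applied to switches or hosts. The sketch below parses each subnet exactly as listed and reports any that fail. Note that `192.168.300.0/24` does not parse as written, because an IPv4 octet cannot exceed 255, so the subnet for VLAN 300 should be confirmed against the actual network configuration. The function names are illustrative.

```python
import ipaddress

# (VLAN ID, subnet, description) rows copied verbatim from the VLAN table.
VLANS = [
    (100, "192.168.100.0/24", "Management Network"),
    (200, "192.168.200.0/24", "Storage Network (NFS)"),
    (300, "192.168.300.0/24", "Compute Network (Kubernetes)"),
]


def check_vlans(rows):
    """Split rows into parsed networks and rows whose subnet fails to parse."""
    valid, invalid = {}, {}
    for vlan_id, subnet, _description in rows:
        try:
            valid[vlan_id] = ipaddress.ip_network(subnet)
        except ValueError as error:
            invalid[vlan_id] = str(error)
    return valid, invalid


def overlapping(valid):
    """Return pairs of VLAN IDs whose subnets overlap."""
    ids = sorted(valid)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if valid[a].overlaps(valid[b])
    ]


if __name__ == "__main__":
    valid, invalid = check_vlans(VLANS)
    for vlan_id, message in invalid.items():
        print(f"VLAN {vlan_id}: invalid subnet ({message})")
    print("overlapping VLAN pairs:", overlapping(valid))
```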

All nodes have static IP addresses assigned within their respective VLANs. A dedicated firewall, managed according to Firewall Rules and Policies, protects the server cluster. DNS resolution is handled internally via a BIND9 server. See DNS Server Configuration for details.

Security Considerations

Security is paramount. All servers are hardened according to CIS benchmarks. Regular security audits are conducted, as outlined in Security Audit Schedule. User access is strictly controlled via SSH keys and role-based access control (RBAC) within Kubernetes. All data is encrypted at rest and in transit. Refer to Encryption Standards for details.
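
As an illustration of role-based access control within Kubernetes, the sketch below builds a namespaced read-only `Role` manifest and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace and role names are hypothetical; the project's actual roles and bindings are not documented here.

```python
import json


def read_only_role(namespace: str, name: str) -> dict:
    """Build a namespaced Role permitting read-only access to pods and their logs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    print(json.dumps(read_only_role("ai-inference", "pod-reader"), indent=2))
```

A `Role` grants nothing on its own. It takes effect only once a `RoleBinding` in the same namespace attaches it to a user, group, or service account.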


Future Expansion

The architecture is designed for scalability. Additional compute nodes can be added to the Kubernetes cluster as needed. Storage capacity can be expanded by adding more drives to the ZFS array or by integrating cloud storage solutions. See Scalability Planning for the detailed roadmap.


