AI in Hastings
AI in Hastings: Server Configuration Documentation
This document details the server configuration for the "AI in Hastings" project, a local initiative utilizing artificial intelligence for community benefit. This guide is intended for newcomers to the server infrastructure and provides a technical overview of the hardware, software, and network setup. Understanding this information is crucial for maintenance, troubleshooting, and future expansion of the project. Please refer to the System Administration Policy before making any changes.
Overview
The "AI in Hastings" project relies on a distributed server cluster to process data and run machine learning models. The cluster is hosted in a secure data center in Hastings and comprises several specialized node types. Its core services are data ingestion, model training, and inference. This setup allows rapid prototyping and deployment of AI solutions tailored to local needs, as outlined in the Project Goals document. For details on data privacy, please consult the Data Privacy Policy.
Hardware Configuration
The server cluster consists of three primary node types: Master Nodes, Worker Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role within the cluster.
Master Nodes
These nodes manage the cluster, schedule jobs, and monitor the overall health of the system. Two Master Nodes are deployed for redundancy.
Component | Specification |
---|---|
CPU | Dual Intel Xeon Gold 6248R (24 cores/48 threads per CPU) |
RAM | 256 GB DDR4 ECC Registered RAM |
Storage | 2 x 1TB NVMe SSD (RAID 1) for OS and Metadata |
Network Interface | Dual 10 Gigabit Ethernet |
Power Supply | Redundant 1200W Platinum Power Supplies |
Worker Nodes
These nodes perform the computationally intensive tasks of model training and inference. We currently have eight Worker Nodes. Refer to Scaling the Cluster for information on adding more.
Component | Specification |
---|---|
CPU | Dual AMD EPYC 7763 (64 cores/128 threads per CPU) |
RAM | 512 GB DDR4 ECC Registered RAM |
GPU | 4 x NVIDIA A100 (80GB HBM2e) |
Storage | 4 x 4TB NVMe SSD (RAID 0) for temporary data |
Network Interface | Dual 100 Gigabit Ethernet |
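For capacity planning it can help to see the Worker Node figures as cluster-wide totals. The sketch below simply multiplies the per-node values from the table above by the eight nodes currently deployed. The `WorkerSpec` class and `cluster_totals` function are illustrative names, not part of any project tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    """Per-node figures copied from the Worker Nodes table."""
    cpus: int = 2
    cores_per_cpu: int = 64
    ram_gb: int = 512
    gpus: int = 4
    gpu_mem_gb: int = 80


def cluster_totals(node_count: int, spec: WorkerSpec = WorkerSpec()) -> dict:
    """Multiply the per-node figures by the number of Worker Nodes."""
    return {
        "cpu_cores": node_count * spec.cpus * spec.cores_per_cpu,
        "ram_gb": node_count * spec.ram_gb,
        "gpus": node_count * spec.gpus,
        "gpu_mem_gb": node_count * spec.gpus * spec.gpu_mem_gb,
    }


totals = cluster_totals(node_count=8)
print(totals)
# {'cpu_cores': 1024, 'ram_gb': 4096, 'gpus': 32, 'gpu_mem_gb': 2560}
```

These are nominal hardware totals. The capacity actually available to jobs is lower once the operating system and Kubernetes system components take their share.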
Storage Nodes
These nodes provide persistent storage for datasets and model artifacts. Three Storage Nodes are currently in service.
Component | Specification |
---|---|
CPU | Intel Xeon Silver 4210 (10 cores/20 threads) |
RAM | 64 GB DDR4 ECC Registered RAM |
Storage | 24 x 16TB SAS HDD (RAID 6) – 384TB raw; about 352TB usable if configured as a single RAID 6 group, before file system overhead |
Network Interface | Dual 25 Gigabit Ethernet |
File System | ZFS |
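A quick capacity check for the Storage Nodes: 24 drives of 16 TB give 384 TB of raw disk, and RAID 6 reserves two drives' worth of space for parity. The sketch below works through that arithmetic. It assumes one RAID 6 group per node and decimal terabytes, and it ignores ZFS metadata overhead and any replication applied by Ceph. The function name `raid6_usable_tb` is illustrative.

```python
def raid6_usable_tb(drive_count: int, drive_tb: int) -> int:
    """Usable capacity of one RAID 6 group: two drives' worth of space holds parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_tb


raw_tb = 24 * 16                        # raw disk per Storage Node
usable_tb = raid6_usable_tb(24, 16)     # after RAID 6 parity, per node
cluster_usable_tb = 3 * usable_tb       # three Storage Nodes in service

print(raw_tb, usable_tb, cluster_usable_tb)
# 384 352 1056
```

Splitting the 24 drives into several smaller RAID 6 groups would cost two drives of parity per group and reduce the usable figure accordingly.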
Software Configuration
The software stack is built around a Kubernetes cluster managed by Rancher. This provides orchestration and scalability. The operating system across all nodes is Ubuntu Server 22.04 LTS. See the Software Inventory for a complete list of installed packages.
- Kubernetes: Version 1.27.3 – Handles container orchestration and deployment.
- Rancher: Version 2.7.6 – Provides a user-friendly interface for managing the Kubernetes cluster.
- Docker: Version 20.10.12 – Containerization platform.
- Python: Version 3.10 – Used for machine learning tasks.
- TensorFlow: Version 2.12 – Machine learning framework.
- PyTorch: Version 2.0 – Alternative machine learning framework.
- Ceph: Used for distributed storage across the Storage Nodes. Configuration details are in the Ceph Configuration document.
- Prometheus & Grafana: Monitoring and visualization tools. Details in Monitoring Dashboard.
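Because the stack pins specific versions, a small drift check can be useful when a node is rebuilt or upgraded. The sketch below compares reported versions against the versions listed above. How the reported versions are collected is left open, and the `reported` dictionary holds made-up sample values for illustration only. `PINNED`, `matches_pin`, and `find_drift` are hypothetical names.

```python
# Versions pinned in the list above; components listed without a version are omitted.
PINNED = {
    "kubernetes": "1.27.3",
    "rancher": "2.7.6",
    "docker": "20.10.12",
    "python": "3.10",
    "tensorflow": "2.12",
    "pytorch": "2.0",
}


def matches_pin(installed: str, pinned: str) -> bool:
    """True when `installed` agrees with every dotted component the pin specifies.

    "3.10.12" matches the pin "3.10", but "3.1" and "3.100" do not.
    """
    installed_parts = installed.split(".")
    pinned_parts = pinned.split(".")
    return installed_parts[: len(pinned_parts)] == pinned_parts


def find_drift(reported: dict) -> dict:
    """Return {component: (installed, pinned)} for components that are missing or differ."""
    drift = {}
    for name, pinned in PINNED.items():
        installed = reported.get(name, "missing")
        if not matches_pin(installed, pinned):
            drift[name] = (installed, pinned)
    return drift


# Sample input only: these values are invented for the example.
reported = {
    "kubernetes": "1.27.3",
    "rancher": "2.7.6",
    "docker": "20.10.12",
    "python": "3.10.12",
    "tensorflow": "2.13.0",
    "pytorch": "2.0.1",
}
drift = find_drift(reported)
print(drift)
# {'tensorflow': ('2.13.0', '2.12')}
```

Comparing dotted components rather than raw string prefixes avoids false matches such as "3.100" against a pin of "3.10".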
Network Configuration
The server cluster is isolated on a private network with the following key segments:
- Management Network: 192.168.10.0/24 – Used for accessing the Master Nodes and Rancher UI.
- Data Network: 10.0.0.0/16 – Used for communication between Worker and Storage Nodes.
- External Access: Limited access via a secure VPN for authorized personnel only. See VPN Access Guide.
All inter-node communication is encrypted using TLS. A firewall (iptables) is configured to restrict access to only necessary ports. The Network Diagram provides a visual representation of the network topology.
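When troubleshooting, it is often useful to confirm which segment an address belongs to. The sketch below uses Python's standard `ipaddress` module with the two subnets listed above. The `SEGMENTS` mapping and `segment_for` function are illustrative names, and the sample addresses are arbitrary.

```python
import ipaddress
from typing import Optional

# Subnets taken from the segment list above.
SEGMENTS = {
    "management": ipaddress.ip_network("192.168.10.0/24"),
    "data": ipaddress.ip_network("10.0.0.0/16"),
}


def segment_for(address: str) -> Optional[str]:
    """Return the name of the segment containing `address`, or None if no segment matches."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


print(segment_for("192.168.10.15"))  # management
print(segment_for("10.0.42.7"))      # data
print(segment_for("10.1.0.1"))       # None (outside 10.0.0.0/16)

# Address space per segment, including network and broadcast addresses.
print(SEGMENTS["management"].num_addresses)  # 256
print(SEGMENTS["data"].num_addresses)        # 65536
```

Note that the Data Network is a /16, so it covers only 10.0.0.0 through 10.0.255.255 and not the whole 10.0.0.0/8 private range.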
Security Considerations
Security is paramount. The following measures are in place:
- Regular security audits are conducted (see Security Audit Reports).
- All nodes are patched regularly with the latest security updates.
- Access to the server cluster is restricted to authorized personnel only.
- Data is encrypted both in transit and at rest.
- Intrusion detection and prevention systems are in place.
- Refer to the Incident Response Plan for handling security incidents.
Future Expansion
The "AI in Hastings" project is expected to grow. The next phase is planned to add Worker Nodes with more powerful GPUs, and the current architecture is designed to scale out to additional nodes. See Future Scaling Plans for details.