AI in Hastings: Server Configuration Documentation

This document describes the server configuration for the "AI in Hastings" project, a local initiative that uses artificial intelligence for community benefit. It is intended for newcomers to the server infrastructure and gives a technical overview of the hardware, software, and network setup, which you will need for maintenance, troubleshooting, and future expansion of the project. Refer to the System Administration Policy before making any changes.

Overview

The "AI in Hastings" project relies on a distributed server cluster to process data and run machine learning models. The cluster is hosted in a secure data center in Hastings and consists of several specialized node types. Its core services are data ingestion, model training, and inference. This setup allows rapid prototyping and deployment of AI solutions tailored to local needs, as outlined in the Project Goals document. For details on data privacy, consult the Data Privacy Policy.

Hardware Configuration

The server cluster consists of three primary node types: Master Nodes, Worker Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role within the cluster.

Master Nodes

These nodes manage the cluster, schedule jobs, and monitor the overall health of the system. Two Master Nodes are deployed for redundancy.

| Component | Specification |
|---|---|
| CPU | Dual Intel Xeon Gold 6248R (24 cores/48 threads per CPU) |
| RAM | 256 GB DDR4 ECC Registered |
| Storage | 2 x 1 TB NVMe SSD (RAID 1) for OS and metadata |
| Network Interface | Dual 10 Gigabit Ethernet |
| Power Supply | Redundant 1200 W Platinum power supplies |

Worker Nodes

These nodes perform the computationally intensive tasks of model training and inference. We currently have eight Worker Nodes. Refer to Scaling the Cluster for information on adding more.

| Component | Specification |
|---|---|
| CPU | Dual AMD EPYC 7763 (64 cores/128 threads per CPU) |
| RAM | 512 GB DDR4 ECC Registered |
| GPU | 4 x NVIDIA A100 (80 GB HBM2e) |
| Storage | 4 x 4 TB NVMe SSD (RAID 0) for temporary data |
| Network Interface | Dual 100 Gigabit Ethernet |
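
As a rough sanity check on aggregate capacity, the per-node Worker figures can be multiplied out across the eight nodes currently deployed. The sketch below is illustrative only: `WorkerNode` and `cluster_totals` are hypothetical names chosen for this example, not part of any project tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerNode:
    """Per-node figures copied from the Worker Node specification."""
    cpus: int = 2
    cores_per_cpu: int = 64
    threads_per_cpu: int = 128
    ram_gb: int = 512
    gpus: int = 4
    gpu_memory_gb: int = 80
    scratch_drives: int = 4
    scratch_drive_tb: int = 4


def cluster_totals(node: WorkerNode, node_count: int) -> dict:
    """Multiply per-node resources by the number of nodes."""
    return {
        "cpu_cores": node.cpus * node.cores_per_cpu * node_count,
        "cpu_threads": node.cpus * node.threads_per_cpu * node_count,
        "ram_gb": node.ram_gb * node_count,
        "gpus": node.gpus * node_count,
        "gpu_memory_gb": node.gpus * node.gpu_memory_gb * node_count,
        # RAID 0 stripes without parity, so each node's full drive capacity
        # is usable. The total is a sum of node-local scratch space, not a
        # single shared pool.
        "scratch_tb": node.scratch_drives * node.scratch_drive_tb * node_count,
    }


totals = cluster_totals(WorkerNode(), node_count=8)
for name, value in totals.items():
    print(f"{name}: {value}")
```

With eight nodes this gives 32 GPUs and 1,024 physical CPU cores in total. Changing `node_count` shows the effect of adding Worker Nodes.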

Storage Nodes

These nodes provide persistent storage for datasets and model artifacts. Three Storage Nodes are currently in service.

| Component | Specification |
|---|---|
| CPU | Intel Xeon Silver 4210 (10 cores/20 threads) |
| RAM | 64 GB DDR4 ECC Registered |
| Storage | 24 x 16 TB SAS HDD (RAID 6); 384 TB raw, about 352 TB usable if configured as a single RAID 6 group |
| Network Interface | Dual 25 Gigabit Ethernet |
| File System | ZFS |
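
Note that 24 x 16 TB is 384 TB of raw capacity. RAID 6 reserves two drives' worth of space per group for parity, so the usable figure is lower. The sketch below works through that arithmetic. The function name `raid6_usable_tb` and the two-group layout are assumptions for illustration, since the actual group layout is not stated here, and the result ignores file system overhead and TB/TiB conversion.

```python
def raid6_usable_tb(drive_count: int, drive_tb: int, groups: int = 1) -> int:
    """Usable capacity of `drive_count` drives split evenly into RAID 6 groups.

    Each RAID 6 group gives up two drives' worth of capacity to parity.
    """
    if groups < 1 or drive_count % groups != 0:
        raise ValueError("drives must divide evenly into groups")
    drives_per_group = drive_count // groups
    if drives_per_group < 4:
        raise ValueError("RAID 6 needs at least 4 drives per group")
    return groups * (drives_per_group - 2) * drive_tb


raw_tb = 24 * 16                                   # all drives, no parity
single_group_tb = raid6_usable_tb(24, 16)          # one 24-drive group
two_group_tb = raid6_usable_tb(24, 16, groups=2)   # hypothetical 2 x 12 layout

print(f"raw: {raw_tb} TB")
print(f"usable, one group: {single_group_tb} TB")
print(f"usable, two groups: {two_group_tb} TB")
```

A single 24-drive group yields 352 TB per node. Splitting into smaller groups trades capacity for shorter rebuilds.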

Software Configuration

The software stack is built around a Kubernetes cluster managed by Rancher. Kubernetes handles container orchestration and scaling, and Rancher provides the cluster management layer. All nodes run Ubuntu Server 22.04 LTS. See the Software Inventory for a complete list of installed packages.
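
To give a feel for how a job might target the GPU-equipped Worker Nodes under Kubernetes, the sketch below builds a minimal Pod manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It assumes the NVIDIA device plugin is installed so that GPUs are exposed as the `nvidia.com/gpu` resource, which this document does not confirm. The pod name, image, and the `gpu_pod_manifest` helper are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests `gpus` GPUs.

    Assumes GPUs are advertised as `nvidia.com/gpu` (NVIDIA device plugin).
    """
    # A Worker Node has four GPUs, so a single pod cannot request more.
    if not 1 <= gpus <= 4:
        raise ValueError("a single Worker Node provides at most 4 GPUs")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image reference
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest(
    "train-example", "registry.example/trainer:latest", gpus=4
)
print(json.dumps(manifest, indent=2))
```

In practice, workloads on this cluster would more likely be defined through Rancher or as Jobs and Deployments. The bare Pod is used here only to keep the example short.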
