AI in Hastings

From Server rental store
Revision as of 06:07, 16 April 2025 by Admin (Automated server configuration article)

AI in Hastings: Server Configuration Documentation

This document describes the server configuration for the "AI in Hastings" project, a local initiative that uses artificial intelligence for community benefit. It is intended for newcomers to the server infrastructure and gives a technical overview of the hardware, software, and network setup needed for maintenance, troubleshooting, and future expansion. Refer to the System Administration Policy before making any changes.

Overview

The "AI in Hastings" project relies on a distributed server cluster to process data and run machine learning models. The cluster is hosted in a secure data center in Hastings and consists of three specialized node types. Its core services are data ingestion, model training, and inference. This setup allows rapid prototyping and deployment of AI solutions tailored to local needs, as outlined in the Project Goals document. For details on data privacy, consult the Data Privacy Policy.

Hardware Configuration

The server cluster consists of three primary node types: Master Nodes, Worker Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role within the cluster.

Master Nodes

These nodes manage the cluster, schedule jobs, and monitor the overall health of the system. Two Master Nodes are deployed for redundancy.

Component          Specification
CPU                Dual Intel Xeon Gold 6248R (24 cores/48 threads per CPU)
RAM                256 GB DDR4 ECC Registered RAM
Storage            2 x 1TB NVMe SSD (RAID 1) for OS and Metadata
Network Interface  Dual 10 Gigabit Ethernet
Power Supply       Redundant 1200W Platinum Power Supplies

Worker Nodes

These nodes perform the computationally intensive tasks of model training and inference. We currently have eight Worker Nodes. Refer to Scaling the Cluster for information on adding more.

Component          Specification
CPU                Dual AMD EPYC 7763 (64 cores/128 threads per CPU)
RAM                512 GB DDR4 ECC Registered RAM
GPU                4 x NVIDIA A100 (80GB HBM2e)
Storage            4 x 4TB NVMe SSD (RAID 0) for temporary data
Network Interface  Dual 100 Gigabit Ethernet
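
To make the aggregate capacity of the Worker tier concrete, the per-node figures above can be multiplied out across the eight nodes. The following is a minimal Python sketch; the names `WorkerSpec` and `cluster_totals` are illustrative and not part of any project tooling, and the numbers are taken directly from the specification above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    """Per-node Worker hardware, copied from the specification table."""
    cpus: int = 2
    cores_per_cpu: int = 64
    threads_per_cpu: int = 128
    ram_gb: int = 512
    gpus: int = 4
    gpu_mem_gb: int = 80
    scratch_drives: int = 4
    scratch_drive_tb: int = 4


def cluster_totals(spec: WorkerSpec, node_count: int) -> dict:
    """Multiply per-node resources by the number of Worker Nodes."""
    return {
        "cpu_cores": node_count * spec.cpus * spec.cores_per_cpu,
        "cpu_threads": node_count * spec.cpus * spec.threads_per_cpu,
        "ram_gb": node_count * spec.ram_gb,
        "gpus": node_count * spec.gpus,
        "gpu_mem_gb": node_count * spec.gpus * spec.gpu_mem_gb,
        # RAID 0 has no redundancy, so scratch capacity is the plain sum.
        "scratch_tb": node_count * spec.scratch_drives * spec.scratch_drive_tb,
    }


if __name__ == "__main__":
    for name, value in cluster_totals(WorkerSpec(), node_count=8).items():
        print(f"{name}: {value}")
```

With eight nodes this gives 32 GPUs and 2560 GB of GPU memory in total. Changing `node_count` shows the effect of adding Worker Nodes.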

Storage Nodes

These nodes provide persistent storage for datasets and model artifacts. Three Storage Nodes are currently in service.

Component          Specification
CPU                Intel Xeon Silver 4210 (10 cores/20 threads)
RAM                64 GB DDR4 ECC Registered RAM
Storage            24 x 16TB SAS HDD (RAID 6) – 384TB raw, about 352TB usable after RAID 6 parity
Network Interface  Dual 25 Gigabit Ethernet
File System        ZFS
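
RAID 6 reserves two drives' worth of capacity for parity, so usable space is lower than the raw sum of the drives. The sketch below works through that arithmetic in Python. It assumes a single RAID 6 group per node and decimal terabytes; the actual array layout is not specified here, and file system overhead or any replication at the Ceph layer would reduce the figure further. The function name `raid6_usable_tb` is illustrative.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float, parity_drives: int = 2) -> float:
    """Usable capacity of one RAID 6 group: all drives minus the parity drives."""
    if drive_count <= parity_drives:
        raise ValueError("RAID 6 needs more drives than parity drives")
    return (drive_count - parity_drives) * drive_tb


if __name__ == "__main__":
    drives, size_tb, nodes = 24, 16, 3
    raw = drives * size_tb
    usable = raid6_usable_tb(drives, size_tb)
    print(f"Raw per node:     {raw} TB")
    print(f"Usable per node:  {usable} TB")
    print(f"Usable, {nodes} nodes: {usable * nodes} TB")
```

Under these assumptions each Storage Node has 384 TB raw and 352 TB usable, or 1056 TB usable across the three nodes before any replication.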

Software Configuration

The software stack is built around a Kubernetes cluster managed by Rancher. This provides orchestration and scalability. The operating system across all nodes is Ubuntu Server 22.04 LTS. See the Software Inventory for a complete list of installed packages.

  • Kubernetes: Version 1.27.3 – Handles container orchestration and deployment.
  • Rancher: Version 2.7.6 – Provides a user-friendly interface for managing the Kubernetes cluster.
  • Docker: Version 20.10.12 – Containerization platform.
  • Python: Version 3.10 – Used for machine learning tasks.
  • TensorFlow: Version 2.12 – Machine learning framework.
  • PyTorch: Version 2.0 – Alternative machine learning framework.
  • Ceph: Used for distributed storage across the Storage Nodes. Configuration details are in the Ceph Configuration document.
  • Prometheus & Grafana: Monitoring and visualization tools. Details in Monitoring Dashboard.
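
Because the stack above pins specific versions, a small check for version drift can be useful during maintenance. The Python sketch below compares a set of reported versions against the pinned ones. The `PINNED` table mirrors the list above; the `installed` mapping in the usage example is hypothetical input, and how those versions are collected from each node is left open.

```python
# Versions pinned in the list above. A pin such as "3.10" matches any 3.10.x.
PINNED = {
    "kubernetes": "1.27.3",
    "rancher": "2.7.6",
    "docker": "20.10.12",
    "python": "3.10",
    "tensorflow": "2.12",
    "pytorch": "2.0",
}


def parse_version(version: str) -> tuple:
    """Turn '2.0.1+cu118' into (2, 0, 1), ignoring any local build suffix."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split("."))


def matches_pin(installed: str, pinned: str) -> bool:
    """True if the installed version starts with the pinned components."""
    pinned_parts = parse_version(pinned)
    return parse_version(installed)[: len(pinned_parts)] == pinned_parts


def find_drift(installed: dict) -> dict:
    """Return {component: (pinned, installed)} for every mismatch or missing entry."""
    drift = {}
    for name, pinned in PINNED.items():
        got = installed.get(name)
        if got is None or not matches_pin(got, pinned):
            drift[name] = (pinned, got)
    return drift


if __name__ == "__main__":
    # Hypothetical report from one node.
    installed = {
        "kubernetes": "1.27.3",
        "rancher": "2.7.6",
        "docker": "20.10.12",
        "python": "3.11.4",
        "tensorflow": "2.12.0",
        "pytorch": "2.0.1+cu118",
    }
    print(find_drift(installed))
```

In this example only Python is reported, because 3.11.4 does not fall under the 3.10 pin.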

Network Configuration

The server cluster is isolated on a private network with the following key segments:

  • Management Network: 192.168.10.0/24 – Used for accessing the Master Nodes and Rancher UI.
  • Data Network: 10.0.0.0/16 – Used for communication between Worker and Storage Nodes.
  • External Access: Limited access via a secure VPN for authorized personnel only. See VPN Access Guide.

All inter-node communication is encrypted using TLS. A firewall (iptables) is configured to restrict access to only necessary ports. The Network Diagram provides a visual representation of the network topology.
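
When troubleshooting, it helps to know quickly which segment an address belongs to. The following Python sketch uses the standard `ipaddress` module with the two subnets listed above; the `SEGMENTS` mapping and the `classify` function are illustrative names, and the sample addresses are made up.

```python
import ipaddress

# Subnets from the segment list above.
SEGMENTS = {
    "management": ipaddress.ip_network("192.168.10.0/24"),
    "data": ipaddress.ip_network("10.0.0.0/16"),
}


def classify(address: str):
    """Return the name of the segment containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


if __name__ == "__main__":
    for name, network in SEGMENTS.items():
        # Subtract the network and broadcast addresses.
        print(f"{name}: {network} ({network.num_addresses - 2} usable hosts)")
    for sample in ("192.168.10.5", "10.0.3.7", "10.1.0.1"):
        print(sample, "->", classify(sample))
```

Note that 10.1.0.1 falls outside the /16 Data Network, so it is not assigned to either segment.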

Security Considerations

Security is paramount. The following measures are in place:

  • Regular security audits are conducted (see Security Audit Reports).
  • All nodes are patched regularly with the latest security updates.
  • Access to the server cluster is restricted to authorized personnel only.
  • Data is encrypted both in transit and at rest.
  • Intrusion detection and prevention systems are in place.
  • Refer to the Incident Response Plan for handling security incidents.

Future Expansion

The "AI in Hastings" project is expected to grow. We are planning to add more Worker Nodes with even more powerful GPUs in the next phase. The current architecture supports scaling to a large number of nodes. See Future Scaling Plans for details.

Related pages:

  • Server Administration
  • Troubleshooting Guide
  • Data Backup Procedures
  • Software Updates
  • Project Documentation Hub

