AI in Poole

From Server rental store
AI in Poole: Server Configuration

This document details the server configuration for the "AI in Poole" project, covering hardware, software, and networking. It is intended as a guide for new administrators and developers working with the system. The project uses a distributed computing model to meet the intensive processing requirements of large language models. Refer to Main Page for the project overview.

Overview

The "AI in Poole" infrastructure consists of a cluster of servers located in a dedicated data center in Poole, UK. The primary function of these servers is to host and operate large language models and to provide API access for various applications. The cluster is designed for high availability, scalability, and performance. See System Architecture for a diagram of the overall system. The project relies heavily on Docker containers for environment isolation and reproducibility. Regular backups follow the process described in Backup Procedures.

Hardware Specifications

The server cluster comprises three main types of nodes: Master Nodes, Compute Nodes, and Storage Nodes. Each node type has specific hardware requirements.

Node Type | CPU | RAM | Storage | Network Interface |
Master Node | 2x Intel Xeon Gold 6338 | 128 GB DDR4 ECC | 2x 1 TB NVMe SSD (RAID 1) | 10 Gbps Ethernet |
Compute Node | 2x AMD EPYC 7763 | 256 GB DDR4 ECC | 4x 4 TB NVMe SSD (RAID 0) | 100 Gbps InfiniBand |
Storage Node | 2x Intel Xeon Silver 4310 | 64 GB DDR4 ECC | 8x 16 TB SATA HDD (RAID 6) | 10 Gbps Ethernet |

These specifications are subject to change as the project evolves. Refer to Hardware Inventory for the most up-to-date listing of individual server details. Power consumption is monitored via Power Monitoring System.
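
The RAID levels in the hardware table determine how much of the raw disk space is actually usable. The following is a minimal sketch of that arithmetic, using nominal drive sizes and ignoring filesystem overhead and hot spares; the function name `usable_capacity_tb` and the `ARRAYS` dictionary are illustrative and not part of any project tooling.

```python
def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Return the nominal usable capacity in TB of a simple RAID array."""
    if level == 0:
        # Striping: every drive contributes capacity, with no redundancy.
        return drives * drive_tb
    if level == 1:
        # Mirroring: all drives hold the same data.
        return drive_tb
    if level == 6:
        # Double parity: two drives' worth of space is used for parity.
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")


# Layouts from the hardware table: (RAID level, drive count, TB per drive).
ARRAYS = {
    "2x 1 TB NVMe SSD (RAID 1)": (1, 2, 1),
    "4x 4 TB NVMe SSD (RAID 0)": (0, 4, 4),
    "8x 16 TB SATA HDD (RAID 6)": (6, 8, 16),
}


def capacity_report() -> dict:
    """Map each array description to its nominal usable capacity in TB."""
    return {name: usable_capacity_tb(*spec) for name, spec in ARRAYS.items()}


if __name__ == "__main__":
    for name, tb in capacity_report().items():
        print(f"{name}: {tb} TB usable")
```

Under these assumptions the RAID 1 pair yields 1 TB, the RAID 0 stripe 16 TB, and the RAID 6 array 96 TB. The RAID 0 array has no redundancy, so a single drive failure loses the whole stripe.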

Software Stack

The "AI in Poole" servers run a customized Linux distribution based on Ubuntu 22.04 LTS. Key software components include:

  • Operating System: Ubuntu 22.04 LTS
  • Containerization: Docker 24.0.5 and Docker Compose
  • Orchestration: Kubernetes 1.27
  • Programming Languages: Python 3.10, CUDA (for GPU acceleration)
  • Machine Learning Frameworks: TensorFlow 2.12, PyTorch 2.0
  • Database: PostgreSQL 15 with TimescaleDB extension for time-series data. See Database Schema.
  • Monitoring: Prometheus and Grafana. Consult Monitoring Dashboard.

Software Component | Version | Purpose |
Docker | 24.0.5 | Containerization platform |
Kubernetes | 1.27 | Container orchestration |
TensorFlow | 2.12 | Machine learning framework |
PyTorch | 2.0 | Machine learning framework |
PostgreSQL | 15 | Database management system |
Prometheus | 2.45 | Monitoring system |

All software is managed through automated configuration management using Ansible. See Ansible Playbooks for details.
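
Because the versions above are pinned, it can be useful to check a node for drift independently of Ansible. The sketch below is one possible way to do that in plain Python; it is not taken from the project's playbooks. The pins in `EXPECTED` come from the component list above, while the `installed` dictionary passed to `find_drift` is hypothetical and would have to be collected from each node by whatever means is available.

```python
# Version pins taken from the software stack list in this document.
EXPECTED = {
    "docker": "24.0.5",
    "kubernetes": "1.27",
    "python": "3.10",
    "tensorflow": "2.12",
    "pytorch": "2.0",
    "postgresql": "15",
}


def matches_pin(installed: str, pinned: str) -> bool:
    """True if `installed` equals the pin or only adds further components.

    For example "1.27.3" matches the pin "1.27", but "1.28.0" does not.
    """
    inst = installed.split(".")
    pin = pinned.split(".")
    return inst[:len(pin)] == pin


def find_drift(installed: dict) -> dict:
    """Return {component: (pinned, actual)} for every missing or mismatched pin."""
    drift = {}
    for name, pinned in EXPECTED.items():
        actual = installed.get(name)
        if actual is None or not matches_pin(actual, pinned):
            drift[name] = (pinned, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical versions reported by one node, for illustration only.
    node = {
        "docker": "24.0.5",
        "kubernetes": "1.27.3",
        "python": "3.10.12",
        "tensorflow": "2.12.0",
        "pytorch": "2.1.0",
        "postgresql": "15.4",
    }
    print(find_drift(node))
```

Matching on leading version components treats patch releases within a pinned minor version as acceptable, which is a design choice for this sketch rather than a stated project policy.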

Networking Configuration

The server cluster is connected to the internet via a redundant 10 Gbps fiber connection. Internal communication between nodes is primarily handled through a dedicated 100 Gbps InfiniBand network for low-latency, high-bandwidth data transfer. A separate 10 Gbps Ethernet network is used for storage access and management.
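
To give a sense of what the difference between the two internal networks means in practice, the sketch below estimates how long a bulk transfer takes at a given line rate. It assumes ideal conditions unless an `efficiency` factor is supplied; the 100 GB payload is a hypothetical example size, not a measured figure from the cluster.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the seconds needed to move `size_gb` gigabytes over a link.

    `link_gbps` is the nominal line rate in gigabits per second and
    `efficiency` is the fraction of that rate achieved in practice (0 to 1].
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    size_gigabits = size_gb * 8  # 1 byte = 8 bits
    return size_gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    payload_gb = 100  # hypothetical payload, e.g. a set of model weights
    for label, rate in (("100 Gbps InfiniBand", 100), ("10 Gbps Ethernet", 10)):
        print(f"{label}: {transfer_seconds(payload_gb, rate):.0f} s")
```

At line rate, 100 GB takes about 8 seconds over the InfiniBand network and about 80 seconds over 10 Gbps Ethernet; real transfers will be slower because of protocol overhead and storage throughput.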

Interface | IP Address Range | Purpose |
10 Gbps fiber (redundant) | 192.168.1.0/24 | Internet connectivity |
100 Gbps InfiniBand | 10.0.0.0/8 | Inter-node communication (Compute Nodes) |
10 Gbps Ethernet | 172.16.0.0/16 | Storage access |

DNS resolution is handled by an internal BIND server. Firewall rules are managed using `iptables`. Further network details can be found in Network Diagram and Firewall Rules. Secure Shell (SSH) access is restricted to authorized personnel via key-based authentication. Please review Security Policies.
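
When reading logs or firewall rules, it helps to map an address back to the network it belongs to. The following sketch does this with Python's standard `ipaddress` module, using the three address ranges from the networking table; the `classify` helper and the sample addresses are illustrative only.

```python
import ipaddress
from typing import Optional

# Address ranges and purposes as listed in the networking table.
NETWORKS = {
    "Internet connectivity": ipaddress.ip_network("192.168.1.0/24"),
    "Inter-node communication (Compute Nodes)": ipaddress.ip_network("10.0.0.0/8"),
    "Storage access": ipaddress.ip_network("172.16.0.0/16"),
}


def classify(address: str) -> Optional[str]:
    """Return the purpose of the network containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for purpose, network in NETWORKS.items():
        if ip in network:
            return purpose
    return None


if __name__ == "__main__":
    # Sample addresses chosen for illustration; they are not real hosts.
    for addr in ("10.4.2.17", "172.16.5.9", "192.168.1.20", "8.8.8.8"):
        print(addr, "->", classify(addr))
```

Addresses outside all three ranges return `None`, which makes unexpected traffic easy to spot.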


Security Considerations

Security is paramount. All servers are hardened according to CIS benchmarks. Regular security audits are conducted. Intrusion detection systems (IDS) are in place to monitor for malicious activity. Data encryption is used both in transit and at rest. Access control is strictly enforced using role-based access control (RBAC). See Security Documentation for comprehensive details.


Related pages: Main Page, System Architecture, Hardware Inventory, Software Versions, Database Schema, Monitoring Dashboard, Backup Procedures, Ansible Playbooks, Network Diagram, Firewall Rules, Security Policies, Security Documentation, Troubleshooting Guide, API Documentation, Deployment Procedures, Contact Information

