AI in Kent: Server Configuration
This document details the server configuration for the "AI in Kent" project, providing a technical overview for system administrators, developers, and anyone contributing to the platform. It covers hardware specifications, software stack, networking, and security considerations. This guide assumes a basic understanding of Linux server administration and MediaWiki concepts.
Overview
The "AI in Kent" project uses a cluster of servers to provide the computational resources needed for training and deploying artificial intelligence models, with a focus on natural language processing and computer vision. The servers are located in a dedicated data center in Kent and are connected via a high-speed network. This documentation outlines the key components and configurations. See Special:MyUserPage for contact information for the team maintaining this infrastructure.
Hardware Specifications
The server cluster comprises three primary types of nodes: Master Nodes, Worker Nodes, and Storage Nodes. Each node type has specific hardware requirements to optimize performance and reliability.
Node Type | CPU | RAM | Storage | Network Interface |
---|---|---|---|---|
Master Nodes (2) | 2 x Intel Xeon Gold 6248R (24 cores each) | 256 GB DDR4 ECC REG | 2 x 1TB NVMe SSD (RAID 1) | 100 Gbps Ethernet |
Worker Nodes (8) | 2 x AMD EPYC 7763 (64 cores each) | 512 GB DDR4 ECC REG | 4 x 4TB SAS HDD (RAID 10) + 1 x 1TB NVMe SSD (local scratch) | 100 Gbps Ethernet |
Storage Nodes (3) | 2 x Intel Xeon Silver 4210 (10 cores each) | 128 GB DDR4 ECC REG | 16 x 16TB SAS HDD (RAID 6) | 40 Gbps Ethernet |
These specifications were chosen to balance cost-effectiveness with the demanding requirements of AI workloads. The use of NVMe SSDs for the Master Nodes ensures fast boot times and responsiveness, while the large RAM capacity on the Worker Nodes allows for handling large datasets. Server Room Access is restricted to authorized personnel only.
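As a rough illustration of what the table above adds up to, the following sketch totals cores, RAM, and usable RAID capacity per node type. The figures are taken from the table; the names `NodeType` and `raid_usable_tb` are hypothetical helpers introduced here, and the RAID arithmetic uses the standard capacity rules (mirror, two parity disks, striped mirrors) without accounting for filesystem or Ceph replication overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One row of the hardware table (hypothetical helper type)."""
    name: str
    count: int
    sockets: int
    cores_per_socket: int
    ram_gb: int


# Values copied from the hardware specification table.
NODES = [
    NodeType("master", count=2, sockets=2, cores_per_socket=24, ram_gb=256),
    NodeType("worker", count=8, sockets=2, cores_per_socket=64, ram_gb=512),
    NodeType("storage", count=3, sockets=2, cores_per_socket=10, ram_gb=128),
]


def total_cores(node: NodeType) -> int:
    """Physical cores across all nodes of this type."""
    return node.count * node.sockets * node.cores_per_socket


def total_ram_gb(node: NodeType) -> int:
    """Installed RAM across all nodes of this type, in GB."""
    return node.count * node.ram_gb


def raid_usable_tb(level: int, disks: int, disk_tb: int) -> int:
    """Usable capacity of one array, ignoring filesystem overhead."""
    if level == 1:       # mirror: capacity of a single disk
        return disk_tb
    if level == 6:       # two disks' worth of parity
        return (disks - 2) * disk_tb
    if level == 10:      # striped mirrors: half the raw capacity
        return (disks // 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for node in NODES:
        print(f"{node.name}: {total_cores(node)} cores, {total_ram_gb(node)} GB RAM")
    # Storage nodes: 3 arrays of 16 x 16 TB in RAID 6.
    print("storage usable TB:", 3 * raid_usable_tb(6, disks=16, disk_tb=16))
```

Under these assumptions the Worker Nodes contribute 1024 cores and 4096 GB of RAM in total, and the three Storage Nodes offer 672 TB of usable RAID 6 capacity before any replication applied by the storage layer.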
Software Stack
The software stack is built on a foundation of Ubuntu Server 22.04 LTS.
Component | Version | Purpose |
---|---|---|
Operating System | Ubuntu Server 22.04 LTS | Provides the base operating system environment. |
Kubernetes | v1.27 | Container orchestration platform for managing and scaling AI workloads. See Kubernetes Documentation. |
Docker | 20.10.21 | Containerization technology for packaging and deploying AI models. |
NVIDIA CUDA Toolkit | 11.8 | Provides the tools and libraries for GPU-accelerated computing. GPU Driver Updates are critical. |
TensorFlow | 2.12 | Machine learning framework. |
PyTorch | 2.0 | Machine learning framework. |
Ceph | Pacific | Distributed storage system for managing large datasets. |
All software packages are managed using `apt` and regularly updated to ensure security and stability. We utilize Ansible for automated configuration management.
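One way to keep regularly updated nodes aligned with the versions in the table is a small drift check. The sketch below is an assumption about how such a check might look, not the project's actual tooling: `PINNED`, `matches_pin`, and `drift` are hypothetical names, and the reported versions would have to be gathered separately (for example by an Ansible task).

```python
from typing import Dict, Optional, Tuple

# Version pins copied from the software stack table.
PINNED: Dict[str, str] = {
    "kubernetes": "1.27",
    "docker": "20.10.21",
    "cuda": "11.8",
    "tensorflow": "2.12",
    "pytorch": "2.0",
}


def matches_pin(reported: str, pinned: str) -> bool:
    """True if the reported version starts with the pinned components.

    Comparison is per dotted component, so "2.1.2" does not match "2.12".
    A leading "v" and any "+local" suffix (e.g. "2.0.1+cu118") are ignored.
    """
    reported_parts = reported.lstrip("v").split("+")[0].split(".")
    pinned_parts = pinned.split(".")
    return reported_parts[: len(pinned_parts)] == pinned_parts


def drift(reported: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each drifting component to (pinned, reported-or-None)."""
    out: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in PINNED.items():
        seen = reported.get(name)
        if seen is None or not matches_pin(seen, pinned):
            out[name] = (pinned, seen)
    return out


if __name__ == "__main__":
    # Hypothetical versions reported by one node.
    node_report = {
        "kubernetes": "v1.27.3",
        "docker": "20.10.21",
        "cuda": "11.8",
        "tensorflow": "2.13.0",
        "pytorch": "2.0.1+cu118",
    }
    print(drift(node_report))
```

Pinning by component prefix rather than exact string lets patch releases through while still flagging a minor-version change.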
Networking Configuration
The server cluster is connected over dedicated network segments, each using a /24 subnet, as listed in the table below. Each node is assigned a static IP address. The network follows a spine-leaf topology, which provides high bandwidth and low latency.
Network Segment | Subnet | Gateway | DNS Server |
---|---|---|---|
Management Network | 192.168.1.0/24 | 192.168.1.1 | 8.8.8.8, 8.8.4.4 |
Cluster Network | 10.0.0.0/24 | 10.0.0.1 | 10.0.0.1 |
Storage Network | 172.16.0.0/24 | 172.16.0.1 | 172.16.0.1 |
Firewall rules are configured using `ufw` to restrict access to only the necessary ports. See Network Diagram for a detailed visualization. VPN Access is required for remote administration.
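The addressing plan in the table can be sanity-checked with Python's standard `ipaddress` module. This minimal sketch confirms that each gateway lies inside its subnet and that no two segments overlap. The subnets and gateways come from the table; `SEGMENTS`, `validate_segments`, and `usable_hosts` are names chosen for this example.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple

# (subnet, gateway) per segment, copied from the networking table.
SEGMENTS: Dict[str, Tuple[str, str]] = {
    "management": ("192.168.1.0/24", "192.168.1.1"),
    "cluster": ("10.0.0.0/24", "10.0.0.1"),
    "storage": ("172.16.0.0/24", "172.16.0.1"),
}


def validate_segments(segments: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems: List[str] = []
    networks = {}
    for name, (cidr, gateway) in segments.items():
        # ip_network raises ValueError if host bits are set in the CIDR.
        net = ipaddress.ip_network(cidr)
        networks[name] = net
        if ipaddress.ip_address(gateway) not in net:
            problems.append(f"{name}: gateway {gateway} is outside {cidr}")
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} and {name_b} overlap")
    return problems


def usable_hosts(cidr: str) -> int:
    """Assignable IPv4 addresses, excluding network and broadcast."""
    return ipaddress.ip_network(cidr).num_addresses - 2


if __name__ == "__main__":
    print(validate_segments(SEGMENTS) or "no problems found")
    print("hosts per /24:", usable_hosts("10.0.0.0/24"))
```

Each /24 leaves 254 assignable addresses, which comfortably covers the 13 nodes described in the hardware section.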
Security Considerations
Security is a paramount concern. The following measures are in place:
- **Regular Security Audits:** Performed quarterly by an external security firm.
- **Intrusion Detection System (IDS):** `Snort` is deployed to monitor network traffic for malicious activity.
- **Firewall:** `ufw` is configured to restrict access to essential ports.
- **Access Control:** Strict access control policies are enforced using SSH keys and user authentication.
- **Data Encryption:** Data in transit is encrypted using TLS, and data at rest is encrypted on disk.
- **Vulnerability Scanning:** Regularly performed using `Nessus`. See Security Incident Response Plan.
- **Patch Management:** Automatic security updates are enabled.
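To make the firewall measure above more concrete, the sketch below generates a default-deny `ufw` rule set from an allowlist. It only builds command strings and does not execute them. The specific ports are assumptions for illustration (SSH on 22, the Kubernetes API server default of 6443, and the Ceph monitor port 6789); the source does not state which ports are actually open, and `ufw_commands` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List, Tuple

Rule = Tuple[str, int, str]  # (source CIDR, port, protocol)

# Illustrative allowlist only; the real port list is not given in this document.
EXAMPLE_RULES: List[Rule] = [
    ("192.168.1.0/24", 22, "tcp"),    # SSH from the management network
    ("10.0.0.0/24", 6443, "tcp"),     # Kubernetes API server (default port)
    ("172.16.0.0/24", 6789, "tcp"),   # Ceph monitor (legacy messenger port)
]


def ufw_commands(rules: Iterable[Rule]) -> List[str]:
    """Build ufw commands: deny inbound by default, then allow each rule."""
    commands = [
        "ufw default deny incoming",
        "ufw default allow outgoing",
    ]
    for source, port, proto in rules:
        ipaddress.ip_network(source)  # raises ValueError on a malformed CIDR
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
        if proto not in ("tcp", "udp"):
            raise ValueError(f"invalid protocol: {proto}")
        commands.append(
            f"ufw allow from {source} to any port {port} proto {proto}"
        )
    return commands


if __name__ == "__main__":
    for command in ufw_commands(EXAMPLE_RULES):
        print(command)
```

Generating the commands from data keeps the allowlist reviewable in one place, and the same list could feed an Ansible task instead of being run by hand.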
Monitoring and Logging
The system is monitored using Prometheus and Grafana. Logs are collected and analyzed using the ELK stack (Elasticsearch, Logstash, Kibana). Alerts are configured to notify administrators of critical events. Monitoring Dashboard Link provides access to the Grafana dashboard. Review Log Rotation Policy for details on log management.
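As a small example of how alerts on critical events might be derived from Prometheus data, the sketch below builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and extracts targets whose `up` metric is 0 from a response body. The host name, the sample payload, and the function names are assumptions for illustration; no network request is made here.

```python
import json
from typing import List
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for a Prometheus instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_targets(response_text: str) -> List[str]:
    """Instances whose sample value is 0 in an `up` instant-query response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for sample in payload["data"]["result"]:
        # Each vector sample carries value = [timestamp, "value-as-string"].
        if float(sample["value"][1]) == 0.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


# Hypothetical response body in the documented instant-vector shape.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "10.0.0.11:9100", "job": "node"},
             "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "instance": "10.0.0.12:9100", "job": "node"},
             "value": [1700000000, "0"]},
        ],
    },
})

if __name__ == "__main__":
    # "prometheus.example" is a placeholder host, not the project's server.
    print(instant_query_url("http://prometheus.example:9090", "up"))
    print(down_targets(SAMPLE_RESPONSE))
```

In practice the same condition would more likely be expressed as a Prometheus alerting rule; the parsing shown here is mainly useful for ad hoc scripts and health checks.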