AI in Dover
AI in Dover: Server Configuration Documentation
Welcome to the documentation for the "AI in Dover" server configuration. This article describes the hardware, software, and networking components of the system. It is intended for newcomers to our MediaWiki site and for system administrators. The cluster is dedicated to Artificial Intelligence workloads, specifically natural language processing and machine learning tasks. Understanding the configuration is essential for maintenance, troubleshooting, and future expansion.
Overview
The "AI in Dover" project utilizes a distributed server cluster located in our Dover data center. The primary goal is to provide a robust and scalable platform for AI research and development. The cluster consists of several interconnected servers, each with specialized hardware for accelerated computing. The system is designed for high throughput and low latency, essential for demanding AI applications. This documentation details the specifications of the core components. See also Dover Data Center Standards for general facility information.
Hardware Specifications
The cluster is built around three primary node types: Master Nodes, Worker Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role.
Master Nodes
Master Nodes are responsible for job scheduling, resource management, and overall cluster coordination. Two Master Nodes are deployed for redundancy.
Specification | Value |
---|---|
CPU | Dual Intel Xeon Gold 6338 |
RAM | 512 GB DDR4 ECC Registered |
Storage (OS) | 1 TB NVMe SSD |
Network Interface | Dual 100 GbE |
Power Supply | Redundant 1600W Platinum |
These nodes run the cluster management software (currently Kubernetes) and provide a central point of control. Refer to Kubernetes Documentation for more details.
Worker Nodes
Worker Nodes perform the actual AI computations. These nodes are equipped with high-performance GPUs. We currently have 16 Worker Nodes.
Specification | Value |
---|---|
CPU | Dual AMD EPYC 7763 |
RAM | 256 GB DDR4 ECC Registered |
GPU | 4 x NVIDIA A100 80GB |
Storage (Local) | 2 TB NVMe SSD (for temporary data) |
Network Interface | Dual 100 GbE |
Power Supply | Redundant 2000W Titanium |
The GPUs provide the necessary processing power for training and inference tasks. See GPU Driver Installation Guide for driver details. These nodes are configured with Docker for containerization of AI workloads.
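The aggregate capacity of the worker pool follows directly from the figures above. The sketch below is only an illustration of that arithmetic: the `WorkerPool` class and its field names are hypothetical and are not part of any cluster tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPool:
    """Hypothetical helper holding the Worker Node figures from the table above."""
    nodes: int
    gpus_per_node: int
    gpu_memory_gb: int
    ram_per_node_gb: int

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def total_gpu_memory_gb(self) -> int:
        return self.total_gpus * self.gpu_memory_gb

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_per_node_gb


# 16 Worker Nodes, each with 4 x NVIDIA A100 80GB and 256 GB RAM
workers = WorkerPool(nodes=16, gpus_per_node=4, gpu_memory_gb=80, ram_per_node_gb=256)

print(workers.total_gpus)           # 64
print(workers.total_gpu_memory_gb)  # 5120
print(workers.total_ram_gb)         # 4096
```

These are nameplate totals. The memory actually available to a job depends on how the scheduler allocates GPUs and on framework overhead.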
Storage Nodes
Storage Nodes provide persistent storage for datasets and model checkpoints. We currently use 4 Storage Nodes.
Specification | Value |
---|---|
CPU | Intel Xeon Silver 4310 |
RAM | 128 GB DDR4 ECC Registered |
Storage | 16 x 18 TB SAS HDDs (RAID 6) |
Network Interface | Dual 25 GbE |
Power Supply | Redundant 1200W Gold |
The storage is exported to the cluster as a shared file system over NFS (Network File System). Consult the NFS Configuration Guide for more information. Data backup procedures are detailed in Data Backup Policy.
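As a rough guide to usable capacity, RAID 6 reserves the equivalent of two drives for parity. The sketch below assumes each Storage Node runs a single RAID 6 array across all 16 drives, which this page does not state. It also ignores file system overhead and hot spares. The function name `raid6_usable_tb` is hypothetical.

```python
def raid6_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of one RAID 6 array: two drives' worth of space holds parity."""
    if drives < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    return (drives - 2) * drive_tb


STORAGE_NODES = 4

# Assumption: one array per node spanning all 16 x 18 TB drives
per_node_tb = raid6_usable_tb(drives=16, drive_tb=18)
cluster_tb = STORAGE_NODES * per_node_tb

print(per_node_tb)  # 252
print(cluster_tb)   # 1008
```

If the drives are instead split into several smaller arrays, each array gives up two drives to parity and the usable total is lower.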
Software Stack
The "AI in Dover" cluster utilizes a comprehensive software stack to facilitate AI development and deployment.
- Operating System: Ubuntu Server 22.04 LTS is the base operating system for all nodes.
- Containerization: Docker is used to build and run containers, and Kubernetes is used to orchestrate and manage them.
- Programming Languages: Python is the primary programming language, with support for R and Java.
- AI Frameworks: TensorFlow, PyTorch, and scikit-learn are the supported AI frameworks.
- Monitoring: Prometheus and Grafana are used for system monitoring and visualization. See Monitoring Dashboard Access for details.
- Version Control: Git is used for version control. Code is hosted on our internal GitLab instance.
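A quick way to confirm that the supported frameworks are importable in a given Python environment is sketched below. The function name `available_frameworks` is hypothetical. The module names (`tensorflow`, `torch`, `sklearn`) are the standard import names for these packages. The check only locates each package and does not import it, so it says nothing about GPU support.

```python
import importlib.util

# Display name -> top-level import name
FRAMEWORK_MODULES = {
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "scikit-learn": "sklearn",
}


def available_frameworks(modules=FRAMEWORK_MODULES):
    """Return {display name: True/False} depending on whether the module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


for name, found in available_frameworks().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Running this inside a workload container is one way to check an image before submitting a job.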
Networking Configuration
The cluster utilizes a dedicated VLAN for internal communication. All nodes are connected to a high-speed network switch.
- Network Topology: Flat network with a single subnet.
- IP Address Range: 192.168.10.0/24
- DNS: Internal DNS server managed by the Network Team.
- Firewall: iptables is used to secure the cluster. Refer to Firewall Rules Documentation for details.
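The address headroom of the single subnet can be checked with Python's standard `ipaddress` module. The sketch below assumes one IP address per node, which this page does not confirm: the dual network interfaces may be bonded or may each carry their own address. It also ignores addresses reserved for gateways, switches, or management interfaces.

```python
import ipaddress

subnet = ipaddress.ip_network("192.168.10.0/24")

# Node counts from the Hardware Specifications section
node_counts = {"master": 2, "worker": 16, "storage": 4}

# Exclude the network and broadcast addresses
usable_hosts = subnet.num_addresses - 2
total_nodes = sum(node_counts.values())
free_addresses = usable_hosts - total_nodes

print(subnet.is_private)  # True
print(usable_hosts)       # 254
print(total_nodes)        # 22
print(free_addresses)     # 232
```

Under that assumption, the /24 leaves ample room for the additional Worker Nodes discussed under Future Expansion.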
Security Considerations
Security is paramount for the "AI in Dover" cluster. We adhere to the Data Security Policy.
- Access Control: Access to the cluster is restricted to authorized personnel via SSH key authentication.
- Data Encryption: Data is encrypted at rest and in transit.
- Regular Security Audits: The cluster undergoes regular security audits conducted by the Security Team.
Future Expansion
Plans are underway to add Worker Nodes equipped with newer-generation GPUs, which will increase the cluster's computational capacity. See Future Hardware Roadmap for planned upgrades.