AI in Dover

From Server rental store

AI in Dover: Server Configuration Documentation

Welcome to the documentation for the "AI in Dover" server configuration. This article provides a detailed overview of the hardware, software, and networking components that comprise this system, intended for newcomers to our MediaWiki site and those involved in system administration. This server cluster is dedicated to running advanced Artificial Intelligence workloads, specifically focusing on natural language processing and machine learning tasks. Understanding the configuration is crucial for effective maintenance, troubleshooting, and future expansion.

Overview

The "AI in Dover" project utilizes a distributed server cluster located in our Dover data center. The primary goal is to provide a robust and scalable platform for AI research and development. The cluster consists of several interconnected servers, each with specialized hardware for accelerated computing. The system is designed for high throughput and low latency, essential for demanding AI applications. This documentation details the specifications of the core components. See also Dover Data Center Standards for general facility information.

Hardware Specifications

The cluster is built around three primary node types: Master Nodes, Worker Nodes, and Storage Nodes. Each node type is configured with specific hardware to optimize its role.

Master Nodes

Master Nodes are responsible for job scheduling, resource management, and overall cluster coordination. Two Master Nodes are deployed for redundancy.

Specification       Value
CPU                 Dual Intel Xeon Gold 6338
RAM                 512 GB DDR4 ECC Registered
Storage (OS)        1 TB NVMe SSD
Network Interface   Dual 100 GbE
Power Supply        Redundant 1600W Platinum

These nodes run the cluster management software (currently Kubernetes) and provide a central point of control. Refer to the Kubernetes Documentation for more details.

Worker Nodes

Worker Nodes perform the actual AI computations. These nodes are equipped with high-performance GPUs. We currently have 16 Worker Nodes.

Specification       Value
CPU                 Dual AMD EPYC 7763
RAM                 256 GB DDR4 ECC Registered
GPU                 4 x NVIDIA A100 80 GB
Storage (Local)     2 TB NVMe SSD (for temporary data)
Network Interface   Dual 100 GbE
Power Supply        Redundant 2000W Titanium

The GPUs provide the processing power required for training and inference tasks. See the GPU Driver Installation Guide for setup details. These nodes run Docker to containerize AI workloads.
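As a quick sanity check of the fleet's aggregate GPU resources (a sketch based only on the figures in the table above, not part of any deployment tooling), the totals can be computed directly:

```python
# Aggregate GPU resources across the Worker Nodes, using the figures
# from the Worker Node table: 16 nodes, each with 4 x NVIDIA A100 80 GB.
WORKER_NODES = 16
GPUS_PER_NODE = 4
GPU_MEMORY_GB = 80

total_gpus = WORKER_NODES * GPUS_PER_NODE
total_gpu_memory_gb = total_gpus * GPU_MEMORY_GB

print(f"Total GPUs: {total_gpus}")                    # 64
print(f"Total GPU memory: {total_gpu_memory_gb} GB")  # 5120 GB
```

These totals (64 GPUs, 5,120 GB of aggregate GPU memory) are useful when sizing large training jobs against the cluster's capacity.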

Storage Nodes

Storage Nodes provide persistent storage for datasets and model checkpoints. We currently use 4 Storage Nodes.

Specification       Value
CPU                 Intel Xeon Silver 4310
RAM                 128 GB DDR4 ECC Registered
Storage             16 x 18 TB SAS HDDs (RAID 6)
Network Interface   Dual 25 GbE
Power Supply        Redundant 1200W Gold

The storage is exported to the cluster over NFS (Network File System). Consult the NFS Configuration Guide for more information. Data backup procedures are detailed in the Data Backup Policy.
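A rough estimate of usable capacity follows from the table above (a back-of-the-envelope sketch: RAID 6 reserves the equivalent of two drives per array for parity, and filesystem overhead would reduce the real figure further):

```python
# Approximate usable capacity of the Storage Nodes, per the table above:
# 4 nodes, each with 16 x 18 TB SAS drives in RAID 6.
STORAGE_NODES = 4
DRIVES_PER_NODE = 16
DRIVE_TB = 18
RAID6_PARITY_DRIVES = 2  # RAID 6 tolerates two drive failures per array

usable_per_node_tb = (DRIVES_PER_NODE - RAID6_PARITY_DRIVES) * DRIVE_TB
cluster_usable_tb = STORAGE_NODES * usable_per_node_tb

print(f"Usable per node: {usable_per_node_tb} TB")  # 252 TB
print(f"Cluster usable:  {cluster_usable_tb} TB")   # 1008 TB
```

So the cluster offers on the order of 1 PB of raw usable space before filesystem formatting and reserved blocks are accounted for.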

Software Stack

The "AI in Dover" cluster runs a layered software stack to support AI development and deployment: Kubernetes for orchestration and resource management, Docker for containerizing workloads, the NVIDIA GPU drivers described in the GPU Driver Installation Guide, and NFS for shared persistent storage. The sections below and the linked guides cover the configuration of each layer.

Networking Configuration

The cluster utilizes a dedicated VLAN for internal communication. All nodes are connected to a high-speed network switch.

  • Network Topology: Flat network with a single subnet.
  • IP Address Range: 192.168.10.0/24
  • DNS: Internal DNS server managed by the Network Team.
  • Firewall: iptables is used to secure the cluster. Refer to Firewall Rules Documentation for details.
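The documented /24 can be checked against the current node count (2 master + 16 worker + 4 storage = 22 hosts) with a short sanity check using Python's standard `ipaddress` module; this is an illustrative sketch, not part of the Network Team's tooling:

```python
import ipaddress

# Verify that the documented subnet covers the current node count
# with room for expansion.
subnet = ipaddress.ip_network("192.168.10.0/24")
usable_hosts = subnet.num_addresses - 2  # minus network and broadcast addresses

nodes = 2 + 16 + 4  # masters + workers + storage
print(f"Usable host addresses: {usable_hosts}")  # 254
print(f"Current nodes: {nodes}, headroom: {usable_hosts - nodes}")
```

With 254 usable addresses against 22 nodes, the subnet leaves ample headroom for the planned Worker Node expansion before renumbering would be needed.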

Security Considerations

Security is paramount for the "AI in Dover" cluster. We adhere to the Data Security Policy.

  • Access Control: Access to the cluster is restricted to authorized personnel via SSH key authentication.
  • Data Encryption: Data is encrypted at rest and in transit.
  • Regular Security Audits: The cluster undergoes regular security audits conducted by the Security Team.

Future Expansion

Plans are underway to expand the cluster with additional Worker Nodes equipped with the latest generation of GPUs. This will significantly increase the cluster's computational capacity and enable us to tackle even more complex AI challenges. See Future Hardware Roadmap for planned upgrades.


Intel-Based Server Configurations

Configuration                   Specifications                                   Benchmark
Core i7-6700K/7700 Server       64 GB DDR4, 2 x 512 GB NVMe SSD                  CPU Benchmark: 8046
Core i7-8700 Server             64 GB DDR4, 2 x 1 TB NVMe SSD                    CPU Benchmark: 13124
Core i9-9900K Server            128 GB DDR4, 2 x 1 TB NVMe SSD                   CPU Benchmark: 49969
Core i9-13900 Server (64GB)     64 GB RAM, 2 x 2 TB NVMe SSD
Core i9-13900 Server (128GB)    128 GB RAM, 2 x 2 TB NVMe SSD
Core i5-13500 Server (64GB)     64 GB RAM, 2 x 500 GB NVMe SSD
Core i5-13500 Server (128GB)    128 GB RAM, 2 x 500 GB NVMe SSD
Core i5-13500 Workstation       64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000

AMD-Based Server Configurations

Configuration                    Specifications                       Benchmark
Ryzen 5 3600 Server              64 GB RAM, 2 x 480 GB NVMe           CPU Benchmark: 17849
Ryzen 7 7700 Server              64 GB DDR5 RAM, 2 x 1 TB NVMe        CPU Benchmark: 35224
Ryzen 9 5950X Server             128 GB RAM, 2 x 4 TB NVMe            CPU Benchmark: 46045
Ryzen 9 7950X Server             128 GB DDR5 ECC, 2 x 2 TB NVMe       CPU Benchmark: 63561
EPYC 7502P Server (128GB/1TB)    128 GB RAM, 1 TB NVMe                CPU Benchmark: 48021
EPYC 7502P Server (128GB/2TB)    128 GB RAM, 2 TB NVMe                CPU Benchmark: 48021
EPYC 7502P Server (128GB/4TB)    128 GB RAM, 2 x 2 TB NVMe            CPU Benchmark: 48021
EPYC 7502P Server (256GB/1TB)    256 GB RAM, 1 TB NVMe                CPU Benchmark: 48021
EPYC 7502P Server (256GB/4TB)    256 GB RAM, 2 x 2 TB NVMe            CPU Benchmark: 48021
EPYC 9454P Server                256 GB RAM, 2 x 2 TB NVMe


Note: All benchmark scores are approximate and may vary based on configuration. Server availability is subject to stock.