AI in Devon: Server Configuration
This document describes the server configuration behind the "AI in Devon" project. It is a technical guide for new system administrators and developers joining the project, and covers hardware, software, networking, and security. The project uses a distributed computing model to accelerate machine learning workloads. Familiarity with basic Linux system administration is assumed.
Overview
The "AI in Devon" project utilizes a cluster of servers located in a secure data center in Plymouth. These servers are dedicated to running computationally intensive machine learning algorithms, primarily focused on image recognition and natural language processing. The cluster is designed for scalability and redundancy. The primary goal is to provide a robust and reliable platform for AI research and development. Access to these servers is restricted to authorized personnel via SSH and a centralized authentication system.
Hardware Configuration
The core of the "AI in Devon" infrastructure consists of 12 high-performance servers. Each server is built using similar specifications to ensure consistency and ease of maintenance. A dedicated storage array provides centralized data access.
Server Component | Specification |
---|---|
CPU | Dual Intel Xeon Gold 6338 (32 cores per CPU) |
RAM | 256 GB DDR4 ECC Registered RAM |
Storage (Local) | 1 TB NVMe SSD (OS & Temporary Data) |
GPU | 4 x NVIDIA A100 (80GB HBM2e) |
Network Interface | Dual 100 Gbps Ethernet |
Power Supply | Redundant 1600W Platinum Power Supplies |
The storage array is a Dell EMC PowerScale F600 providing approximately 1 PB of usable storage. A Cisco Nexus 9508 switch provides the high-speed interconnect between the servers. A dedicated backup server uses rsync and offsite storage for disaster recovery. A detailed hardware inventory is available on the Hardware Inventory Page.
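For capacity planning it can help to turn the per-server table into cluster-wide totals. The following is a minimal sketch that assumes all 12 servers match the table exactly (the text only says "similar specifications", so treat the results as approximate). The names `NodeSpec` and `cluster_totals` are illustrative and not part of any project tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-server figures copied from the hardware table."""
    cpus: int = 2             # dual-socket
    cores_per_cpu: int = 32   # Xeon Gold 6338, physical cores
    ram_gb: int = 256
    gpus: int = 4             # NVIDIA A100
    gpu_mem_gb: int = 80      # per GPU


def cluster_totals(node: NodeSpec, node_count: int) -> dict:
    """Multiply per-server resources by the number of servers."""
    return {
        "cpu_cores": node.cpus * node.cores_per_cpu * node_count,
        "ram_gb": node.ram_gb * node_count,
        "gpus": node.gpus * node_count,
        "gpu_mem_gb": node.gpus * node.gpu_mem_gb * node_count,
    }


totals = cluster_totals(NodeSpec(), node_count=12)
print(totals)
# {'cpu_cores': 768, 'ram_gb': 3072, 'gpus': 48, 'gpu_mem_gb': 3840}
```

Under that assumption the cluster offers 768 physical CPU cores, 3 TB of system RAM, and 48 GPUs with 3,840 GB of GPU memory in total.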
Software Configuration
Each server runs a customized version of Ubuntu 22.04 LTS. The operating system is hardened according to CIS Benchmarks. The primary software stack consists of:
- CUDA 12.1: For GPU-accelerated computing.
- PyTorch 2.0: The primary machine learning framework.
- TensorFlow 2.12: Used for specific model deployments.
- Docker and Kubernetes: For containerization and orchestration.
- NFS: For shared file system access to the storage array.
- Prometheus and Grafana: For system monitoring and alerting.
Software Component | Version | Purpose |
---|---|---|
Ubuntu | 22.04 LTS | Operating System |
CUDA | 12.1 | GPU Computing Toolkit |
PyTorch | 2.0 | Machine Learning Framework |
TensorFlow | 2.12 | Machine Learning Framework |
Docker | 24.0.5 | Containerization |
Kubernetes | 1.27 | Container Orchestration |
All software is managed using a centralized package repository, built on APT. Regular security updates are applied automatically using Unattended Upgrades. Configuration management is handled via Ansible, ensuring consistency across the cluster. The version control system used for all code is Git, hosted on a private GitLab instance.
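Because consistency across the cluster is the stated goal, a small drift check against the version table can be useful alongside Ansible. The sketch below is illustrative only: it assumes some other step (for example an Ansible fact-gathering task) has already collected the versions each node reports into a dictionary, and the `reported` sample is made up for demonstration. Ubuntu is left out because its version string carries the "LTS" suffix.

```python
# Expected versions, taken from the software table above.
EXPECTED = {
    "cuda": "12.1",
    "pytorch": "2.0",
    "tensorflow": "2.12",
    "docker": "24.0.5",
    "kubernetes": "1.27",
}


def matches(expected: str, actual: str) -> bool:
    """True when `actual` begins with the dotted components of `expected`.

    A local build suffix after "+" is ignored, so "2.0" matches "2.0.1".
    """
    want = expected.split(".")
    got = actual.split("+")[0].split(".")
    return got[:len(want)] == want


def find_drift(reported: dict) -> dict:
    """Return {component: (expected, reported)} for every mismatch.

    A component missing from `reported` is listed with None.
    """
    drift = {}
    for name, want in EXPECTED.items():
        got = reported.get(name)
        if got is None or not matches(want, got):
            drift[name] = (want, got)
    return drift


# Hypothetical report from one node.
reported = {
    "cuda": "12.1",
    "pytorch": "2.0.1",
    "tensorflow": "2.13.0",
    "docker": "24.0.5",
}
print(find_drift(reported))
# {'tensorflow': ('2.12', '2.13.0'), 'kubernetes': ('1.27', None)}
```

Comparing only the components given in the table means patch releases (such as 2.0.1 against 2.0) are not flagged, which fits a table that pins some components only to a minor version.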
Networking Configuration
The server cluster is connected to the internal network via a dedicated VLAN. Each server has a static IP address within this VLAN. A firewall, configured using iptables, restricts access to the servers to authorized networks and ports. DNS resolution is handled by an internal BIND server. The network topology is illustrated on the Network Diagram.
Network Component | IP Address / Range | Purpose |
---|---|---|
VLAN | 192.168.10.0/24 | Server Cluster |
Gateway | 192.168.10.1 | Internal Network Access |
DNS Server | 192.168.10.2 | DNS Resolution |
Firewall | 192.168.10.3 | Network Security |
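When assigning a static address to a new server, the addressing plan in the table can be checked with Python's standard `ipaddress` module. This is a sketch under the assumption that only the three addresses listed above are reserved; the function name `check_static_ip` and its return labels are illustrative.

```python
import ipaddress

# Values from the network table above.
CLUSTER_VLAN = ipaddress.ip_network("192.168.10.0/24")
RESERVED = {
    "gateway": ipaddress.ip_address("192.168.10.1"),
    "dns": ipaddress.ip_address("192.168.10.2"),
    "firewall": ipaddress.ip_address("192.168.10.3"),
}


def check_static_ip(addr: str) -> str:
    """Classify a candidate static address for a cluster server."""
    ip = ipaddress.ip_address(addr)
    if ip not in CLUSTER_VLAN:
        return "outside-vlan"
    # The network and broadcast addresses cannot be given to a host.
    if ip in (CLUSTER_VLAN.network_address, CLUSTER_VLAN.broadcast_address):
        return "not-assignable"
    for role, reserved in RESERVED.items():
        if ip == reserved:
            return f"reserved-{role}"
    return "ok"


for candidate in ("192.168.10.11", "192.168.10.2", "192.168.11.5"):
    print(candidate, check_static_ip(candidate))
# 192.168.10.11 ok
# 192.168.10.2 reserved-dns
# 192.168.11.5 outside-vlan
```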
All communication between servers is encrypted using TLS. External access to the cluster is limited to SSH and a secure web interface for monitoring, and remote administration takes place over a VPN. Regular network security audits are performed to identify and address potential vulnerabilities.
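For services written in Python, the TLS requirement for server-to-server traffic can be expressed with the standard `ssl` module. The sketch below is an assumption about how a client might be configured, not the project's actual setup: this guide does not state a minimum TLS version or a CA location, so the TLS 1.2 floor and the optional `ca_file` parameter are illustrative choices.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the peer.

    ca_file: optional path to an internal CA bundle (hypothetical);
    when omitted, the system's default trust store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Assumed policy: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
# True True
```

`ssl.create_default_context` already enables certificate verification and hostname checking, so the only policy added here is the minimum protocol version.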
Security Considerations
Security is paramount for the "AI in Devon" project. The following security measures are in place:
- **Firewall:** A strict firewall policy restricts access to the servers.
- **Intrusion Detection System (IDS):** Snort is used to monitor network traffic for malicious activity.
- **Regular Security Audits:** Penetration testing is conducted quarterly.
- **Two-Factor Authentication (2FA):** Required for all SSH access and the web interface.
- **Data Encryption:** Sensitive data is encrypted at rest and in transit.
- **Least Privilege Principle:** Users are granted only the necessary permissions.
- **Vulnerability Scanning:** Servers are regularly scanned for vulnerabilities using Nessus.
Future Enhancements
Planned enhancements include:
- Upgrading the GPUs to the latest generation (NVIDIA H100).
- Implementing a more sophisticated monitoring system with predictive analytics.
- Expanding the storage capacity.
- Integrating a distributed file system such as Ceph.