AI in Ramsgate: Server Configuration Documentation
Welcome to the documentation for the "AI in Ramsgate" server configuration. This article details the hardware and software setup supporting our artificial intelligence initiatives at the Ramsgate facility. It is intended for new system administrators and developers working with the AI infrastructure; please familiarize yourself with this information before making any changes to the system, as the setup described here is critical to the smooth operation of our machine learning projects.
Overview
The "AI in Ramsgate" cluster is designed for high-performance computing, specifically tailored for training and inference of large Deep Learning models. It consists of a network of interconnected servers, storage arrays, and networking equipment. The primary goal of this configuration is to provide a scalable and reliable platform for Artificial Intelligence research and deployment. We utilize a hybrid approach, combining on-premise hardware with cloud-based resources via API integration for peak demand. All access is managed through our central Authentication System.
Hardware Specifications
The core of the AI cluster comprises eight dedicated server nodes. Below is a detailed breakdown of the hardware specifications for each node:
Node ID | CPU | RAM | GPU | Storage |
---|---|---|---|---|
Node-01 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80 GB) | 4 x 8 TB NVMe SSD (RAID 0) |
Node-02 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80 GB) | 4 x 8 TB NVMe SSD (RAID 0) |
Node-03 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80 GB) | 4 x 8 TB NVMe SSD (RAID 0) |
Node-04 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80 GB) | 4 x 8 TB NVMe SSD (RAID 0) |
Node-05 | 2 x AMD EPYC 7763 | 1 TB DDR4 ECC | 8 x NVIDIA A100 (40 GB) | 8 x 8 TB NVMe SSD (RAID 0) |
Node-06 | 2 x AMD EPYC 7763 | 1 TB DDR4 ECC | 8 x NVIDIA A100 (40 GB) | 8 x 8 TB NVMe SSD (RAID 0) |
Node-07 | 2 x AMD EPYC 7763 | 1 TB DDR4 ECC | 8 x NVIDIA A100 (40 GB) | 8 x 8 TB NVMe SSD (RAID 0) |
Node-08 | 2 x AMD EPYC 7763 | 1 TB DDR4 ECC | 8 x NVIDIA A100 (40 GB) | 8 x 8 TB NVMe SSD (RAID 0) |
These nodes are interconnected via a 100 Gbps InfiniBand network, providing the high-speed, low-latency communication required for distributed training. A dedicated Network Monitoring system continuously tracks fabric health and performance.
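For distributed training, the InfiniBand fabric is normally exercised through NCCL. The sketch below is a minimal, illustrative check (not the production launch script) that reports the GPUs visible on a node and initialises a PyTorch distributed process group; it assumes the job is started with a launcher such as torchrun, which sets the usual rendezvous environment variables.

```python
# Minimal sketch: verify local GPUs and bring up a NCCL process group.
# Assumes a launcher (e.g. torchrun) sets RANK, WORLD_SIZE, LOCAL_RANK,
# MASTER_ADDR and MASTER_PORT; NCCL uses the InfiniBand fabric via RDMA
# where available.
import os

import torch
import torch.distributed as dist


def main() -> None:
    # Report the GPUs visible on this node (4 or 8 A100s per the table above).
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")

    # Join the distributed job.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    print(f"rank {dist.get_rank()}/{dist.get_world_size()} ready on cuda:{local_rank}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

On a single four-GPU node this could be launched with, for example, `torchrun --nproc_per_node=4 check_cluster.py`; multi-node runs additionally pass `--nnodes` and a rendezvous endpoint. The script name is illustrative.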
Software Stack
The software stack is built around Ubuntu Server 22.04 LTS. We utilize containerization with Docker and orchestration with Kubernetes to manage application deployments. The following table outlines key software components:
Software Component | Version | Purpose |
---|---|---|
Ubuntu Server | 22.04 LTS | Operating System |
NVIDIA CUDA Toolkit | 12.1 | GPU Programming |
cuDNN | 8.9.2 | Deep Neural Network Library |
Docker | 24.0.7 | Containerization |
Kubernetes | 1.28 | Container Orchestration |
Python | 3.10 | Primary Programming Language |
TensorFlow | 2.13 | Deep Learning Framework |
PyTorch | 2.0 | Deep Learning Framework |
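After provisioning a node or building a container image, the versions above can be confirmed at runtime. A minimal sketch, assuming TensorFlow and PyTorch are installed in the same Python environment:

```python
# Minimal sketch: print the runtime versions of the key components listed
# in the software stack table. Assumes TensorFlow and PyTorch are installed
# in the active Python environment.
import sys

import tensorflow as tf
import torch

print(f"Python       : {sys.version.split()[0]}")
print(f"TensorFlow   : {tf.__version__}")
print(f"PyTorch      : {torch.__version__}")
print(f"CUDA (torch) : {torch.version.cuda}")              # CUDA build PyTorch was compiled against
print(f"cuDNN (torch): {torch.backends.cudnn.version()}")   # e.g. 8902 corresponds to cuDNN 8.9.2
print(f"GPUs visible : {torch.cuda.device_count()}")
```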
All code is version controlled using Git and hosted on our internal Git Repository, and every change is promoted through a strict Continuous Integration/Continuous Deployment (CI/CD) pipeline.
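Deployments themselves are scheduled onto the nodes by Kubernetes. Before submitting a training job, it can be useful to confirm the GPU capacity each node advertises to the scheduler. The sketch below uses the official Kubernetes Python client; it assumes a valid kubeconfig is available and that GPUs are exposed through the NVIDIA device plugin (the plugin itself is not listed in the stack above).

```python
# Minimal sketch: list the GPU capacity each node advertises to Kubernetes.
# Assumes a valid kubeconfig and that GPUs are exposed by the NVIDIA device
# plugin as the "nvidia.com/gpu" resource.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    allocatable = node.status.allocatable or {}
    gpus = allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} allocatable GPU(s)")
```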
Storage Configuration
The cluster utilizes a combination of local NVMe SSDs for fast data access during training and a centralized network file system (NFS) for persistent storage. The NFS server is a dedicated machine with the following specifications:
Component | Specification |
---|---|
Server Model | Dell PowerEdge R750 |
CPU | 2 x Intel Xeon Silver 4310 |
RAM | 256 GB DDR4 ECC |
Storage | 10 x 16TB SAS HDD (RAID 6) |
Network Interface | 40Gbps Ethernet |
The NFS share is mounted on all compute nodes at `/mnt/nfs`. Data backups are performed nightly using a dedicated Backup System. Any data stored on the local NVMe drives is considered ephemeral and is not backed up. Data pipelines are managed with Apache Airflow.
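Because the local NVMe scratch space is ephemeral, anything that needs to survive a node rebuild (checkpoints, processed datasets) should be copied to the NFS share. The Airflow sketch below illustrates one such step; the DAG id, schedule, and source path are illustrative assumptions, not the production pipeline definition.

```python
# Minimal sketch (Airflow 2.4+ syntax): copy training artefacts from the
# ephemeral local NVMe scratch area to the persistent NFS share at /mnt/nfs.
# The dag_id, schedule and source path are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="sync_scratch_to_nfs",      # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",              # nightly, ahead of the backup window
    catchup=False,
) as dag:
    sync_checkpoints = BashOperator(
        task_id="sync_checkpoints",
        # "/scratch" stands in for the local NVMe mount point, which varies per node.
        bash_command="rsync -a /scratch/checkpoints/ /mnt/nfs/checkpoints/",
    )
```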
Security Considerations
Security is paramount. Access to the cluster is restricted to authorized personnel only; we employ multi-factor authentication and regularly audit system logs. All network traffic is monitored for suspicious activity by an Intrusion Detection System, and regular Vulnerability Scanning is performed to identify and remediate weaknesses.
Intel-Based Server Configurations
Configuration | Specifications | CPU Benchmark |
---|---|---|
Core i7-6700K/7700 Server | 64 GB DDR4, 2 x 512 GB NVMe SSD | 8046 |
Core i7-8700 Server | 64 GB DDR4, 2 x 1 TB NVMe SSD | 13124 |
Core i9-9900K Server | 128 GB DDR4, 2 x 1 TB NVMe SSD | 49969 |
Core i9-13900 Server (64GB) | 64 GB RAM, 2 x 2 TB NVMe SSD | |
Core i9-13900 Server (128GB) | 128 GB RAM, 2 x 2 TB NVMe SSD | |
Core i5-13500 Server (64GB) | 64 GB RAM, 2 x 500 GB NVMe SSD | |
Core i5-13500 Server (128GB) | 128 GB RAM, 2 x 500 GB NVMe SSD | |
Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000 | |
AMD-Based Server Configurations
Configuration | Specifications | CPU Benchmark |
---|---|---|
Ryzen 5 3600 Server | 64 GB RAM, 2 x 480 GB NVMe | 17849 |
Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2 x 1 TB NVMe | 35224 |
Ryzen 9 5950X Server | 128 GB RAM, 2 x 4 TB NVMe | 46045 |
Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2 x 2 TB NVMe | 63561 |
EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | 48021 |
EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | 48021 |
EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2 x 2 TB NVMe | 48021 |
EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | 48021 |
EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2 x 2 TB NVMe | 48021 |
EPYC 9454P Server | 256 GB RAM, 2 x 2 TB NVMe | |
*Note: All benchmark scores are approximate and may vary based on configuration.*