AI in Ramsgate


AI in Ramsgate: Server Configuration Documentation

This article documents the hardware and software configuration of the "AI in Ramsgate" cluster, which supports the artificial intelligence initiatives at the Ramsgate facility. It is intended for new system administrators and developers working with the AI infrastructure. Read it before making any changes to the system, because our Machine Learning projects depend on this setup running smoothly.

Overview

The "AI in Ramsgate" cluster is designed for high-performance computing, specifically for training large Deep Learning models and running inference on them. It consists of interconnected servers, storage arrays, and networking equipment, and its primary goal is to provide a scalable and reliable platform for Artificial Intelligence research and deployment. The configuration is hybrid: on-premise hardware is combined with cloud-based resources, reached via API integration, to cover peak demand. All access is managed through our central Authentication System.

Hardware Specifications

The core of the AI cluster comprises eight dedicated server nodes. Below is a detailed breakdown of the hardware specifications for each node:

| Node ID | CPU | RAM | GPU | Storage |
|---|---|---|---|---|
| Node-01 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80GB) | 4 x 8TB NVMe SSD (RAID 0) |
| Node-02 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80GB) | 4 x 8TB NVMe SSD (RAID 0) |
| Node-03 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80GB) | 4 x 8TB NVMe SSD (RAID 0) |
| Node-04 | 2 x Intel Xeon Gold 6338 | 512 GB DDR4 ECC | 4 x NVIDIA A100 (80GB) | 4 x 8TB NVMe SSD (RAID 0) |
| Node-05 | 2 x AMD EPYC 7763 | 1TB DDR4 ECC | 8 x NVIDIA A100 (40GB) | 8 x 8TB NVMe SSD (RAID 0) |
| Node-06 | 2 x AMD EPYC 7763 | 1TB DDR4 ECC | 8 x NVIDIA A100 (40GB) | 8 x 8TB NVMe SSD (RAID 0) |
| Node-07 | 2 x AMD EPYC 7763 | 1TB DDR4 ECC | 8 x NVIDIA A100 (40GB) | 8 x 8TB NVMe SSD (RAID 0) |
| Node-08 | 2 x AMD EPYC 7763 | 1TB DDR4 ECC | 8 x NVIDIA A100 (40GB) | 8 x 8TB NVMe SSD (RAID 0) |

These nodes are interconnected via a 100 Gbps InfiniBand network, which provides the high-speed communication that distributed training requires. A dedicated Network Monitoring system continuously tracks network performance.
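
For capacity planning it can help to see the node table as data. The following is a minimal sketch, not an official inventory tool: the `NodeSpec` class and the function names are hypothetical, the figures are copied from the table above, and "1TB" of RAM is assumed to mean 1024 GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """One compute node, with figures taken from the hardware table."""
    name: str
    ram_gb: int
    gpu_count: int
    gpu_mem_gb: int
    nvme_drives: int
    nvme_tb_each: int


def build_inventory() -> list[NodeSpec]:
    """Return the eight nodes described in the hardware table."""
    # Node-01..04: Intel Xeon Gold 6338, 512 GB RAM, 4 x A100 80GB, 4 x 8TB NVMe
    intel = [NodeSpec(f"Node-{i:02d}", 512, 4, 80, 4, 8) for i in range(1, 5)]
    # Node-05..08: AMD EPYC 7763, 1TB RAM (assumed 1024 GB), 8 x A100 40GB, 8 x 8TB NVMe
    amd = [NodeSpec(f"Node-{i:02d}", 1024, 8, 40, 8, 8) for i in range(5, 9)]
    return intel + amd


def cluster_totals(nodes: list[NodeSpec]) -> dict[str, int]:
    """Sum GPU count, GPU memory, RAM, and local NVMe capacity across nodes."""
    return {
        "gpus": sum(n.gpu_count for n in nodes),
        "gpu_mem_gb": sum(n.gpu_count * n.gpu_mem_gb for n in nodes),
        "ram_gb": sum(n.ram_gb for n in nodes),
        # RAID 0 stripes without parity, so raw and usable capacity are equal.
        "local_nvme_tb": sum(n.nvme_drives * n.nvme_tb_each for n in nodes),
    }


if __name__ == "__main__":
    print(cluster_totals(build_inventory()))
```

Under these assumptions the cluster totals 48 GPUs, 2560 GB of GPU memory, 6144 GB of RAM, and 384 TB of local NVMe. Both node types carry 320 GB of GPU memory per node, but the per-GPU limit differs (80 GB versus 40 GB), which matters when a single model replica has to fit on one device.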

Software Stack

The software stack is built on Ubuntu Server 22.04 LTS. Applications are packaged as Docker containers and deployed through Kubernetes orchestration. The following table lists the key software components:

| Software Component | Version | Purpose |
|---|---|---|
| Ubuntu Server | 22.04 LTS | Operating System |
| NVIDIA CUDA Toolkit | 12.1 | GPU Programming |
| cuDNN | 8.9.2 | Deep Neural Network Library |
| Docker | 24.0.7 | Containerization |
| Kubernetes | 1.28 | Container Orchestration |
| Python | 3.10 | Primary Programming Language |
| TensorFlow | 2.13 | Deep Learning Framework |
| PyTorch | 2.0 | Deep Learning Framework |

All code is version-controlled with Git and hosted on our internal Git Repository. All changes go through a strict Continuous Integration/Continuous Deployment (CI/CD) pipeline.
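
A quick way to confirm that an environment matches the versions listed above is to compare them programmatically. The sketch below is illustrative only and not part of the cluster's tooling: the function names are hypothetical, it checks only the Python-level components, and it assumes the frameworks are installed under their usual PyPI distribution names (`tensorflow` and `torch`).

```python
import platform
from importlib import metadata

# Expected versions, taken from the software table above.
EXPECTED_VERSIONS = {
    "python": "3.10",
    "tensorflow": "2.13",
    "torch": "2.0",
}


def matches(expected: str, installed: str) -> bool:
    """True when `installed` equals `expected` or only adds trailing components.

    Comparison is per component, so "2.1" does not match "2.13.0".
    A local build suffix such as "+cu118" is ignored.
    """
    want = expected.split(".")
    have = installed.split("+")[0].split(".")
    return have[: len(want)] == want


def find_mismatches(expected: dict[str, str], installed: dict[str, str]) -> dict[str, str]:
    """Return a message for every component that is missing or off-version."""
    problems = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems[name] = f"not installed (expected {want})"
        elif not matches(want, have):
            problems[name] = f"found {have}, expected {want}"
    return problems


def collect_installed() -> dict[str, str]:
    """Read the interpreter version and any installed framework versions."""
    found = {"python": platform.python_version()}
    for dist in ("tensorflow", "torch"):
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            pass  # reported as "not installed" by find_mismatches
    return found


if __name__ == "__main__":
    for component, message in find_mismatches(EXPECTED_VERSIONS, collect_installed()).items():
        print(f"{component}: {message}")
```

Matching on a version prefix rather than an exact string lets patch releases such as 2.13.1 pass while still flagging a different minor version.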

Storage Configuration

The cluster combines local NVMe SSDs, which provide fast data access during training, with a centralized network file system (NFS) for persistent storage. The NFS server is a dedicated machine with the following specifications:

| Component | Specification |
|---|---|
| Server Model | Dell PowerEdge R750 |
| CPU | 2 x Intel Xeon Silver 4310 |
| RAM | 256 GB DDR4 ECC |
| Storage | 10 x 16TB SAS HDD (RAID 6) |
| Network Interface | 40Gbps Ethernet |
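
The usable capacity of the RAID 6 array follows from the drive count, because RAID 6 reserves the equivalent of two drives for parity. The sketch below works through that arithmetic; the function names are hypothetical, and it assumes drive sizes are decimal terabytes with no hot spares or filesystem overhead.

```python
def raid6_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 6 array: two drives' worth goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_tb


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


if __name__ == "__main__":
    usable = raid6_usable_tb(drive_count=10, drive_tb=16)
    print(f"Usable: {usable} TB ({tb_to_tib(usable):.1f} TiB)")
```

For the 10 x 16TB configuration this gives 128 TB usable out of 160 TB raw, and the array tolerates two simultaneous drive failures. The compute nodes' RAID 0 scratch arrays, by contrast, tolerate none.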

The NFS share is mounted on all compute nodes at `/mnt/nfs`. Data backups are performed nightly using a dedicated Backup System. Any data stored on the local NVMe drives is considered ephemeral and is not backed up. Data pipelines are managed with Apache Airflow.
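
Because local NVMe data is ephemeral and only the NFS share is backed up, a common pattern is to copy inputs to local storage before training and copy results back afterwards. The sketch below illustrates that pattern under stated assumptions: `/mnt/nfs` comes from this documentation, while the `/scratch` mount point, the function names, and the example paths are hypothetical.

```python
import shutil
from pathlib import Path

NFS_ROOT = Path("/mnt/nfs")       # documented NFS mount point
LOCAL_SCRATCH = Path("/scratch")  # hypothetical mount point of the local NVMe array


def stage_dataset(relative_path: str,
                  nfs_root: Path = NFS_ROOT,
                  scratch_root: Path = LOCAL_SCRATCH) -> Path:
    """Copy a dataset directory from NFS to local scratch and return the copy."""
    source = nfs_root / relative_path
    target = scratch_root / relative_path
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    # dirs_exist_ok=True lets a re-run refresh an earlier staged copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


def publish_results(local_dir: Path,
                    relative_path: str,
                    nfs_root: Path = NFS_ROOT) -> Path:
    """Copy results from local scratch back to NFS, where they are backed up."""
    destination = nfs_root / relative_path
    shutil.copytree(local_dir, destination, dirs_exist_ok=True)
    return destination


if __name__ == "__main__":
    data_dir = stage_dataset("datasets/example")  # hypothetical dataset path
    print(f"Training data staged at {data_dir}")
```

Anything left only under the scratch path should be treated as disposable, so checkpoints worth keeping need a `publish_results` call (or an equivalent copy) before the job ends.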


Security Considerations

Access to the cluster is restricted to authorized personnel only. We employ a multi-factor authentication system and regularly audit system logs. All network traffic is monitored for suspicious activity by an Intrusion Detection System, and regular Vulnerability Scanning identifies potential security vulnerabilities so that they can be remediated.


Intel-Based Server Configurations

| Configuration | Specifications | Benchmark |
|---|---|---|
| Core i7-6700K/7700 Server | 64 GB DDR4, NVMe SSD 2 x 512 GB | CPU Benchmark: 8046 |
| Core i7-8700 Server | 64 GB DDR4, NVMe SSD 2x1 TB | CPU Benchmark: 13124 |
| Core i9-9900K Server | 128 GB DDR4, NVMe SSD 2 x 1 TB | CPU Benchmark: 49969 |
| Core i9-13900 Server (64GB) | 64 GB RAM, 2x2 TB NVMe SSD | |
| Core i9-13900 Server (128GB) | 128 GB RAM, 2x2 TB NVMe SSD | |
| Core i5-13500 Server (64GB) | 64 GB RAM, 2x500 GB NVMe SSD | |
| Core i5-13500 Server (128GB) | 128 GB RAM, 2x500 GB NVMe SSD | |
| Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 NVMe SSD, NVIDIA RTX 4000 | |

AMD-Based Server Configurations

| Configuration | Specifications | Benchmark |
|---|---|---|
| Ryzen 5 3600 Server | 64 GB RAM, 2x480 GB NVMe | CPU Benchmark: 17849 |
| Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2x1 TB NVMe | CPU Benchmark: 35224 |
| Ryzen 9 5950X Server | 128 GB RAM, 2x4 TB NVMe | CPU Benchmark: 46045 |
| Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2x2 TB NVMe | CPU Benchmark: 63561 |
| EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2x2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2x2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 9454P Server | 256 GB RAM, 2x2 TB NVMe | |

⚠️ *Note: All benchmark scores are approximate and may vary based on configuration. Server availability subject to stock.* ⚠️