AI in Durham: Server Configuration
This document details the server configuration supporting the "AI in Durham" project. It is intended for new system administrators and developers contributing to the project's infrastructure. This project focuses on providing computational resources for local Artificial Intelligence research and development. It utilizes a hybrid cloud approach, leveraging both on-premise hardware and cloud services.
Overview
The "AI in Durham" project requires a robust and scalable server infrastructure to handle demanding workloads related to machine learning, deep learning, and data science. This infrastructure is designed for high throughput, low latency, and data security. We utilize a combination of GPU servers for training, CPU servers for inference and general processing, and a distributed file system for data storage. Cloud resources are used for burst capacity and specialized services. See Server Infrastructure Overview for a more general description of our server environments.
Hardware Specifications
The core on-premise infrastructure consists of three primary server types: GPU servers, CPU servers, and storage servers.
GPU Servers
These servers are dedicated to computationally intensive tasks like model training.
| Specification | Value |
|---|---|
| Model | Dell PowerEdge R750xa |
| CPU | 2 x Intel Xeon Gold 6348 (28 cores per CPU) |
| GPU | 4 x NVIDIA A100 (80 GB HBM2e) |
| RAM | 512 GB DDR4 ECC REG |
| Storage | 4 x 4 TB NVMe PCIe Gen4 SSD (RAID 0) |
| Network | 2 x 100GbE ConnectX-6 |
| Operating System | Ubuntu 22.04 LTS |
These servers run CUDA Toolkit 12.2 and cuDNN 8.9.2 and are optimized for deep learning frameworks such as TensorFlow and PyTorch. See GPU Server Maintenance for information on monitoring and upkeep.
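After a driver or toolkit upgrade, the versions above can be sanity-checked from the shell. This is a sketch; exact output formats vary by driver and framework release:

```shell
# Report GPU model and driver version for each A100
nvidia-smi --query-gpu=name,driver_version --format=csv
# Report the installed CUDA Toolkit release (expect 12.2)
nvcc --version
# Confirm PyTorch sees the GPUs and which CUDA build it was compiled against
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```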
CPU Servers
These servers handle inference, data pre-processing, and general-purpose computing tasks.
| Specification | Value |
|---|---|
| Model | Supermicro SuperServer 2029U-TR4 |
| CPU | 2 x AMD EPYC 7763 (64 cores per CPU) |
| RAM | 1 TB DDR4 ECC REG |
| Storage | 8 x 8 TB SATA HDD (RAID 6) + 2 x 1 TB NVMe SSD (OS) |
| Network | 2 x 25GbE |
| Operating System | CentOS Stream 9 |
These servers utilize Docker containers for application isolation and portability. For detailed information on containerization practices, see Containerization Best Practices.
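A minimal Dockerfile for an inference service on these machines might look like the following. This is a sketch only: the base image, file names, and port are illustrative, not the project's actual service.

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install pinned dependencies first so this layer caches across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code last
COPY . .
EXPOSE 8000
CMD ["python", "serve.py"]
```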
Storage Servers
These servers provide centralized storage for datasets and model artifacts.
| Specification | Value |
|---|---|
| Model | NetApp FAS2750 |
| Storage Capacity | 368 TB raw (usable capacity varies with RAID configuration) |
| RAID Level | RAID-6 |
| Network | 4 x 40GbE |
| File System | WAFL (NetApp ONTAP) |
Storage is accessed via NFS and SMB protocols. Refer to the Data Storage Policy for details on data backup and recovery procedures.
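The usable share of the 368 TB raw pool depends on the aggregate layout. As a rough illustration, assuming equal-size disks and ignoring ONTAP overhead and spares (the 23 x 16 TB layout below is hypothetical), RAID-6 reserves two disks' worth of capacity for parity:

```python
def raid6_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """RAID-6 stores two disks' worth of parity, so usable = (N - 2) * size."""
    if disk_count < 4:
        raise ValueError("RAID-6 requires at least 4 disks")
    return (disk_count - 2) * disk_size_tb

# Hypothetical layout: 23 x 16 TB disks = 368 TB raw
raw_tb = 23 * 16                       # 368 TB raw
usable_tb = raid6_usable_tb(23, 16)    # 336 TB before filesystem overhead
```

Real usable capacity will be lower still once WAFL metadata, snapshot reserves, and hot spares are accounted for.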
Software Stack
The software stack is designed for flexibility and scalability. Key components include:
- Operating Systems: Ubuntu 22.04 LTS (GPU Servers), CentOS Stream 9 (CPU Servers)
- Containerization: Docker and Kubernetes
- Machine Learning Frameworks: TensorFlow, PyTorch, Scikit-learn
- Data Science Tools: Jupyter Notebook, RStudio
- Monitoring: Prometheus, Grafana
- Version Control: Git and GitHub
- Configuration Management: Ansible
- Networking: SSH, VPN
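Because the fleet mixes Ubuntu and CentOS Stream, routine patching is a natural fit for Ansible. A playbook sketch follows; the inventory group names `gpu_servers` and `cpu_servers` are assumptions, not the project's actual inventory:

```yaml
- name: Apply package updates across the fleet
  hosts: gpu_servers:cpu_servers
  become: true
  tasks:
    - name: Update apt packages (Ubuntu GPU servers)
      ansible.builtin.apt:
        upgrade: safe
        update_cache: true
      when: ansible_facts['os_family'] == 'Debian'

    - name: Update dnf packages (CentOS Stream CPU servers)
      ansible.builtin.dnf:
        name: '*'
        state: latest
      when: ansible_facts['os_family'] == 'RedHat'
```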
Network Configuration
The server infrastructure is connected via a dedicated 100GbE backbone network. A separate 1GbE network provides access for administrative tasks and general use. Firewall rules are configured to restrict access to essential services only. See Network Security Protocol for details. DNS is managed internally using BIND9.
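On the Ubuntu GPU servers, "essential services only" might translate into ufw rules like these. This is a sketch: the 10.10.0.0/24 admin subnet and the node-exporter port are assumptions, not the project's actual values.

```shell
ufw default deny incoming
# SSH only from the 1GbE administrative network
ufw allow from 10.10.0.0/24 to any port 22 proto tcp
# Prometheus node exporter, scraped from the monitoring host
ufw allow from 10.10.0.0/24 to any port 9100 proto tcp
ufw enable
```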
Cloud Integration
We utilize Amazon Web Services (AWS) for burst capacity and specialized services such as:
- AWS S3: For long-term data storage and archiving.
- AWS EC2: For on-demand GPU instances during peak training periods.
- AWS SageMaker: For managed machine learning services.
Communication between on-premise servers and AWS is secured via AWS VPN. For information on cloud cost management, see Cloud Cost Optimization.
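For example, archiving a finished training run to S3 can be done with the AWS CLI over the VPN. The bucket name and paths below are placeholders:

```shell
aws s3 cp ./checkpoints/run-042/ s3://example-ai-durham-archive/run-042/ \
    --recursive --storage-class GLACIER
```

For data that is almost never retrieved, the DEEP_ARCHIVE storage class is cheaper still, at the cost of longer restore times.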
Security Considerations
Security is paramount. All servers are regularly patched and monitored for vulnerabilities. Access control is enforced using strong authentication and authorization mechanisms. Data is encrypted both in transit and at rest. See Security Best Practices for comprehensive guidelines. The Incident Response Plan details procedures for handling security breaches.
Intel-Based Server Configurations
| Configuration | Specifications | Benchmark |
|---|---|---|
| Core i7-6700K/7700 Server | 64 GB DDR4, 2 x 512 GB NVMe SSD | CPU Benchmark: 8046 |
| Core i7-8700 Server | 64 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 13124 |
| Core i9-9900K Server | 128 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 49969 |
| Core i9-13900 Server (64GB) | 64 GB RAM, 2 x 2 TB NVMe SSD | |
| Core i9-13900 Server (128GB) | 128 GB RAM, 2 x 2 TB NVMe SSD | |
| Core i5-13500 Server (64GB) | 64 GB RAM, 2 x 500 GB NVMe SSD | |
| Core i5-13500 Server (128GB) | 128 GB RAM, 2 x 500 GB NVMe SSD | |
| Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000 | |
AMD-Based Server Configurations
| Configuration | Specifications | Benchmark |
|---|---|---|
| Ryzen 5 3600 Server | 64 GB RAM, 2 x 480 GB NVMe | CPU Benchmark: 17849 |
| Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2 x 1 TB NVMe | CPU Benchmark: 35224 |
| Ryzen 9 5950X Server | 128 GB RAM, 2 x 4 TB NVMe | CPU Benchmark: 46045 |
| Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2 x 2 TB NVMe | CPU Benchmark: 63561 |
| EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2 x 2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2 x 2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 9454P Server | 256 GB RAM, 2 x 2 TB NVMe | |
*Note: Benchmark scores are approximate and may vary with configuration.*