AI in Fiji: Server Configuration and Deployment
Welcome to the guide on setting up servers for Artificial Intelligence (AI) workloads within the Fiji data center. This document details the hardware and software configurations required for a robust and scalable AI infrastructure. It is intended for newcomers to our server environment and assumes basic familiarity with Linux system administration.
Overview
The "AI in Fiji" project aims to provide a platform for researchers and developers to experiment with and deploy AI models. This requires specialized hardware, particularly GPUs, and a carefully configured software stack. This document covers the core server components, network configuration, and software prerequisites. We will focus on a base configuration suitable for both training and inference tasks. See Server_Security_Protocols for important security considerations. Refer to Data_Center_Cooling for information regarding thermal management.
Hardware Specifications
The foundation of our AI infrastructure relies on high-performance servers. The following table details the specifications for the primary AI server nodes:
| Component | Specification | Quantity per Server |
|---|---|---|
| CPU | Intel Xeon Gold 6338 (32 cores) | 2 |
| RAM | 256 GB DDR4 ECC Registered | 1 |
| GPU | NVIDIA A100 80 GB | 4 |
| Storage (OS) | 500 GB NVMe SSD | 1 |
| Storage (Data) | 8 TB SAS HDD (RAID 5) | 1 |
| Network Interface | 100 Gbps Ethernet | 2 |
| Power Supply | 2000 W Redundant | 2 |
These servers are interconnected via a high-bandwidth, low-latency network. See Network_Topology_Diagram for a visual representation. Understanding Power_Distribution_Units is crucial for efficient power management.
Network Configuration
The network is designed to facilitate rapid data transfer between servers and external storage. The following table outlines the key network parameters:
| Parameter | Value |
|---|---|
| Network Type | InfiniBand & Ethernet |
| IP Address Range | 192.168.10.0/24 (Internal AI Network) |
| Gateway | 192.168.10.1 |
| DNS Servers | 8.8.8.8, 8.8.4.4 |
| Subnet Mask | 255.255.255.0 |
| VLAN ID | 100 (AI Network) |
Servers utilize both 100Gbps Ethernet for general communication and InfiniBand for inter-GPU communication during distributed training. Refer to Firewall_Configuration for network security rules. Proper DNS_Record_Management is essential for service discovery.
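The parameters above can be expressed as a netplan configuration, which is the standard network configuration tool on Ubuntu 20.04. The following is a minimal sketch only: the interface name `ens1f0`, the host address `192.168.10.21`, and the file path are assumptions; check `ip link` for the actual NIC name on each node.

```yaml
# /etc/netplan/60-ai-network.yaml  (hypothetical path)
network:
  version: 2
  ethernets:
    ens1f0: {}              # 100 Gbps NIC; name is an assumption
  vlans:
    vlan100:
      id: 100               # VLAN ID from the table above (AI Network)
      link: ens1f0
      addresses: [192.168.10.21/24]   # example host address in the internal AI range
      gateway4: 192.168.10.1          # gateway from the table above
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
```

Validate with `sudo netplan try` before committing the change with `sudo netplan apply`.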
Software Stack
The software stack is built around Ubuntu 20.04 LTS, providing a stable and well-supported base. The following table details the core software components:
| Software | Version | Purpose |
|---|---|---|
| Operating System | Ubuntu 20.04 LTS | Base operating system |
| NVIDIA Drivers | 535.104.05 | GPU driver |
| CUDA Toolkit | 12.2 | Parallel computing platform |
| cuDNN | 8.9.2 | Deep neural network library |
| Docker | 24.0.5 | Containerization platform |
| Kubernetes | 1.27.4 | Container orchestration |
| Python | 3.9 | Programming language |
| TensorFlow | 2.13.0 | Deep learning framework |
| PyTorch | 2.0.1 | Deep learning framework |
All AI workloads are containerized using Docker and orchestrated with Kubernetes to ensure scalability and portability. See Docker_Best_Practices for guidance on containerizing applications. Familiarize yourself with Kubernetes_Deployment_Strategies for effective cluster management. We utilize Monitoring_and_Alerting_Systems to track performance. Understanding Log_Management is vital for debugging.
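To make the GPUs schedulable, Kubernetes relies on the NVIDIA device plugin exposing the `nvidia.com/gpu` resource. The manifest below is a minimal sketch of a training Pod that requests all four A100s on one node; the Pod name, image tag, and `train.py` entrypoint are illustrative, not part of our standard configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job                 # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:23.07-py3   # example NGC image; choose per workload
      command: ["python", "train.py"]           # hypothetical entrypoint
      resources:
        limits:
          nvidia.com/gpu: 4      # all four A100s on a single node
```

Requesting fewer GPUs (e.g. `nvidia.com/gpu: 1`) lets the scheduler pack multiple inference Pods onto one server.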
Important Considerations
- GPU Management: Utilize `nvidia-smi` for monitoring GPU utilization and health.
- Storage Access: Access to the shared data storage is provided via NFS. See NFS_Configuration for details.
- Security: Adhere to the Server_Security_Protocols to ensure data integrity and confidentiality.
- Backup and Recovery: Regular backups are performed. Refer to Backup_and_Disaster_Recovery for details.
- Scaling: The Kubernetes cluster is designed to scale horizontally. Refer to Horizontal_Pod_Autoscaling documentation.
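The `nvidia-smi` monitoring mentioned above can be scripted via its CSV query mode (`nvidia-smi --query-gpu=... --format=csv,noheader,nounits`). A minimal sketch follows; the `SAMPLE` string imitates that output with made-up values, and on a live node you would obtain the text with `subprocess.check_output` instead.

```python
import csv
import io

# Sample output of:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# (values are illustrative, not from a real node)
SAMPLE = """0, 87, 40532
1, 12, 2048
2, 0, 5
3, 95, 79000
"""

def parse_gpu_stats(text):
    """Turn nvidia-smi CSV query output into a list of per-GPU dicts."""
    stats = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        index, util, mem = (field.strip() for field in row)
        stats.append({
            "index": int(index),
            "util_pct": int(util),      # GPU utilization, percent
            "mem_used_mib": int(mem),   # memory used, MiB
        })
    return stats

# Flag nearly saturated GPUs, e.g. as input to an alerting hook
busy = [g["index"] for g in parse_gpu_stats(SAMPLE) if g["util_pct"] > 90]
print(busy)  # → [3]
```

A cron job or Kubernetes DaemonSet could feed these dicts into the cluster's monitoring pipeline.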