AI in the Great Rift Valley


AI in the Great Rift Valley: Server Configuration

This document describes the server configuration for the “AI in the Great Rift Valley” project: the hardware, software, and network infrastructure that supports our artificial intelligence research and data processing initiatives. It is intended for new engineers and system administrators joining the project. Understanding this configuration is essential for maintenance, troubleshooting, and future expansion.

Overview

The “AI in the Great Rift Valley” project utilizes a distributed server cluster located in a secure data center near Nairobi, Kenya. The primary goal of this infrastructure is to process and analyze large datasets collected from various sensors deployed throughout the Rift Valley, focusing on environmental monitoring, geological event prediction, and wildlife behavior analysis. The server cluster consists of compute nodes, storage nodes, and a dedicated network infrastructure to ensure high performance and data integrity. We employ a hybrid cloud approach, leveraging on-premises hardware for sensitive data and cloud resources for burst processing. This document focuses on the on-premises infrastructure.
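The split between on-premises and cloud resources can be illustrated with a small placement policy. The sketch below is a hypothetical illustration, not the project's actual scheduler. It assumes each job declares how many GPUs it needs and whether it touches sensitive data, keeps every sensitive job on-premises, and bursts non-sensitive work to the cloud only when local GPUs run out. The `Job` class, the `place_jobs` function and the job names are invented for this example. The 48-GPU figure is derived from the compute-node specification (12 nodes with 4 GPUs each).

```python
from dataclasses import dataclass

# Derived from the compute-node table: 12 nodes x 4 A100 GPUs each.
ON_PREM_GPUS = 12 * 4


@dataclass
class Job:
    """Hypothetical job description used only for this illustration."""
    name: str
    gpus: int
    sensitive: bool  # True if the job reads data that must stay on-premises


def place_jobs(jobs, free_gpus=ON_PREM_GPUS):
    """Map each job name to "on-prem" or "cloud" (hypothetical policy)."""
    placement = {}
    # Sensitive jobs sort first (key False < True), so they claim local GPUs first.
    for job in sorted(jobs, key=lambda j: not j.sensitive):
        if job.sensitive:
            # Sensitive data never leaves the site; the job waits locally if needed.
            placement[job.name] = "on-prem"
            free_gpus = max(free_gpus - job.gpus, 0)
        elif job.gpus <= free_gpus:
            placement[job.name] = "on-prem"
            free_gpus -= job.gpus
        else:
            # Not enough local GPUs left: burst this non-sensitive job to the cloud.
            placement[job.name] = "cloud"
    return placement


jobs = [
    Job("wildlife-train", gpus=32, sensitive=False),
    Job("seismic-risk", gpus=24, sensitive=True),
    Job("ndvi-tiles", gpus=8, sensitive=False),
]
print(place_jobs(jobs))
```

Here the 24-GPU sensitive job claims local GPUs first, leaving 24 free. The 32-GPU training job no longer fits and bursts to the cloud, while the 8-GPU job still runs locally. A real scheduler would also need to weigh queue time and data-transfer cost.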

Hardware Configuration

The core of our infrastructure comprises three primary types of servers: Compute Nodes, Storage Nodes, and a Management Node. Each server type is detailed below.

Compute Nodes

These nodes are responsible for the computationally intensive tasks of machine learning model training and inference. We currently utilize 12 compute nodes, each with the specifications outlined below.

Specification | Value
CPU | Dual Intel Xeon Gold 6338 (32 cores/64 threads per CPU)
RAM | 256 GB DDR4 ECC Registered RAM
GPU | 4 x NVIDIA A100 80GB GPUs
Storage (Local) | 1 TB NVMe SSD (for temporary data and OS)
Network Interface | Dual 100 GbE Network Interface Cards (NICs)
Power Supply | Redundant 1600W Platinum Power Supplies
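Because all 12 compute nodes share the same specification, cluster-wide capacity follows from simple multiplication. The sketch below only restates the compute-node table in aggregate. The `ComputeNode` class and `cluster_totals` function are illustrative names, not part of any project tooling. The network figure is the sum of NIC line rates, not achievable throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    """Per-node figures copied from the compute-node table."""
    cpus: int = 2
    cores_per_cpu: int = 32
    threads_per_core: int = 2
    gpus: int = 4
    gpu_mem_gb: int = 80
    ram_gb: int = 256
    nics: int = 2
    nic_gbps: int = 100


def cluster_totals(node: ComputeNode, count: int) -> dict:
    """Aggregate the specification of `count` identical nodes."""
    cores = count * node.cpus * node.cores_per_cpu
    return {
        "cpu_cores": cores,
        "cpu_threads": cores * node.threads_per_core,
        "gpus": count * node.gpus,
        "gpu_mem_gb": count * node.gpus * node.gpu_mem_gb,
        "ram_gb": count * node.ram_gb,
        # Sum of NIC line rates, not achievable application throughput.
        "nic_gbps": count * node.nics * node.nic_gbps,
    }


totals = cluster_totals(ComputeNode(), count=12)
for key, value in totals.items():
    print(f"{key}: {value}")
```

The totals come to 768 CPU cores, 48 GPUs and 3,840 GB of GPU memory. The arithmetic also surfaces a per-node detail. Each node carries 320 GB of GPU memory (4 x 80 GB) against 256 GB of host RAM, which may matter for workloads that stage full GPU working sets in host memory.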

Storage Nodes

Storage nodes provide persistent storage for the raw data, processed data, and machine learning models. We have 6 storage nodes, configured for high availability and data redundancy.

Specification | Value
CPU | Intel Xeon Silver 4310 (12 cores/24 threads)
RAM | 64 GB DDR4 ECC Registered RAM
Storage | 6 x 16 TB SAS HDDs in RAID 6; usable capacity ~64 TB per node ((6 - 2) x 16 TB, since two drives' worth of capacity holds parity)
Network Interface | Dual 40 GbE Network Interface Cards (NICs)
RAID Controller | Hardware RAID controller with battery backup
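RAID 6 reserves two drives' worth of capacity for dual parity. An array of n equal drives therefore provides (n - 2) times the capacity of one drive and survives any two simultaneous drive failures. The sketch below applies that rule to the storage-node layout. It also converts the result to tebibytes, since operating-system tools usually report binary units. The function names are illustrative. The figures ignore file-system overhead and any cross-node replication, which this document does not specify.

```python
def raid6_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of one RAID 6 array of equal drives.

    Two drives' worth of space holds the dual parity, so n drives yield
    n - 2 drives of data capacity (before file-system overhead).
    """
    if drives < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    return (drives - 2) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


per_node_tb = raid6_usable_tb(drives=6, drive_tb=16)  # (6 - 2) x 16 TB = 64 TB
cluster_tb = 6 * per_node_tb  # six storage nodes, no cross-node replication assumed
print(f"per node: {per_node_tb} TB (~{tb_to_tib(per_node_tb):.1f} TiB)")
print(f"all nodes: {cluster_tb} TB")
```

Each node provides 64 TB (about 58.2 TiB), for 384 TB across the six storage nodes.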

Management Node

The management node is responsible for cluster monitoring, job scheduling, and overall system administration. It runs a lightweight operating system and focuses on control plane functions.

Specification | Value
CPU | Intel Xeon E-2324G (4 cores/4 threads)
RAM | 32 GB DDR4 ECC Unbuffered RAM (UDIMM)
Storage | 512 GB SATA SSD
Network Interface | Dual 1 GbE Network Interface Cards (NICs)

Software Configuration

The server cluster runs a customized distribution of Ubuntu Server 22.04 LTS. Key software components include:

Network Infrastructure

The server cluster is connected via a dedicated high-speed network. Key components include:

  • Network Topology: Spine-Leaf architecture for low latency and high bandwidth.
  • Switches: Arista 7050X Series switches. See the Network Diagram for a detailed overview.
  • Interconnect: 100 GbE and 40 GbE connections between servers and switches.
  • Firewall: pfSense firewall protects the cluster from external threats. Refer to the Firewall Configuration documentation.
  • DNS: Internal BIND9 DNS server for name resolution within the cluster.
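The low-latency claim for the spine-leaf design rests on its structure. Every leaf switch connects to every spine switch, so traffic between servers on different leaves always crosses the same number of links. There are also as many equal-cost paths as there are spines, which equal-cost multipath routing can spread load across. The sketch below models a hypothetical fabric of 2 spines, 4 leaves and 3 servers per leaf. These counts are assumptions for illustration and do not describe the actual Arista deployment. Breadth-first search reports both the shortest hop count and the number of distinct shortest paths.

```python
from collections import deque
from itertools import product


def build_spine_leaf(spines: int, leaves: int, servers_per_leaf: int) -> dict:
    """Adjacency sets for a spine-leaf fabric (sizes are hypothetical)."""
    graph: dict[str, set[str]] = {}

    def link(a: str, b: str) -> None:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Every leaf connects to every spine.
    for s, leaf in product(range(spines), range(leaves)):
        link(f"spine{s}", f"leaf{leaf}")
    # Each server attaches to exactly one leaf.
    for leaf, h in product(range(leaves), range(servers_per_leaf)):
        link(f"leaf{leaf}", f"srv{leaf}-{h}")
    return graph


def shortest_paths(graph: dict, src: str, dst: str) -> tuple[int, int]:
    """Return (hop count, number of distinct shortest paths) via BFS."""
    dist, count = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                # First visit: nxt is one hop further than node.
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another shortest route into nxt.
                count[nxt] += count[node]
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    return dist[dst], count[dst]


fabric = build_spine_leaf(spines=2, leaves=4, servers_per_leaf=3)
print(shortest_paths(fabric, "srv0-0", "srv3-1"))  # servers on different leaves
print(shortest_paths(fabric, "srv0-0", "srv0-2"))  # servers on the same leaf
```

Servers on different leaves are always 4 links apart, with 2 equal-cost paths (one per spine). Servers on the same leaf are 2 links apart. Adding spines raises the path count, and with it cross-leaf bandwidth, without lengthening any path.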


Security Considerations

Security is paramount. The following measures are in place:

  • Physical Security: Data center access is restricted and monitored 24/7.
  • Network Security: Firewall rules are strictly enforced, and network traffic is monitored.
  • Data Encryption: Data at rest and in transit is encrypted using industry-standard encryption algorithms.
  • Access Control: Role-Based Access Control (RBAC) is implemented to limit access to sensitive data and resources. See the Access Control Policy for more information.
  • Regular Security Audits: Periodic security audits are conducted to identify and address vulnerabilities.
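RBAC attaches permissions to roles rather than to individual users. A user acquires permissions only through the roles assigned to them. The minimal sketch below illustrates that model. The permission names, the roles `analyst`, `ml_engineer` and `sysadmin`, and the `is_allowed` helper are all hypothetical; the authoritative rules live in the Access Control Policy. Unknown roles are denied by default.

```python
from enum import Enum, auto


class Perm(Enum):
    """Hypothetical permissions on cluster data and resources."""
    READ_RAW = auto()
    READ_PROCESSED = auto()
    WRITE_MODELS = auto()
    ADMIN_CLUSTER = auto()


# Hypothetical role definitions; the real mapping lives in the Access Control Policy.
ROLE_PERMS: dict[str, frozenset[Perm]] = {
    "analyst": frozenset({Perm.READ_PROCESSED}),
    "ml_engineer": frozenset({Perm.READ_RAW, Perm.READ_PROCESSED, Perm.WRITE_MODELS}),
    "sysadmin": frozenset({Perm.ADMIN_CLUSTER}),
}


def is_allowed(user_roles: list[str], perm: Perm) -> bool:
    """Allow if any assigned role grants `perm`; unknown roles grant nothing."""
    return any(perm in ROLE_PERMS.get(role, frozenset()) for role in user_roles)


print(is_allowed(["analyst"], Perm.READ_RAW))
print(is_allowed(["analyst", "ml_engineer"], Perm.READ_RAW))
```

Keeping the role-to-permission mapping in one table means access changes are made by editing roles, not by touching each user account.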


Future Expansion

We anticipate expanding the cluster in the coming months to accommodate growing data volumes and increasingly complex machine learning models. Planned upgrades include:

  • Adding more compute nodes with newer generation GPUs.
  • Increasing the storage capacity of the storage nodes.
  • Implementing a disaster recovery site to ensure business continuity.
  • Exploring the use of additional cloud resources for burst processing.

