AI in the Great Rift Valley: Server Configuration

This document describes the server configuration for the “AI in the Great Rift Valley” project, covering the hardware, software, and network infrastructure that supports our artificial intelligence research and data processing initiatives. It is intended for new engineers and system administrators joining the project. Understanding this configuration is essential for maintenance, troubleshooting, and future expansion.

Overview

The “AI in the Great Rift Valley” project utilizes a distributed server cluster located in a secure data center near Nairobi, Kenya. The primary goal of this infrastructure is to process and analyze large datasets collected from various sensors deployed throughout the Rift Valley, focusing on environmental monitoring, geological event prediction, and wildlife behavior analysis. The server cluster consists of compute nodes, storage nodes, and a dedicated network infrastructure to ensure high performance and data integrity. We employ a hybrid cloud approach, leveraging on-premises hardware for sensitive data and cloud resources for burst processing. This document focuses on the on-premises infrastructure.
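
The hybrid placement rule described above can be sketched as a small routing function. This is a minimal, hypothetical sketch: the `Job` type, its `sensitive` flag, and the assumption that non-sensitive jobs stay on-premises whenever enough local GPUs are free are illustrative choices, not the project's actual scheduler logic.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_PREM = "on-premises"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Job:
    name: str
    sensitive: bool  # hypothetical flag: job touches data that must stay on-site
    gpus: int        # number of GPUs the job requests


def place_job(job: Job, free_on_prem_gpus: int) -> Target:
    """Hypothetical rule: sensitive work never leaves the on-premises cluster;
    other work bursts to the cloud only when local GPUs are insufficient."""
    if job.sensitive:
        # Sensitive jobs queue locally instead of moving to cloud resources.
        return Target.ON_PREM
    if job.gpus <= free_on_prem_gpus:
        return Target.ON_PREM
    return Target.CLOUD


if __name__ == "__main__":
    print(place_job(Job("sensor-ingest", sensitive=True, gpus=8), free_on_prem_gpus=0))
    print(place_job(Job("model-sweep", sensitive=False, gpus=8), free_on_prem_gpus=4))
```

Keeping sensitive jobs local even when the cluster is busy trades latency for data residency, which matches the stated reason for keeping sensitive data on-premises.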

Hardware Configuration

The core of our infrastructure comprises three primary types of servers: Compute Nodes, Storage Nodes, and a Management Node. Each server type is detailed below.

Compute Nodes

These nodes handle the computationally intensive work of machine learning model training and inference. We currently run 12 compute nodes, each built to the specification below.

CPU: Dual Intel Xeon Gold 6338 (32 cores/64 threads per CPU)
RAM: 256 GB DDR4 ECC Registered RAM
GPU: 4 x NVIDIA A100 80GB
Local storage: 1 TB NVMe SSD (operating system and temporary data)
Network interface: Dual 100 GbE network interface cards (NICs)
Power supply: Redundant 1600 W Platinum power supplies
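
As a quick sanity check on cluster capacity, the per-node figures above can be multiplied out. The sketch below assumes all 12 compute nodes match the listed specification exactly; `ComputeNodeSpec` and `cluster_totals` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNodeSpec:
    cpus: int
    cores_per_cpu: int
    threads_per_cpu: int
    ram_gb: int
    gpus: int
    gpu_mem_gb: int


# Per-node figures taken from the compute node specification list.
COMPUTE_NODE = ComputeNodeSpec(cpus=2, cores_per_cpu=32, threads_per_cpu=64,
                               ram_gb=256, gpus=4, gpu_mem_gb=80)
NODE_COUNT = 12


def cluster_totals(spec: ComputeNodeSpec, nodes: int) -> dict[str, int]:
    """Aggregate resources, assuming every node is identical to `spec`."""
    return {
        "cpu_cores": nodes * spec.cpus * spec.cores_per_cpu,
        "cpu_threads": nodes * spec.cpus * spec.threads_per_cpu,
        "ram_gb": nodes * spec.ram_gb,
        "gpus": nodes * spec.gpus,
        "gpu_mem_gb": nodes * spec.gpus * spec.gpu_mem_gb,
    }


if __name__ == "__main__":
    for name, value in cluster_totals(COMPUTE_NODE, NODE_COUNT).items():
        print(f"{name}: {value}")
```

Under that assumption the compute tier totals 768 physical CPU cores (1,536 threads), 3,072 GB of system RAM, and 48 A100 GPUs with 3,840 GB of combined GPU memory.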

Storage Nodes

Storage nodes provide persistent storage for the raw data, processed data, and machine learning models. We have 6 storage nodes, configured for high availability and data redundancy.

CPU: Intel Xeon Silver 4310 (12 cores/24 threads)
RAM: 64 GB DDR4 ECC Registered RAM
Storage: 6 x 16 TB SAS HDDs in RAID 6 (usable capacity: ~64 TB per node)
Network interface: Dual 40 GbE network interface cards (NICs)
RAID controller: Hardware RAID controller with battery backup
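
RAID 6 reserves two drives' worth of capacity for parity, so usable space is (n - 2) x drive size; with six 16 TB drives that is 64 TB per node. The sketch below works through the arithmetic and converts to the binary units most operating systems report. `raid6_usable_tb` is a hypothetical helper, and the figures ignore filesystem and RAID controller metadata overhead.

```python
TB_BYTES = 10**12   # drive vendors quote capacity in decimal terabytes
TIB_BYTES = 2**40   # many operating systems report binary tebibytes


def raid6_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable RAID 6 capacity: two drives' worth of space holds parity."""
    if drives < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    return (drives - 2) * drive_tb


STORAGE_NODES = 6
per_node_tb = raid6_usable_tb(drives=6, drive_tb=16)  # (6 - 2) * 16 = 64 TB
per_node_tib = per_node_tb * TB_BYTES / TIB_BYTES      # about 58.2 TiB
cluster_tb = STORAGE_NODES * per_node_tb               # 384 TB across all nodes

print(f"Per node: {per_node_tb:.0f} TB ({per_node_tib:.1f} TiB)")
print(f"All storage nodes: {cluster_tb:.0f} TB")
```

Across the six storage nodes this comes to roughly 384 TB of usable RAID 6 capacity before filesystem overhead.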

Management Node

The management node is responsible for cluster monitoring, job scheduling, and overall system administration. It runs a lightweight operating system and focuses on control plane functions.

CPU: Intel Xeon E-2324G (4 cores/4 threads)
RAM: 32 GB DDR4 ECC RAM
Storage: 512 GB SATA SSD
Network interface: Dual 1 GbE network interface cards (NICs)

Software Configuration

The server cluster runs a customized distribution of Ubuntu Server 22.04 LTS. Key software components include:
