
# AI in the Gobi Desert: Server Configuration

This article details the server configuration utilized for the "AI in the Gobi Desert" project, a research initiative focused on deploying and maintaining artificial intelligence models in an extremely challenging environmental setting. This guide is intended for new members of the operations team and provides a comprehensive overview of the hardware and software choices. It assumes a basic understanding of Linux server administration and networking concepts.

## Project Overview

The "AI in the Gobi Desert" project aims to leverage machine learning for environmental monitoring, specifically focusing on desertification patterns and wildlife tracking. The servers are deployed in a remote location, presenting unique challenges related to power availability, cooling, and network connectivity. The core requirements are high computational power for model training and inference, data storage for large datasets, and robust reliability due to limited on-site maintenance capabilities. We rely heavily on automation for server management.

## Hardware Configuration

The server infrastructure consists of three primary server types: Edge Servers, Central Processing Servers, and Data Storage Servers. Each type is detailed below.

### Edge Servers

Edge servers are located closest to the data collection points (sensor arrays and camera traps). They perform initial data processing and filtering, reducing the amount of data transmitted to the central servers.
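The exact filtering pipeline is site-specific, but the principle can be sketched with a simple dead-band filter: consecutive readings that barely change are suppressed, so only meaningful changes are transmitted upstream. The function name and threshold below are illustrative, not part of the deployed pipeline.

```python
def deadband_filter(readings, threshold=0.5):
    """Suppress consecutive sensor readings that change by less than
    `threshold` relative to the last transmitted value.
    Returns only the readings worth sending to the central servers."""
    transmitted = []
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) >= threshold:
            transmitted.append(value)
            last_sent = value
    return transmitted

# A slowly drifting temperature trace collapses to two samples:
# deadband_filter([20.0, 20.1, 20.2, 21.0, 21.1]) -> [20.0, 21.0]
```

A tighter threshold trades bandwidth for fidelity; in practice the value would be tuned per sensor type.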

| Component | Specification |
| --- | --- |
| Processor | Intel Xeon Silver 4310 (8 cores, 2.1 GHz) |
| RAM | 64 GB DDR4 ECC, 3200 MHz |
| Storage (OS) | 256 GB NVMe SSD |
| Storage (data buffer) | 1 TB SATA SSD |
| Network interface | 2 x 10 Gigabit Ethernet |
| Power supply | 48 V DC input, redundant power supplies |

These servers utilize a custom power management system to optimize energy consumption. The operating system is a minimal Ubuntu Server 20.04 installation.
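The custom power management system itself is not documented here, but on a stock Ubuntu install one of its building blocks would likely be the kernel's cpufreq interface, which exposes a per-CPU scaling governor through sysfs. A minimal sketch (the helper names are illustrative; the sysfs path is the standard Linux cpufreq location):

```python
import os

GOVERNOR_TEMPLATE = "/sys/devices/system/cpu/cpu{n}/cpufreq/scaling_governor"


def governor_path(cpu):
    """Return the sysfs path for a CPU's frequency-scaling governor."""
    return GOVERNOR_TEMPLATE.format(n=cpu)


def set_governor(cpu, governor="powersave"):
    """Write the requested governor if the sysfs node exists.

    Returns True on success, False if cpufreq is unavailable
    (e.g. inside a container or VM without cpufreq support).
    Requires root to actually write.
    """
    path = governor_path(cpu)
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(governor)
    return True
```

Switching edge nodes to the `powersave` governor during low-activity windows is one plausible way to stretch a 48 V DC power budget.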

### Central Processing Servers

Central processing servers are responsible for running the complex AI models and performing the bulk of the data analysis. They are housed in a hardened, climate-controlled shelter.

| Component | Specification |
| --- | --- |
| Processor | 2 x AMD EPYC 7763 (64 cores, 2.45 GHz) |
| RAM | 256 GB DDR4 ECC, 3200 MHz |
| GPU | 4 x NVIDIA A100 80 GB |
| Storage (OS) | 512 GB NVMe SSD |
| Storage (data) | 8 TB NVMe SSD (RAID 0) |
| Network interface | 2 x 40 Gigabit Ethernet |
| Power supply | Redundant 208 V AC power supplies |

These servers utilize Kubernetes for container orchestration and Prometheus for monitoring.
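The specific dashboards and alert rules are deployment-specific, but any automation built on top of Prometheus ends up parsing instant-query responses from its HTTP API (`/api/v1/query`). A sketch of that parsing step, assuming the documented response shape (the HTTP call itself is omitted; the helper name is illustrative):

```python
def first_sample_value(response):
    """Extract the first sample value from a Prometheus instant-query
    response (/api/v1/query). Returns None when the result set is empty."""
    if response.get("status") != "success":
        raise ValueError("query failed: %s" % response.get("error"))
    result = response["data"]["result"]
    if not result:
        return None
    # Each vector sample is a pair: [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


# Example response, shaped like the Prometheus API output:
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "edge-01"}, "value": [1700000000, "42.5"]}
        ],
    },
}
```

With `first_sample_value(sample)` returning `42.5`, a watchdog script could, for instance, page the team when a GPU temperature metric crosses a threshold.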

### Data Storage Servers

Data storage servers provide long-term storage for the collected data and model artifacts.

| Component | Specification |
| --- | --- |
| Processor | Intel Xeon Gold 6338 (32 cores, 2.0 GHz) |
| RAM | 128 GB DDR4 ECC, 3200 MHz |
| Storage | 64 TB HDD (RAID 6) |
| Network interface | 2 x 10 Gigabit Ethernet |
| File system | ZFS |
| Power supply | Redundant 208 V AC power supplies |

These servers are configured with Ceph for distributed storage and data redundancy.
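With limited on-site maintenance, automated health checks against the storage cluster matter. As a sketch, a helper that reads the cluster health state out of `ceph status --format json` output (the wrapper function names are illustrative; running the CLI requires a client keyring on the host):

```python
import json
import subprocess


def parse_health(status_doc):
    """Return the cluster health state (e.g. HEALTH_OK, HEALTH_WARN)
    from a parsed `ceph status --format json` document."""
    return status_doc.get("health", {}).get("status", "UNKNOWN")


def cluster_health():
    """Shell out to the ceph CLI and return the current health state."""
    out = subprocess.check_output(["ceph", "status", "--format", "json"])
    return parse_health(json.loads(out))
```

A cron job could log `cluster_health()` periodically and raise an alert via Prometheus whenever the state leaves `HEALTH_OK`.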

## Software Configuration

All servers share a common software stack, based on Ubuntu Server 20.04 with Kubernetes, Prometheus, and Ceph as described above, to ensure consistency and ease of management.
