# AI in Cleveland: Server Configuration Documentation

This document details the server configuration supporting the "AI in Cleveland" project and provides a technical overview for new administrators and developers. The project delivers local AI services and data-analysis capabilities to the Cleveland metropolitan area. Understanding this setup is essential for maintenance, scaling, and troubleshooting.

## Overview

The "AI in Cleveland" infrastructure is built around a hybrid cloud model: core processing and model training run on dedicated on-premise hardware, while data storage and less intensive tasks are handled by cloud services. This split balances cost-effective scaling against data security. The primary goal is to provide accessible AI tools for local businesses and researchers. The stack leverages several key software components, including TensorFlow and PyTorch for model development and Kubernetes for orchestration.

## Hardware Specifications

We utilize three primary server types: Compute Nodes, Storage Nodes, and the Master Node. Each plays a distinct role in the overall architecture.

### Compute Nodes

These servers handle the bulk of the AI model training and inference. They are equipped with high-end GPUs and fast processors.

| Specification | Value |
| --- | --- |
| CPU | Dual Intel Xeon Gold 6338 (32 cores/64 threads per CPU) |
| RAM | 512GB DDR4 ECC Registered @ 3200MHz |
| GPU | 4x NVIDIA A100 (80GB VRAM each) |
| Storage (Local) | 2x 4TB NVMe SSD (RAID 0) |
| Network Interface | Dual 100GbE QSFP28 |
| Operating System | Ubuntu 22.04 LTS |

We currently operate six Compute Nodes. Each node is monitored using Nagios for performance and uptime.
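As a quick sanity check, the aggregate capacity of the compute tier follows directly from the table above. A minimal sketch (node count and per-node figures are taken from this document):

```python
# Aggregate capacity of the compute tier, from the specs above.
NODES = 6                # Compute Nodes currently in service
GPUS_PER_NODE = 4        # NVIDIA A100 per node
VRAM_PER_GPU_GB = 80
CPUS_PER_NODE = 2        # dual-socket Xeon Gold 6338
CORES_PER_CPU = 32

total_gpus = NODES * GPUS_PER_NODE                   # 24 GPUs
total_vram_gb = total_gpus * VRAM_PER_GPU_GB         # 1920 GB VRAM
total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU  # 384 physical cores

print(f"{total_gpus} GPUs, {total_vram_gb} GB VRAM, {total_cores} cores")
```

These totals are the upper bound for scheduling; Kubernetes resource requests and reservations will reduce what is actually allocatable.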

### Storage Nodes

These servers provide large-capacity storage for datasets and model checkpoints. They are optimized for high throughput and reliability.

| Specification | Value |
| --- | --- |
| CPU | Intel Xeon Silver 4310 (12 cores/24 threads) |
| RAM | 128GB DDR4 ECC Registered @ 2666MHz |
| Storage | 16x 16TB SAS HDD (RAID 6) - 192TB Usable |
| Network Interface | Dual 25GbE SFP28 |
| Operating System | CentOS 8 |
We have four Storage Nodes, configured for redundancy and scalability. Data is backed up nightly to a separate offsite location using rsync.
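The nightly offsite sync could be wrapped in a small script. The sketch below only assembles and prints the rsync command; the source path and destination host are hypothetical placeholders, and the actual job's paths and options are site-specific:

```python
# Sketch of the nightly offsite backup command (hypothetical paths/host;
# the real job's source, destination, and options are site-specific).
SRC = "/mnt/datasets/"
DEST = "backup.example.org:/backups/ai-cleveland/"

def build_rsync_cmd(src, dest, dry_run=False):
    """Assemble an rsync command: archive mode, compression in transit,
    deletion of files removed from the source, and a transfer summary."""
    cmd = ["rsync", "-az", "--delete", "--stats"]
    if dry_run:
        cmd.append("--dry-run")  # preview only; transfer nothing
    cmd += [src, dest]
    return cmd

# Print the dry-run form first so an operator can preview the transfer.
print(" ".join(build_rsync_cmd(SRC, DEST, dry_run=True)))
```

In production this command would typically be driven by cron or a systemd timer on each Storage Node, with the dry-run flag dropped once the preview looks correct.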

### Master Node

The Master Node manages the Kubernetes cluster and provides a central point for monitoring and control.

| Specification | Value |
| --- | --- |
| CPU | Intel Xeon Gold 6342 (28 cores/56 threads) |
| RAM | 256GB DDR4 ECC Registered @ 3200MHz |
| Storage | 2x 1TB NVMe SSD (RAID 1) |
| Network Interface | Quad 10GbE SFP+ |
| Operating System | Ubuntu 22.04 LTS |

The Master Node also hosts the Grafana dashboard for visualizing system metrics and the Prometheus time-series database.
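Metrics stored in Prometheus can be pulled programmatically through its instant-query HTTP API. A minimal sketch, assuming Prometheus listens on its default port 9090 at a hypothetical `master-node` hostname and that node-level metrics (such as `node_load1`) are being scraped:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of the Master Node's Prometheus instance;
# adjust host/port for the actual deployment.
PROMETHEUS = "http://master-node:9090"

def build_query_url(base, promql):
    """Build a URL for Prometheus's instant-query endpoint."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def instant_query(base, promql):
    """Run an instant PromQL query and return the decoded JSON body."""
    with urllib.request.urlopen(build_query_url(base, promql)) as resp:
        return json.load(resp)

# Example: query 1-minute load average per node (assumes node metrics
# are exported), e.g. for a quick command-line health check.
print(build_query_url(PROMETHEUS, "node_load1"))
```

Grafana dashboards issue the same kind of PromQL queries under the hood, so this is mainly useful for scripting ad-hoc checks.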

## Software Configuration

The software stack is designed for scalability and ease of management.
