# AI in Woking: Server Configuration and Deployment

This article details the server configuration used to support the "AI in Woking" project, a local initiative exploring applications of Artificial Intelligence within the borough. This document is intended for new system administrators and developers contributing to the project. It outlines the hardware, software, and network setup necessary for successful operation. Please review this document thoroughly before making any changes to the production environment. Refer to the System Administration Manual for general site policies.

## Project Overview

The "AI in Woking" project involves several key components: data collection from local sources (e.g., traffic sensors, environmental monitors), model training using a cluster of GPU servers, and a web-based interface for accessing AI-powered insights. The project relies heavily on Python for scripting and data processing, and TensorFlow and PyTorch for machine learning tasks. Data storage is managed using PostgreSQL, and the web interface is built with PHP. Security is paramount; see Security Policies for detailed guidance.
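The data-collection leg of this pipeline can be sketched in Python. The `SensorReading` shape and field names below are illustrative assumptions, not the project's actual schema; RabbitMQ carries opaque bytes, so readings are serialised to JSON before publishing.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical reading shape -- the real project schema may differ.
@dataclass
class SensorReading:
    sensor_id: str      # e.g. a traffic sensor identifier
    metric: str         # "vehicle_count", "pm2_5", ...
    value: float
    recorded_at: str    # ISO 8601 UTC timestamp

def to_queue_message(reading: SensorReading) -> bytes:
    """Serialise a reading for the message queue (RabbitMQ carries opaque bytes)."""
    return json.dumps(asdict(reading)).encode("utf-8")

reading = SensorReading(
    sensor_id="woking-ts-042",
    metric="vehicle_count",
    value=17.0,
    recorded_at=datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
)
message = to_queue_message(reading)
```

A consumer on the training side would decode the same JSON before writing to PostgreSQL.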

## Hardware Configuration

The server infrastructure is hosted in a dedicated rack at a local data center. The primary server roles are divided across several physical machines. The entire infrastructure is monitored using Nagios.
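A Nagios host object for one of these machines looks roughly like the fragment below. The hostname, address, and template name are assumptions for illustration; the production definitions will differ.

```
# Hypothetical host definition; actual hostnames and templates will differ.
define host {
    use                  linux-server          ; inherit the stock Nagios template
    host_name            aiw-db-01
    alias                Database Server (Dell PowerEdge R650)
    address              10.0.10.20
    check_command        check-host-alive
    max_check_attempts   3
    notification_period  24x7
}
```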

### Primary Server Specs

| Server Role | Model | CPU | RAM | Storage | Network Interface |
|---|---|---|---|---|---|
| Data Collection Server | Dell PowerEdge R750 | Intel Xeon Gold 6338 | 128 GB DDR4 | 2 x 1 TB NVMe SSD (RAID 1) | 10 GbE |
| Model Training Cluster (Nodes 1-4) | Supermicro SYS-2029U-TR4 | AMD EPYC 7763 | 256 GB DDR4 | 4 x 4 TB SATA HDD (RAID 5) + 1 x 512 GB NVMe SSD | 25 GbE |
| Web Server | HP ProLiant DL380 Gen10 | Intel Xeon Silver 4210 | 64 GB DDR4 | 2 x 480 GB SATA SSD (RAID 1) | 1 GbE |
| Database Server | Dell PowerEdge R650 | Intel Xeon Gold 6330 | 64 GB DDR4 | 4 x 2 TB SATA HDD (RAID 10) | 10 GbE |
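Usable storage on each array depends on its RAID level, not just the raw disk count. The sketch below applies the standard formulas to the arrays in the table; figures ignore filesystem overhead and use the vendor's decimal TB.

```python
def usable_tb(level: str, disks: int, size_tb: float) -> float:
    """Approximate usable capacity of a RAID array, ignoring filesystem overhead."""
    if level == "RAID 1":       # full mirror: capacity of a single disk
        return size_tb
    if level == "RAID 5":       # one disk's worth of capacity lost to parity
        return (disks - 1) * size_tb
    if level == "RAID 10":      # striped mirrors: half the raw capacity
        return (disks // 2) * size_tb
    raise ValueError(f"unsupported level: {level}")

# Arrays from the table above (HDD/SSD arrays only; the NVMe scratch disk
# on the training nodes is unmirrored and excluded).
data_collection = usable_tb("RAID 1", 2, 1.0)    # 1.0 TB
training_node   = usable_tb("RAID 5", 4, 4.0)    # 12.0 TB per node
web_server      = usable_tb("RAID 1", 2, 0.48)   # 0.48 TB
database        = usable_tb("RAID 10", 4, 2.0)   # 4.0 TB
```

RAID 5 on the training nodes trades redundancy for capacity, which suits bulk training data; the database uses RAID 10 for write performance and fault tolerance.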

### GPU Specifications (Model Training Cluster)

Each node in the model training cluster is equipped with four NVIDIA A100 GPUs.

| GPU Model | Memory | CUDA Cores | Tensor Cores | Power Consumption |
|---|---|---|---|---|
| NVIDIA A100 | 80 GB HBM2e | 6912 | 432 | 400 W |
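Across the four-node cluster, these per-GPU figures aggregate as follows; this is simple arithmetic from the specs above, useful for capacity and power planning.

```python
# Aggregate training-cluster GPU resources from the specs above.
NODES = 4            # Model Training Cluster, Nodes 1-4
GPUS_PER_NODE = 4    # four NVIDIA A100s per node
MEM_PER_GPU_GB = 80  # HBM2e per A100
POWER_PER_GPU_W = 400

total_gpus = NODES * GPUS_PER_NODE                        # 16 GPUs
total_gpu_mem_gb = total_gpus * MEM_PER_GPU_GB            # 1280 GB of HBM2e
peak_gpu_power_kw = total_gpus * POWER_PER_GPU_W / 1000   # 6.4 kW for GPUs alone
```

Note the 6.4 kW figure covers GPUs only; rack power budgets must also account for CPUs, drives, and cooling.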

## Software Configuration

The operating system across all servers is Ubuntu Server 22.04 LTS. Specific software packages and versions are detailed below. Configuration management is handled using Ansible.
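A playbook fragment for the web tier might look like the sketch below. The group name `web_servers` and the task layout are assumptions for illustration, not the project's actual Ansible roles.

```yaml
# Illustrative playbook fragment; group names and roles are assumptions.
- name: Pin the web server stack
  hosts: web_servers
  become: true
  tasks:
    - name: Install Apache and PHP
      ansible.builtin.apt:
        name:
          - apache2
          - php8.1
        state: present
        update_cache: true
```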

### Software Stack

| Server Role | Operating System | Key Software (Versions) |
|---|---|---|
| Data Collection Server | Ubuntu Server 22.04 LTS | Python 3.10, RabbitMQ 3.9, InfluxDB 2.0 |
| Model Training Cluster | Ubuntu Server 22.04 LTS | Python 3.10, TensorFlow 2.12, PyTorch 2.0, CUDA 12.1, NCCL 2.12 |
| Web Server | Ubuntu Server 22.04 LTS | PHP 8.1, Apache 2.4, MariaDB Client 10.6 |
| Database Server | Ubuntu Server 22.04 LTS | PostgreSQL 14, pgAdmin 4 |

## Network Configuration

The servers are connected to the data center network via a dedicated VLAN. Firewall rules are configured using iptables to restrict access to only necessary ports. A reverse proxy, Nginx, is used on the web server to handle SSL termination and load balancing. Internal DNS is managed using Bind9.
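The SSL-termination and load-balancing arrangement can be sketched as an Nginx server block. The upstream addresses, server name, and certificate paths below are placeholder assumptions, not the production values.

```nginx
# Illustrative Nginx reverse-proxy block; names and paths are assumptions.
upstream app_backend {
    server 127.0.0.1:8080;   # PHP application worker behind the proxy
    server 127.0.0.1:8081;   # second worker for simple load balancing
}

server {
    listen 443 ssl;
    server_name ai.example.org;

    ssl_certificate     /etc/ssl/certs/ai.example.org.pem;
    ssl_certificate_key /etc/ssl/private/ai.example.org.key;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```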
