AI in Stockport


This article details the server configuration supporting the "AI in Stockport" project, a local initiative utilising artificial intelligence to improve resource allocation and predictive analysis within Stockport Metropolitan Borough Council. It is intended for new system administrators and developers contributing to the project.

Overview

The "AI in Stockport" project relies on a distributed server architecture designed for scalability and redundancy. The core infrastructure consists of four primary server roles: Data Ingestion, Model Training, Model Serving, and a central Monitoring system. Each role is detailed below. We utilise a hybrid cloud approach, with some services hosted on-premise for data security and latency reasons, and others leveraging cloud providers for computational power. See Deployment Strategy for more information on the overall architecture.

Data Ingestion Servers

These servers are responsible for collecting, validating, and pre-processing data from various sources including council databases, public APIs, and sensor networks. Data security is paramount; all data is encrypted both in transit and at rest. See Data Security Protocols for detailed information.
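
For illustration, a minimal pre-ingestion validation step might look like the following Python sketch. The field names and the validate_record helper are hypothetical examples, not part of the project codebase.

from datetime import datetime, timezone

# Hypothetical record schema; the real field names and rules are defined
# in the project's ingestion codebase.
REQUIRED_FIELDS = {"sensor_id", "timestamp", "value"}

def validate_record(record: dict) -> bool:
    """Reject records with missing fields or implausible timestamps."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    try:
        ts = datetime.fromisoformat(record["timestamp"])
    except (TypeError, ValueError):
        return False
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume UTC for naive stamps
    return ts <= datetime.now(timezone.utc)  # reject future-dated records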

Server Role: Data Ingestion
Quantity: 3
Operating System: Ubuntu Server 22.04 LTS
CPU: Intel Xeon Silver 4310 (12 cores)
RAM: 64 GB DDR4 ECC
Storage: 8 TB SSD (RAID 10)

The ingestion pipeline uses Apache Kafka for message queuing and Apache Spark for initial data transformation. The servers also run PostgreSQL databases that store metadata about ingested data, and they are secured as described in Firewall Configuration.
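
As a sketch of how a data source might publish into the queue, the snippet below uses the kafka-python client; the broker address, topic name, and record contents are illustrative assumptions, not values from the project's configuration.

import json
from kafka import KafkaProducer  # pip install kafka-python

# Broker address and topic name are placeholders; TLS (security_protocol="SSL")
# reflects the requirement that data be encrypted in transit.
producer = KafkaProducer(
    bootstrap_servers="ingest-kafka-01:9092",
    security_protocol="SSL",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

record = {"sensor_id": "sk-0042", "timestamp": "2024-01-01T12:00:00+00:00", "value": 17.3}
producer.send("council.sensor-readings", value=record)
producer.flush()  # block until the broker acknowledges the message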

Model Training Servers

These servers are the computational workhorses of the project, responsible for training and validating the AI models. Due to the intensive nature of model training, these servers are equipped with high-performance GPUs. We leverage Kubernetes for resource orchestration and scalability.
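
As a small illustration of interacting with the training cluster, the snippet below uses the official Kubernetes Python client to list training pods; the namespace and label selector are hypothetical placeholders.

from kubernetes import client, config  # pip install kubernetes

# Load credentials from the local kubeconfig; code running inside the
# cluster would call config.load_incluster_config() instead.
config.load_kube_config()

v1 = client.CoreV1Api()
# Namespace and label selector are illustrative, not the project's real names.
pods = v1.list_namespaced_pod("model-training", label_selector="app=trainer")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)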

Server Role: Model Training
Quantity: 4
Operating System: Ubuntu Server 22.04 LTS
GPU: NVIDIA A100 (80 GB)
CPU: Intel Xeon Gold 6338 (32 cores)
RAM: 256 GB DDR4 ECC
Storage: 2 TB NVMe SSD

Model training is carried out in TensorFlow on Python 3.9. We utilise MLflow for experiment tracking and model versioning. See Model Training Pipeline for a workflow diagram.
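
A minimal training-and-tracking sketch follows, assuming a reachable MLflow tracking server; the URI, run name, and synthetic data are illustrative, not the project's real values.

import numpy as np
import tensorflow as tf
import mlflow
import mlflow.tensorflow

# The tracking URI is a placeholder for the project's MLflow deployment.
mlflow.set_tracking_uri("http://mlflow.internal:5000")
mlflow.tensorflow.autolog()  # log parameters, metrics and the trained model

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

with mlflow.start_run(run_name="demo-run"):
    # Synthetic stand-ins for the project's real feature and target tensors.
    x_train = np.random.rand(256, 10).astype("float32")
    y_train = np.random.rand(256, 1).astype("float32")
    model.fit(x_train, y_train, epochs=3, batch_size=32)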

Model Serving Servers

These servers host the trained AI models and provide an API endpoint for accessing their predictions. Low latency is critical for this role. We employ Docker for containerization and gRPC for efficient communication.

Server Role: Model Serving
Quantity: 6
Operating System: Alpine Linux 3.18
CPU: Intel Xeon E-2388G (8 cores)
RAM: 32 GB DDR4 ECC
Storage: 1 TB SSD

Model serving is managed using Triton Inference Server to optimise performance and resource utilisation. See API Documentation for details on accessing the prediction endpoints. These servers are monitored by the Monitoring System described below.
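
A client-side sketch of querying a served model over gRPC with the tritonclient package follows; the endpoint, model name, and tensor names are assumptions for illustration (the real ones are listed in API Documentation).

import numpy as np
import tritonclient.grpc as grpcclient  # pip install tritonclient[grpc]

# Endpoint, model name and tensor names are placeholders.
client = grpcclient.InferenceServerClient(url="serving.internal:8001")

batch = np.random.rand(1, 10).astype(np.float32)
inputs = [grpcclient.InferInput("input__0", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [grpcclient.InferRequestedOutput("output__0")]

result = client.infer(model_name="demand_forecast", inputs=inputs, outputs=outputs)
print(result.as_numpy("output__0"))  # the model's prediction tensor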

Monitoring System

The Monitoring System provides real-time insights into the health and performance of all servers within the infrastructure. It is critical for identifying and resolving issues proactively.

The monitoring system consists of a central server running Prometheus and Grafana. Data is collected from each server using Node Exporter and visualised in Grafana dashboards. Alerts are configured to notify administrators of critical events; see Alerting Configuration for details. The Monitoring System also utilises Logstash and Elasticsearch for centralised logging.
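
As an example of reading these metrics programmatically, the snippet below queries Prometheus's standard HTTP API for the built-in up metric; the monitoring host address is a placeholder.

import requests

# Placeholder address for the central monitoring server.
PROM_URL = "http://monitoring.internal:9090/api/v1/query"

# 'up' is 1 when a scrape target (e.g. a Node Exporter) is reachable, 0 otherwise.
resp = requests.get(PROM_URL, params={"query": "up"}, timeout=10)
resp.raise_for_status()
for sample in resp.json()["data"]["result"]:
    print(sample["metric"].get("instance"), sample["value"][1])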


Network Configuration

All servers are connected via a dedicated VLAN with a 10 Gbps backbone. Network security is enforced using Network Segmentation and Intrusion Detection Systems. The external API endpoint is protected by a Web Application Firewall.

Future Expansion

Plans are underway to expand the infrastructure with additional model training servers and to explore federated learning techniques. See Future Development Roadmap for more information. A complete Hardware Inventory is maintained.

