AI in Stockport: Server Configuration
This article details the server configuration supporting the "AI in Stockport" project, a local initiative utilising Artificial Intelligence for improved resource allocation and predictive analysis within Stockport Metropolitan Borough Council. This guide is intended for new system administrators and developers contributing to the project.
Overview
The "AI in Stockport" project relies on a distributed server architecture designed for scalability and redundancy. The core infrastructure consists of four primary server roles: Data Ingestion, Model Training, Model Serving, and a central Monitoring system. Each role is detailed below. We utilise a hybrid cloud approach, with some services hosted on-premise for data security and latency reasons, and others leveraging cloud providers for computational power. See Deployment Strategy for more information on the overall architecture.
Data Ingestion Servers
These servers are responsible for collecting, validating, and pre-processing data from various sources including council databases, public APIs, and sensor networks. Data security is paramount; all data is encrypted both in transit and at rest. See Data Security Protocols for detailed information.
Server Role | Quantity | Operating System | CPU | RAM | Storage |
---|---|---|---|---|---|
Data Ingestion | 3 | Ubuntu Server 22.04 LTS | Intel Xeon Silver 4310 (12 Cores) | 64 GB DDR4 ECC | 8 TB RAID 10 SSD |
The ingestion pipeline uses Apache Kafka for message queuing and Apache Spark for initial data transformation. These servers also run PostgreSQL databases that store metadata about ingested data. Firewall rules for these servers are documented in Firewall Configuration.
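As an illustration of the first hop in this pipeline, the sketch below publishes a validated sensor reading to a Kafka topic. It is a minimal example only: the broker address, topic name, and record schema are assumptions for illustration, not the project's actual values.

```python
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # kafka-python package

# Hypothetical broker address and topic; the real values live in the
# ingestion servers' deployment configuration.
producer = KafkaProducer(
    bootstrap_servers="ingest-01.internal:9092",
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

reading = {
    "source": "air-quality-sensor-17",  # assumed sensor ID
    "recorded_at": datetime.now(timezone.utc).isoformat(),
    "pm25": 8.4,
}

# Send asynchronously, then block until the broker acknowledges.
future = producer.send("sensor-readings", value=reading)
future.get(timeout=10)
producer.flush()
```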
Model Training Servers
These servers are the computational workhorses of the project, responsible for training and validating the AI models. Due to the intensive nature of model training, these servers are equipped with high-performance GPUs. We leverage Kubernetes for resource orchestration and scalability.
Server Role | Quantity | Operating System | GPU | CPU | RAM | Storage |
---|---|---|---|---|---|---|
Model Training | 4 | Ubuntu Server 22.04 LTS | NVIDIA A100 (80GB) | Intel Xeon Gold 6338 (32 Cores) | 256 GB DDR4 ECC | 2 TB NVMe SSD |
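To illustrate the orchestration approach described above, the sketch below submits a single-GPU training Job via the official Kubernetes Python client. The image name, namespace, and entry point are placeholders assumed for the example, not taken from the project.

```python
from kubernetes import client, config

# Assumes kubectl credentials are available locally; inside the cluster,
# use config.load_incluster_config() instead.
config.load_kube_config()

container = client.V1Container(
    name="trainer",
    image="registry.internal/stockport-trainer:latest",  # hypothetical image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}  # request one GPU from the node
    ),
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="demand-model-training"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=0,
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ml-training", body=job)
```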
The primary framework used for model training is TensorFlow, with Python 3.9 as the implementation language. We utilise MLflow for experiment tracking and model versioning. See Model Training Pipeline for a workflow diagram.
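A minimal training-loop sketch with MLflow tracking is shown below. The tracking URI, experiment name, and toy model are assumptions for illustration; the real models and pipelines are described in Model Training Pipeline.

```python
import numpy as np
import tensorflow as tf
import mlflow
import mlflow.tensorflow

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # hypothetical URI
mlflow.set_experiment("demand-forecasting")             # hypothetical name

# Toy data standing in for a real feature set.
x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

with mlflow.start_run():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    mlflow.log_param("epochs", 5)
    history = model.fit(x, y, epochs=5, batch_size=32, verbose=0)

    # Record the final loss and the trained model for versioning.
    mlflow.log_metric("final_loss", float(history.history["loss"][-1]))
    mlflow.tensorflow.log_model(model, artifact_path="model")
```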
Model Serving Servers
These servers host the trained AI models and provide an API endpoint for accessing their predictions. Low latency is critical for this role. We employ Docker for containerization and gRPC for efficient communication.
Server Role | Quantity | Operating System | CPU | RAM | Storage |
---|---|---|---|---|---|
Model Serving | 6 | Alpine Linux 3.18 | Intel Xeon E-2388G (8 Cores) | 32 GB DDR4 ECC | 1 TB SSD |
Model serving is managed using Triton Inference Server to optimise performance and resource utilisation. See API Documentation for details on accessing the prediction endpoints. These servers are monitored by the Monitoring System described below.
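As a sketch of how a downstream service might call a prediction endpoint, the example below uses the Triton gRPC client. The model name, tensor names, and input shape are assumptions; the authoritative contract is in API Documentation.

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Hypothetical serving host; Triton's gRPC port defaults to 8001.
triton = grpcclient.InferenceServerClient(url="serving.internal:8001")

# Assumed ten-feature input; the real feature order is defined by the model.
features = np.random.rand(1, 10).astype(np.float32)

inputs = [grpcclient.InferInput("input__0", list(features.shape), "FP32")]
inputs[0].set_data_from_numpy(features)
outputs = [grpcclient.InferRequestedOutput("output__0")]

result = triton.infer(model_name="demand_forecast", inputs=inputs, outputs=outputs)
print(result.as_numpy("output__0"))
```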
Monitoring System
The Monitoring System provides real-time insights into the health and performance of all servers within the infrastructure. It is critical for identifying and resolving issues proactively.
The monitoring system consists of a central server running Prometheus and Grafana. Metrics are collected from each server using Node Exporter and visualised in Grafana dashboards. Alerts are configured to notify administrators of critical events; see Alerting Configuration for details. The Monitoring System also utilises Logstash and Elasticsearch for centralised logging.
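For a flavour of how an administrator might query the monitoring server programmatically, the sketch below calls the standard Prometheus HTTP API to list Node Exporter targets that are down. The hostname and job label are placeholders.

```python
import requests

PROMETHEUS = "http://monitoring.internal:9090"  # hypothetical hostname

def query(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    resp = requests.get(
        f"{PROMETHEUS}/api/v1/query", params={"query": promql}, timeout=10
    )
    resp.raise_for_status()
    payload = resp.json()
    if payload["status"] != "success":
        raise RuntimeError(f"Query failed: {payload}")
    return payload["data"]["result"]

# Instances whose Node Exporter scrape target is down.
for series in query('up{job="node"} == 0'):
    print("DOWN:", series["metric"].get("instance", "unknown"))
```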
Network Configuration
All servers are connected via a dedicated VLAN with a 10 Gbps backbone. Network security is enforced using Network Segmentation and Intrusion Detection Systems. The external API endpoint is protected by a Web Application Firewall.
Future Expansion
Plans are underway to expand the infrastructure with additional model training servers and to explore federated learning techniques. See Future Development Roadmap for more information. A complete Hardware Inventory is also maintained.
Intel-Based Server Configurations
Configuration | Specifications | Benchmark |
---|---|---|
Core i7-6700K/7700 Server | 64 GB DDR4, 2x512 GB NVMe SSD | CPU Benchmark: 8046 |
Core i7-8700 Server | 64 GB DDR4, 2x1 TB NVMe SSD | CPU Benchmark: 13124 |
Core i9-9900K Server | 128 GB DDR4, 2x1 TB NVMe SSD | CPU Benchmark: 49969 |
Core i9-13900 Server (64GB) | 64 GB RAM, 2x2 TB NVMe SSD | |
Core i9-13900 Server (128GB) | 128 GB RAM, 2x2 TB NVMe SSD | |
Core i5-13500 Server (64GB) | 64 GB RAM, 2x500 GB NVMe SSD | |
Core i5-13500 Server (128GB) | 128 GB RAM, 2x500 GB NVMe SSD | |
Core i5-13500 Workstation | 64 GB DDR5 RAM, 2x NVMe SSD, NVIDIA RTX 4000 | |
AMD-Based Server Configurations
Configuration | Specifications | Benchmark |
---|---|---|
Ryzen 5 3600 Server | 64 GB RAM, 2x480 GB NVMe | CPU Benchmark: 17849 |
Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2x1 TB NVMe | CPU Benchmark: 35224 |
Ryzen 9 5950X Server | 128 GB RAM, 2x4 TB NVMe | CPU Benchmark: 46045 |
Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2x2 TB NVMe | CPU Benchmark: 63561 |
EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2x2 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2x2 TB NVMe | CPU Benchmark: 48021 |
EPYC 9454P Server | 256 GB RAM, 2x2 TB NVMe | |
*Note: Benchmark scores are approximate and may vary with the exact configuration.*