AI in Crawley: Server Configuration Documentation
This document details the server configuration supporting the "AI in Crawley" project and is intended as a reference for new team members and maintainers. The system uses a distributed architecture to handle the computational demands of machine learning models processing real-time data from the Crawley area.
Overview
The "AI in Crawley" project involves analyzing various data streams – CCTV footage, traffic patterns, environmental sensors, and social media feeds – to provide insights into city management, public safety, and resource allocation. The server infrastructure is built to be scalable, reliable, and secure. The core components are a cluster of GPU servers for model training and inference, a data lake for storage, and a message queue for data ingestion. The system is monitored using Prometheus and Grafana, with logging handled by the ELK stack. This documentation will cover the hardware, software, and network configuration of these key components.
Hardware Configuration
The core of the system comprises four primary server types: Data Ingestion Servers, GPU Training Servers, Inference Servers, and the Central Database Server. The hardware requirements for each are outlined below.
Data Ingestion Servers
These servers receive and pre-process data from various sources. They are computationally lightweight but require high network bandwidth; a minimal ingestion sketch follows the hardware table.
| Component | Specification |
|---|---|
| CPU | Intel Xeon Silver 4310 (12 cores) |
| RAM | 64 GB DDR4 ECC |
| Storage | 2 x 1 TB NVMe SSD (RAID 1) |
| Network Interface | 10 Gbps Ethernet |
| Operating System | Ubuntu Server 22.04 LTS |
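To make the ingestion role concrete, here is a minimal sketch of a worker that packages a sensor reading and publishes it to the message queue described in the Software Configuration section. The broker host `mq.internal`, the queue name `sensor.raw`, and the payload fields are illustrative assumptions, not project-defined values.

```python
import json

import pika  # RabbitMQ client library

# Assumed connection details -- the real broker host and queue names
# live in the deployment configuration, not here.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="mq.internal"))
channel = connection.channel()
channel.queue_declare(queue="sensor.raw", durable=True)

# Illustrative payload for a single environmental-sensor reading.
reading = {"sensor_id": "env-042", "pm25": 12.3, "timestamp": "2024-05-01T12:00:00Z"}

channel.basic_publish(
    exchange="",                 # default exchange routes by queue name
    routing_key="sensor.raw",
    body=json.dumps(reading),
    properties=pika.BasicProperties(delivery_mode=2),  # mark message persistent
)
connection.close()
```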
GPU Training Servers
These servers are the workhorses for training our machine learning models. They require powerful GPUs and significant RAM. We currently have three of these servers in operation.
| Component | Specification |
|---|---|
| CPU | AMD EPYC 7763 (64 cores) |
| RAM | 256 GB DDR4 ECC |
| GPU | 4 x NVIDIA A100 (80GB) |
| Storage | 4 x 4 TB NVMe SSD (RAID 0) |
| Network Interface | 100 Gbps InfiniBand |
| Operating System | CentOS Stream 9 |
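As an illustration of how the PyTorch-and-CUDA stack listed in the Software Configuration section maps onto this hardware, the sketch below replicates a model across all visible GPUs on a single training node. The model and data are placeholders; production training at this scale would more likely use DistributedDataParallel.

```python
import torch
import torch.nn as nn

# Placeholder model -- the project's actual architectures are not described here.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:   # e.g. the 4 x A100 in each training server
    model = nn.DataParallel(model)  # replicate the model across all visible GPUs
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch standing in for real pipeline output.
x = torch.randn(64, 128, device=device)
y = torch.randint(0, 10, (64,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```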
Inference Servers
These servers deploy trained models to perform real-time predictions and must be highly responsive and efficient. We deploy them in Docker containers; a minimal service sketch follows the hardware table.
| Component | Specification |
|---|---|
| CPU | Intel Xeon Gold 6338 (32 cores) |
| RAM | 128 GB DDR4 ECC |
| GPU | 2 x NVIDIA RTX A4000 (16GB) |
| Storage | 1 x 2 TB NVMe SSD |
| Network Interface | 25 Gbps Ethernet |
| Operating System | Ubuntu Server 22.04 LTS |
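For context on what runs inside those containers, here is a minimal sketch of an inference service. The model path `/models/traffic_model.pt`, the `/predict` route, and the flat-feature input format are assumptions for illustration; the real services are project-specific.

```python
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumed TorchScript artifact baked into (or mounted into) the container.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.jit.load("/models/traffic_model.pt", map_location=device)
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON of the form {"features": [0.1, 0.2, ...]}.
    features = request.get_json()["features"]
    x = torch.tensor([features], dtype=torch.float32, device=device)
    with torch.no_grad():
        scores = model(x).squeeze(0).tolist()
    return jsonify({"scores": scores})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A container running a service like this would typically be started with GPU access enabled, e.g. `docker run --gpus all`.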
Software Configuration
The software stack is built around open-source technologies, prioritizing flexibility and cost-effectiveness.
- Operating Systems: As noted above, we run a mix of Ubuntu Server 22.04 LTS and CentOS Stream 9. System administration procedures are documented separately.
- Programming Languages: Python is the primary language for model development, deployment, and operational scripting.
- Machine Learning Frameworks: TensorFlow and PyTorch are both supported, leveraging CUDA for GPU acceleration.
- Data Storage: A Hadoop cluster provides a scalable data lake for storing raw and processed data. HDFS is the primary storage layer.
- Message Queue: RabbitMQ handles asynchronous communication between the data ingestion servers and the processing pipelines (see the consumer sketch after this list).
- Containerization: Docker and Kubernetes are used to deploy and manage applications.
- Monitoring: Prometheus collects metrics from all servers, and Grafana provides dashboards for visualization. Alerting is managed through Alertmanager (see the metrics sketch after this list).
- Logging: The ELK stack (Elasticsearch, Logstash, Kibana) is used for centralized logging and analysis.
- Version Control: All code is managed using Git and hosted on a private GitLab instance.
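As referenced in the message-queue item above, this sketch shows the consuming side of the pipeline: a worker that reads sensor messages off the queue and acknowledges them once processed. The queue name and broker host mirror the assumptions made in the ingestion sketch earlier in this document.

```python
import json

import pika

def handle_message(channel, method, properties, body):
    reading = json.loads(body)
    # Stand-in for real pre-processing / enrichment logic.
    print(f"processing reading from {reading.get('sensor_id')}")
    channel.basic_ack(delivery_tag=method.delivery_tag)  # confirm completion

connection = pika.BlockingConnection(pika.ConnectionParameters(host="mq.internal"))
channel = connection.channel()
channel.queue_declare(queue="sensor.raw", durable=True)
channel.basic_qos(prefetch_count=10)  # cap unacknowledged messages per worker
channel.basic_consume(queue="sensor.raw", on_message_callback=handle_message)
channel.start_consuming()
```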
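And as referenced in the monitoring item, each service exposes metrics for Prometheus to scrape. The sketch below uses the standard prometheus_client library; the metric names and the simulated workload are illustrative only.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names -- real services define their own.
REQUESTS = Counter("inference_requests_total", "Total inference requests served")
LATENCY = Histogram("inference_latency_seconds", "Inference request latency")

def handle_request():
    with LATENCY.time():                        # records elapsed time of the block
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
    REQUESTS.inc()

if __name__ == "__main__":
    start_http_server(8000)  # serves /metrics for the Prometheus scraper
    while True:
        handle_request()
```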
Network Configuration
The servers are deployed within a dedicated VLAN on the Crawley City Council network.
- Network Topology: A star topology is used, with a central core switch providing connectivity to all servers.
- Firewall: A firewall is configured to restrict access to the servers based on the principle of least privilege.
- DNS: Internal DNS servers are used to resolve server names. DNS records are maintained centrally.
- VPN: Access to the server infrastructure is restricted to authorized personnel via a VPN.
- Load Balancing: HAProxy is used to load balance traffic across the inference servers.
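To show how fan-out across the inference servers might look, here is a minimal haproxy.cfg sketch. The backend names, addresses, port, and health-check path are all illustrative assumptions; the production configuration is maintained separately.

```
# Hypothetical fragment -- real addresses, ports, and checks will differ.
frontend inference_in
    bind *:8080
    default_backend inference_servers

backend inference_servers
    balance roundrobin               # distribute requests evenly
    option httpchk GET /healthz      # drop unhealthy servers from rotation
    server inf1 10.0.20.11:8080 check
    server inf2 10.0.20.12:8080 check
```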
Security Considerations
Security is paramount. We employ several measures to protect the system and data:
- Regular Security Audits: Periodic security audits are conducted to identify and address vulnerabilities.
- Intrusion Detection System: An IDS monitors network traffic for malicious activity.
- Data Encryption: Data is encrypted both in transit and at rest.
- Access Control: Strict access control policies are enforced, with centralized user management.
- Patch Management: All software is kept up-to-date with the latest security patches.
Future Considerations
- Exploring the use of federated learning to improve model accuracy while preserving data privacy.
- Implementing automated scaling of inference servers based on demand.
- Integrating with additional data sources.
- Improving the efficiency of the data pipeline.