AI in Togo: Server Configuration & Deployment
This article details the server configuration used to support Artificial Intelligence (AI) initiatives within Togo. This documentation is intended for new system administrators and developers involved in maintaining and expanding Togo's AI infrastructure. It covers hardware, software, network considerations, and security best practices. This document assumes familiarity with basic Linux server administration and networking concepts.
Overview
Togo is actively investing in AI to address challenges in agriculture, healthcare, and government services. This requires a robust and scalable server infrastructure. The current architecture employs a hybrid approach, utilizing both on-premise servers for data security and cloud resources for burst capacity and specialized AI services. This document focuses primarily on the on-premise infrastructure, with notes on cloud integration where relevant. See System Architecture Overview for a broader perspective.
Hardware Specifications
The core AI infrastructure consists of three primary server clusters: a data ingestion cluster, a model training cluster, and an inference cluster. Each cluster utilizes a slightly different hardware configuration optimized for its specific workload.
| Server Role | CPU | RAM | Storage | Network Interface |
|---|---|---|---|---|
| Data Ingestion | Intel Xeon Gold 6248R (24 cores) | 128 GB DDR4 ECC | 16 TB RAID 6 HDD | 10 Gbps Ethernet |
| Model Training | 2 x AMD EPYC 7763 (64 cores each) | 512 GB DDR4 ECC | 32 TB NVMe SSD RAID 0 | 100 Gbps InfiniBand |
| Inference | Intel Xeon Silver 4210 (10 cores) | 64 GB DDR4 ECC | 8 TB NVMe SSD | 1 Gbps Ethernet |
These specifications are subject to change as new technologies become available. Refer to Hardware Lifecycle Management for the procurement and replacement process. Power requirements are documented in Power & Cooling Specifications.
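For capacity planning, usable storage differs from raw storage depending on the RAID level: RAID 6 loses two drives' worth of capacity to parity, while RAID 0 stripes across all drives with no redundancy. The sketch below illustrates the arithmetic; the drive counts and sizes are assumptions chosen to match the table above, not figures from the procurement records.

```python
def usable_capacity_tb(drive_tb: float, drives: int, level: int) -> float:
    """Approximate usable capacity for common RAID levels.

    Ignores filesystem overhead and the TB/TiB distinction.
    """
    if level == 0:  # striping only: no capacity lost, no redundancy
        return drive_tb * drives
    if level == 6:  # double parity: two drives' capacity lost
        if drives < 4:
            raise ValueError("RAID 6 requires at least 4 drives")
        return drive_tb * (drives - 2)
    raise ValueError(f"unsupported RAID level: {level}")

# Hypothetical layouts: ten 2 TB HDDs in RAID 6, eight 4 TB NVMe drives in RAID 0
print(usable_capacity_tb(2.0, 10, 6))  # 16.0 — consistent with the ingestion cluster's 16 TB
print(usable_capacity_tb(4.0, 8, 0))   # 32.0 — consistent with the training cluster's 32 TB
```

Note that the training cluster's RAID 0 array trades all redundancy for throughput, which is why backups (see Data Backup and Recovery) matter for that cluster in particular.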
Software Stack
The software stack is built around Ubuntu Server 22.04 LTS, providing a stable and well-supported base. Key software components include:
- Operating System: Ubuntu Server 22.04 LTS
- Containerization: Docker and Kubernetes are used for application deployment and orchestration. See Containerization Best Practices for detailed instructions.
- AI Frameworks: TensorFlow, PyTorch, and scikit-learn are utilized. Framework versions are managed using Conda environments.
- Database: PostgreSQL is used for storing metadata and model versions. See Database Administration Guide.
- Monitoring: Prometheus and Grafana are used for system monitoring and alerting. Refer to Monitoring and Alerting Procedures.
- Version Control: Git is used for all code and configuration management. See Git Workflow.
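Since framework versions are pinned through Conda, each cluster role typically carries an environment file. The following is a minimal illustrative sketch; the environment name and version pins are placeholders, not the versions actually deployed (use the versions approved through change management):

```yaml
# environment.yml — illustrative only
name: ai-training
channels:
  - conda-forge
dependencies:
  - python=3.10
  - scikit-learn=1.3
  - pytorch=2.1
  - tensorflow=2.14
  - pip
```

Create the environment with `conda env create -f environment.yml` and activate it with `conda activate ai-training`.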
Network Configuration
The server infrastructure is segmented into three virtual LANs (VLANs) for security and performance:
- VLAN 10: Data Ingestion – 192.168.10.0/24
- VLAN 20: Model Training – 192.168.20.0/24
- VLAN 30: Inference – 192.168.30.0/24
A firewall (pfSense) is used to control traffic between VLANs and the external network. See Network Security Policy for detailed rules. Inter-VLAN routing is handled by a dedicated layer 3 switch. DNS is provided by an internal Bind9 server. Refer to DNS Configuration for more information.
| Network Component | IP Address | Role |
|---|---|---|
| pfSense Firewall | 192.168.1.1 | Gateway and Firewall |
| Layer 3 Switch | 192.168.1.2 | Inter-VLAN Routing |
| Bind9 Server | 192.168.1.3 | Internal DNS |
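The VLAN plan above can be checked programmatically, which is handy when writing firewall rules or auditing host inventories. This sketch uses Python's standard `ipaddress` module; the host addresses in the examples are made up for illustration.

```python
import ipaddress

# VLAN plan from the network configuration above
VLANS = {
    10: ipaddress.ip_network("192.168.10.0/24"),  # data ingestion
    20: ipaddress.ip_network("192.168.20.0/24"),  # model training
    30: ipaddress.ip_network("192.168.30.0/24"),  # inference
}

def vlan_for(host: str):
    """Return the VLAN ID whose subnet contains `host`, or None."""
    addr = ipaddress.ip_address(host)
    for vlan_id, net in VLANS.items():
        if addr in net:
            return vlan_id
    return None

print(vlan_for("192.168.20.17"))  # 20 — a training-cluster node
print(vlan_for("10.0.0.5"))       # None — outside all three VLANs
```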
Security Considerations
Security is paramount. The following measures are in place:
- Firewall: Strict firewall rules are enforced to limit network access.
- Intrusion Detection/Prevention: Snort is used for intrusion detection and prevention. See IDS/IPS Configuration.
- Access Control: SSH access is restricted to authorized personnel using key-based authentication.
- Data Encryption: Data at rest is encrypted using LUKS. Data in transit is encrypted using TLS. See Data Encryption Standards.
- Regular Security Audits: Regular security audits are conducted to identify and address vulnerabilities. Refer to Security Audit Schedule.
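The key-based SSH restriction above typically translates into an `sshd_config` along the following lines. This is an illustrative hardening fragment, and the `AllowGroups` name is an assumption; match it to the actual administrative group.

```
# /etc/ssh/sshd_config — illustrative fragment
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
AllowGroups ai-admins   # hypothetical group name
```

After editing, validate the configuration with `sshd -t` before restarting the service, so a syntax error cannot lock out remote access.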
Cloud Integration
For computationally intensive tasks or when burst capacity is required, the on-premise infrastructure integrates with Amazon Web Services (AWS): AWS SageMaker is used for burst model training and AWS Lambda for serverless inference. Data transfer between on-premise servers and AWS is secured over VPN connections. See Cloud Integration Guide for detailed instructions.
| Cloud Service | Purpose | Region |
|---|---|---|
| AWS SageMaker | Model Training (Burst Capacity) | us-east-1 |
| AWS Lambda | Serverless Inference | us-east-1 |
| — | Data Storage (Backup) | us-east-1 |
Future Expansion
Planned future expansion includes the addition of GPU servers to the model training cluster to accelerate deep learning tasks. We are also evaluating the use of Kubernetes operators to automate the deployment and management of AI workloads. See Future Infrastructure Roadmap.
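When GPU servers are added to the training cluster, Kubernetes can schedule training pods onto them through the NVIDIA device plugin's `nvidia.com/gpu` extended resource. The pod spec below is a minimal sketch; the pod name, image reference, and GPU count are illustrative placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job          # hypothetical name
spec:
  containers:
    - name: trainer
      image: registry.example/ai/train:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1       # requires the NVIDIA device plugin on the node
  restartPolicy: Never
```

Pods requesting `nvidia.com/gpu` will remain Pending until a node advertising that resource joins the cluster, which makes this a safe spec to commit ahead of the hardware arriving.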
Related Documentation
- Server Maintenance Procedures
- Troubleshooting Guide
- Data Backup and Recovery
- User Account Management
- Change Management Process
- Contact Information