AI in Togo


This article details the server configuration used to support Artificial Intelligence (AI) initiatives within Togo. This documentation is intended for new system administrators and developers involved in maintaining and expanding Togo's AI infrastructure. It covers hardware, software, network considerations, and security best practices. This document assumes familiarity with basic Linux server administration and networking concepts.

Overview

Togo is actively investing in AI to address challenges in agriculture, healthcare, and government services. This requires a robust and scalable server infrastructure. The current architecture employs a hybrid approach, utilizing both on-premise servers for data security and cloud resources for burst capacity and specialized AI services. This document focuses primarily on the on-premise infrastructure, with notes on cloud integration where relevant. See System Architecture Overview for a broader perspective.

Hardware Specifications

The core AI infrastructure consists of three primary server clusters: a data ingestion cluster, a model training cluster, and an inference cluster. Each cluster utilizes a slightly different hardware configuration optimized for its specific workload.

Server Role | CPU | RAM | Storage | Network Interface
Data Ingestion | Intel Xeon Gold 6248R (24 cores) | 128 GB DDR4 ECC | 16 TB RAID 6 HDD | 10 Gbps Ethernet
Model Training | 2 x AMD EPYC 7763 (64 cores each) | 512 GB DDR4 ECC | 32 TB NVMe SSD RAID 0 | 100 Gbps InfiniBand
Inference | Intel Xeon Silver 4210 (10 cores) | 64 GB DDR4 ECC | 8 TB NVMe SSD | 1 Gbps Ethernet
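
RAID 6 reserves two disks' worth of capacity for parity, which matters when sizing arrays like the 16 TB ingestion storage above. A quick sketch of the usable-capacity arithmetic; the disk count and size below are illustrative, not the drives actually deployed:

```python
def raid6_usable_tb(disk_count: int, disk_tb: float) -> float:
    """Usable capacity of a RAID 6 array: two disks are consumed by parity."""
    if disk_count < 4:
        raise ValueError("RAID 6 requires at least 4 disks")
    return (disk_count - 2) * disk_tb

# For example, ten 2 TB drives yield 16 TB usable:
print(raid6_usable_tb(10, 2))  # → 16
```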

These specifications are subject to change as new technologies become available. Refer to Hardware Lifecycle Management for the procurement and replacement process. Power requirements are documented in Power & Cooling Specifications.

Software Stack

The software stack is built around Ubuntu Server 22.04 LTS, providing a stable and well-supported base. Key software components include:

  • Operating System: Ubuntu Server 22.04 LTS
  • Containerization: Docker and Kubernetes are used for application deployment and orchestration. See Containerization Best Practices for detailed instructions.
  • AI Frameworks: TensorFlow, PyTorch, and scikit-learn are utilized. Framework versions are managed using Conda environments.
  • Database: PostgreSQL is used for storing metadata and model versions. See Database Administration Guide.
  • Monitoring: Prometheus and Grafana are used for system monitoring and alerting. Refer to Monitoring and Alerting Procedures.
  • Version Control: Git is used for all code and configuration management. See Git Workflow.
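
Framework versions are pinned per project through Conda environment files. A minimal sketch of such a file; the environment name and the unpinned package list are illustrative, not the versions actually deployed:

```yaml
# environment.yml -- illustrative sketch; name and packages are assumptions
name: ai-training
channels:
  - conda-forge
dependencies:
  - python=3.10
  - scikit-learn
  - pytorch
  - tensorflow
```

The environment is then created with `conda env create -f environment.yml` and activated with `conda activate ai-training`.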

Network Configuration

The server infrastructure is segmented into three virtual LANs (VLANs) for security and performance:

  • VLAN 10: Data Ingestion – 192.168.10.0/24
  • VLAN 20: Model Training – 192.168.20.0/24
  • VLAN 30: Inference – 192.168.30.0/24
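
On Ubuntu Server 22.04, VLAN interfaces are typically declared with netplan. A minimal sketch, assuming the physical uplink is named `eno1` and this host takes the `.2` address in each subnet (both are assumptions, not the deployed values):

```yaml
# /etc/netplan/60-ai-vlans.yaml -- illustrative; interface name and
# host addresses are assumptions
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
  vlans:
    vlan10:
      id: 10
      link: eno1
      addresses: [192.168.10.2/24]
    vlan20:
      id: 20
      link: eno1
      addresses: [192.168.20.2/24]
    vlan30:
      id: 30
      link: eno1
      addresses: [192.168.30.2/24]
```

Changes are applied with `sudo netplan apply`.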

A firewall (pfSense) is used to control traffic between VLANs and the external network. See Network Security Policy for detailed rules. Inter-VLAN routing is handled by a dedicated layer 3 switch. DNS is provided by an internal Bind9 server. Refer to DNS Configuration for more information.

Network Component | IP Address | Role
pfSense Firewall | 192.168.1.1 | Gateway and Firewall
Layer 3 Switch | 192.168.1.2 | Inter-VLAN Routing
Bind9 Server | 192.168.1.3 | Internal DNS
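
Internal zones are declared on the Bind9 server (192.168.1.3) in `named.conf.local`. A sketch of one such declaration; the zone name `ai.tg.internal` and the zone file path are hypothetical placeholders, not the zone actually served:

```text
// /etc/bind/named.conf.local (excerpt) -- illustrative only;
// zone name and file path are assumptions
zone "ai.tg.internal" {
    type master;
    file "/etc/bind/db.ai.tg.internal";
    allow-transfer { none; };
};
```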

Security Considerations

Security is paramount. The following measures are in place:

  • Firewall: Strict firewall rules are enforced to limit network access.
  • Intrusion Detection/Prevention: Snort is used for intrusion detection and prevention. See IDS/IPS Configuration.
  • Access Control: SSH access is restricted to authorized personnel using key-based authentication.
  • Data Encryption: Data at rest is encrypted using LUKS. Data in transit is encrypted using TLS. See Data Encryption Standards.
  • Regular Security Audits: Regular security audits are conducted to identify and address vulnerabilities. Refer to Security Audit Schedule.
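
Key-based SSH access is usually enforced through `sshd_config`. A minimal sketch of the relevant directives; the `AllowGroups` group name is an assumption for illustration:

```text
# /etc/ssh/sshd_config (excerpt) -- illustrative hardening sketch
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
AllowGroups ai-admins   # hypothetical admin group
```

The daemon picks up changes after `sudo systemctl reload ssh`.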

Cloud Integration

For computationally intensive tasks or when burst capacity is required, the on-premise infrastructure integrates with Amazon Web Services (AWS). Specifically, we utilize AWS SageMaker for model training and AWS Lambda for serverless inference. Data transfer between on-premise servers and AWS is secured using VPN connections. See Cloud Integration Guide for detailed instructions.

Cloud Service | Purpose | Region
AWS SageMaker | Model Training (Burst Capacity) | us-east-1
AWS Lambda | Serverless Inference | us-east-1
Amazon S3 | Data Storage (Backup) | us-east-1

Future Expansion

Planned future expansion includes the addition of GPU servers to the model training cluster to accelerate deep learning tasks. We are also evaluating the use of Kubernetes operators to automate the deployment and management of AI workloads. See Future Infrastructure Roadmap.

Related Documentation

  • Server Maintenance Procedures
  • Troubleshooting Guide
  • Data Backup and Recovery
  • User Account Management
  • Change Management Process
  • Contact Information

