AI in Oceania: Server Configuration
This article details the server configuration used to host the "AI in Oceania" project, a collaborative research initiative focused on artificial intelligence applications relevant to the Pacific Island region. It is intended as a guide for new system administrators and developers contributing to the project; understanding this configuration is critical for maintenance, troubleshooting, and future scaling. The document assumes basic familiarity with Linux server administration and networking concepts.
Overview
The "AI in Oceania" project requires significant computational resources for model training, inference, and data storage. The infrastructure is primarily hosted in a secure data center in Auckland, New Zealand, chosen for its reliable power, network connectivity, and geographical proximity to many Pacific Island nations. We use a hybrid cloud approach, combining dedicated hardware with cloud-based services, which allows for flexibility and cost optimization.
Hardware Specifications
The primary compute cluster consists of eight dedicated servers, each purpose-built for machine learning workloads. The following table outlines the specifications for each server:
| Server ID | CPU | RAM | GPU | Storage |
|---|---|---|---|---|
| Server-01 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32GB) | 4 x 4TB NVMe SSD (RAID 10) |
| Server-02 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32GB) | 4 x 4TB NVMe SSD (RAID 10) |
| Server-03 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32GB) | 4 x 4TB NVMe SSD (RAID 10) |
| Server-04 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32GB) | 4 x 4TB NVMe SSD (RAID 10) |
| Server-05 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80GB) | 8 x 8TB NVMe SSD (RAID 10) |
| Server-06 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80GB) | 8 x 8TB NVMe SSD (RAID 10) |
| Server-07 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80GB) | 8 x 8TB NVMe SSD (RAID 10) |
| Server-08 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80GB) | 8 x 8TB NVMe SSD (RAID 10) |
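Because every array in the table runs RAID 10 (striped mirrors), usable capacity is half the raw drive total. A minimal sketch of that arithmetic, using the drive counts and sizes from the table above:

```python
def raid10_usable_tb(drive_count: int, drive_tb: int) -> float:
    """RAID 10 mirrors every drive, so usable capacity is half the raw total."""
    return drive_count * drive_tb / 2

# Servers 01-04: 4 x 4 TB NVMe; Servers 05-08: 8 x 8 TB NVMe
xeon_usable = raid10_usable_tb(4, 4)   # 8.0 TB usable per Xeon server
epyc_usable = raid10_usable_tb(8, 8)   # 32.0 TB usable per EPYC server
cluster_total = 4 * xeon_usable + 4 * epyc_usable
print(cluster_total)  # 160.0 TB usable across the eight-server cluster
```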
Software Stack
The servers run Ubuntu Server 22.04 LTS. The core software stack includes:
- Python 3.10: The primary programming language for AI development.
- TensorFlow 2.12: A popular machine learning framework.
- PyTorch 1.13: Another widely used machine learning framework.
- CUDA 11.8: NVIDIA's parallel computing platform and API.
- Docker 20.10: Containerization platform for reproducible environments.
- Kubernetes 1.26: Container orchestration system for managing deployments.
- PostgreSQL 14: Relational database for metadata storage.
- Object Storage (MinIO): A distributed object storage system.
- Prometheus: For collecting and storing metrics.
- Grafana: For visualizing metrics and creating dashboards.
- ELK Stack (Elasticsearch, Logstash, Kibana): For centralized log management.
- Nagios: For alerting on critical system events.
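Because the stack pins specific versions (Python 3.10, TensorFlow 2.12, PyTorch 1.13, CUDA 11.8), a deployment script may want to verify that installed components match the pinned major.minor series before starting a training job. A minimal sketch of such a check, assuming the versions listed above; how each installed version string is obtained (e.g. via `importlib.metadata`) is left out:

```python
# Pinned versions from the software stack above.
PINNED = {
    "python": "3.10",
    "tensorflow": "2.12",
    "torch": "1.13",
    "cuda": "11.8",
}

def satisfies(installed: str, pinned: str) -> bool:
    """True if the installed version matches the pinned major.minor series."""
    return installed.split(".")[:2] == pinned.split(".")[:2]

print(satisfies("2.12.1", PINNED["tensorflow"]))  # True: same 2.12 series
print(satisfies("2.11.0", PINNED["tensorflow"]))  # False: older minor version
```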
Networking & Security
The servers are connected via a dedicated 10 Gigabit Ethernet network. Security is paramount. The following measures are in place:
| Security Feature | Description | Status |
|---|---|---|
| Firewall | UFW (Uncomplicated Firewall) configured with strict rules. | Enabled |
| Intrusion Detection System (IDS) | Suricata IDS monitoring network traffic for malicious activity. | Enabled |
| VPN Access | Secure VPN access for remote administration. | Enabled |
| Regular Security Audits | Penetration testing and vulnerability scanning performed quarterly. | Scheduled |
| Data Encryption | All data at rest and in transit is encrypted. | Enabled |
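For the "data in transit is encrypted" requirement, administrative client tools can enforce a minimum TLS version explicitly. A minimal sketch using Python's standard `ssl` module (available in the Python 3.10 environment above); the policy shown is illustrative, not the project's actual hardening profile:

```python
import ssl

# Client-side TLS policy for remote administration tooling.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions

# create_default_context() already requires certificate validation
# and hostname checking, so misconfigured endpoints fail fast:
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname
```

The context would then be passed to whatever client opens the connection (e.g. `ctx.wrap_socket(...)` for a raw socket).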
Cloud Integration
We utilize Amazon Web Services (AWS) for supplemental compute and storage. Specifically:
| Service | Purpose | Configuration |
|---|---|---|
| AWS S3 | Long-term archival storage of datasets. | Standard storage class, versioning enabled. |
| AWS EC2 | On-demand compute instances for burst capacity during peak training loads. | p3.8xlarge instances with NVIDIA V100 GPUs. |
| AWS SageMaker | Managed machine learning service for experimentation and deployment. | Utilized for prototyping new models. |
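Archival uploads to the versioned S3 bucket benefit from deterministic object keys, so re-uploads of the same dataset land in a predictable place. A minimal stdlib-only sketch of one possible key scheme; the `datasets/<name>/<year>/` layout and the `.parquet` suffix are illustrative assumptions, not the project's actual convention:

```python
from datetime import date

def archive_key(dataset: str, version: int, when: date) -> str:
    """Build a deterministic S3 object key for an archival upload.

    NOTE: this naming layout is hypothetical; adjust to the project's
    real bucket conventions before use.
    """
    return (
        f"datasets/{dataset}/{when.year}/"
        f"{dataset}-v{version:03d}-{when.isoformat()}.parquet"
    )

key = archive_key("reef-temperature", 7, date(2024, 3, 1))
print(key)  # datasets/reef-temperature/2024/reef-temperature-v007-2024-03-01.parquet
```

The actual upload would then use an S3 client (e.g. boto3's `put_object` with `Bucket=...` and `Key=key`); with versioning enabled, S3 keeps prior object versions automatically.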
Monitoring and Logging
Comprehensive monitoring and logging are essential for maintaining system stability and performance. As listed in the Software Stack section above, Prometheus collects and stores metrics, Grafana visualizes them in dashboards, the ELK Stack (Elasticsearch, Logstash, Kibana) provides centralized log management, and Nagios alerts on critical system events.
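Services that expose custom metrics to Prometheus do so in its plain-text exposition format, one sample per line. A minimal stdlib sketch of that format; the metric and label names are illustrative (a real exporter would typically use the `prometheus_client` library instead of formatting lines by hand):

```python
def prometheus_line(name: str, labels: dict, value: float) -> str:
    """Render one sample in the Prometheus text exposition format:
    metric_name{label1="v1",label2="v2"} value
    Labels are sorted for deterministic output."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

line = prometheus_line(
    "gpu_utilization_ratio", {"server": "Server-05", "gpu": "0"}, 0.87
)
print(line)  # gpu_utilization_ratio{gpu="0",server="Server-05"} 0.87
```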
Future Considerations
Planned upgrades include transitioning to newer GPU architectures (NVIDIA H100) and expanding the object storage capacity. We are also evaluating the implementation of a more robust disaster recovery plan. Further information on project goals can be found on Project:AI in Oceania.