AI in Oceania

From Server rental store
Revision as of 07:28, 16 April 2025 by Admin (talk | contribs) (Automated server configuration article)

AI in Oceania: Server Configuration

This article details the server configuration used to host the "AI in Oceania" project, a collaborative research initiative focused on artificial intelligence applications relevant to the Pacific Island region. It is intended as a guide for new system administrators and developers contributing to the project: understanding this configuration is critical for maintenance, troubleshooting, and future scaling. The document assumes basic familiarity with Linux server administration and networking concepts.

Overview

The "AI in Oceania" project requires significant computational resources for model training, inference, and data storage. The infrastructure is hosted primarily in a secure data center in Auckland, New Zealand, chosen for its reliable power, network connectivity, and geographical proximity to many Pacific Island nations. A hybrid cloud approach, combining dedicated hardware with cloud-based services, provides flexibility and cost optimization. The core services are detailed in Special:Search/Core Services.

Hardware Specifications

The primary compute cluster consists of eight dedicated servers, each purpose-built for machine learning workloads. The following table outlines the specifications for each server:

Server ID | CPU | RAM | GPU | Storage
Server-01 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32 GB) | 4 x 4 TB NVMe SSD (RAID 10)
Server-02 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32 GB) | 4 x 4 TB NVMe SSD (RAID 10)
Server-03 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32 GB) | 4 x 4 TB NVMe SSD (RAID 10)
Server-04 | Intel Xeon Gold 6248R (24 cores) | 256 GB DDR4 ECC | NVIDIA Tesla V100 (32 GB) | 4 x 4 TB NVMe SSD (RAID 10)
Server-05 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80 GB) | 8 x 8 TB NVMe SSD (RAID 10)
Server-06 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80 GB) | 8 x 8 TB NVMe SSD (RAID 10)
Server-07 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80 GB) | 8 x 8 TB NVMe SSD (RAID 10)
Server-08 | AMD EPYC 7763 (64 cores) | 512 GB DDR4 ECC | NVIDIA A100 (80 GB) | 8 x 8 TB NVMe SSD (RAID 10)
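Note that the storage figures in the table are raw capacity: RAID 10 stripes data across mirrored pairs of drives, so usable capacity is half the raw total. A quick sketch of the arithmetic, using the drive counts and sizes from the table above:

```python
def raid10_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID 10 array: striping over mirrored
    pairs halves the raw capacity. Needs an even number of drives."""
    if drive_count % 2 != 0:
        raise ValueError("RAID 10 requires an even number of drives")
    return drive_count * drive_size_tb / 2

# Servers 01-04: 4 x 4 TB NVMe in RAID 10
print(raid10_usable_tb(4, 4))  # 8.0 TB usable
# Servers 05-08: 8 x 8 TB NVMe in RAID 10
print(raid10_usable_tb(8, 8))  # 32.0 TB usable
```

The mirror also means each array tolerates at least one drive failure without data loss, at the cost of 50% capacity overhead.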

Software Stack

The servers run Ubuntu Server 22.04 LTS. The core software stack includes:

  • Python 3.10: The primary programming language for AI development.
  • TensorFlow 2.12: A popular machine learning framework.
  • PyTorch 1.13: Another widely used machine learning framework.
  • CUDA 11.8: NVIDIA's parallel computing platform and API.
  • Docker 20.10: Containerization platform for reproducible environments.
  • Kubernetes 1.26: Container orchestration system for managing deployments.
  • PostgreSQL 14: Relational database for metadata storage. A discussion about database performance is available on Special:Search/Database Performance.
  • MinIO: A distributed, S3-compatible object storage system.
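A new administrator can quickly confirm that the command-line tools shipped with this stack are present before doing anything else. The sketch below is a minimal, dependency-free pre-flight check; the list of tools is illustrative, not an exhaustive project requirement:

```python
import shutil
import sys

def preflight(required=("docker", "kubectl", "psql", "nvidia-smi")):
    """Report which command-line tools from the stack are on PATH.

    Returns a dict mapping each tool name to its absolute path
    (or None if missing), plus a flag for the interpreter version,
    since the stack targets Python 3.10.
    """
    report = {tool: shutil.which(tool) for tool in required}
    report["python>=3.10"] = sys.version_info >= (3, 10)
    return report

if __name__ == "__main__":
    for tool, status in preflight().items():
        print(f"{tool}: {'OK' if status else 'MISSING'}")
```

Running it on a freshly provisioned node makes missing packages obvious before a training job fails mid-run.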

Networking & Security

The servers are connected via a dedicated 10 Gigabit Ethernet network. Security is paramount. The following measures are in place:

Security Feature | Description | Status
Firewall | UFW (Uncomplicated Firewall) configured with strict rules. | Enabled
Intrusion Detection System (IDS) | Suricata IDS monitoring network traffic for malicious activity. | Enabled
VPN Access | Secure VPN access for remote administration. | Enabled
Regular Security Audits | Penetration testing and vulnerability scanning performed quarterly. | Scheduled
Data Encryption | All data at rest and in transit is encrypted. | Enabled
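One way to spot-check the firewall rules from an admin workstation is to probe which TCP ports actually accept connections. A minimal standard-library sketch (the host and port list below are illustrative, not the project's actual rule set):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds
    within the timeout, False on refusal, filtering, or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Illustrative spot-check: with strict UFW rules, only the services
# you deliberately exposed should answer.
for port in (22, 443, 5432):
    state = "open" if is_port_open("127.0.0.1", port) else "closed/filtered"
    print(f"port {port}: {state}")
```

An unexpectedly open port is a prompt to re-check the UFW rule set; this probe complements, rather than replaces, the quarterly audits listed above.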

Cloud Integration

We utilize Amazon Web Services (AWS) for supplemental compute and storage. Specifically:

Service | Purpose | Configuration
AWS S3 | Long-term archival storage of datasets. | Standard storage class, versioning enabled.
AWS EC2 | On-demand compute instances for burst capacity during peak training loads. | p3.8xlarge instances with NVIDIA V100 GPUs.
AWS SageMaker | Managed machine learning service for experimentation and deployment. | Utilized for prototyping new models.
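For archival storage, a consistent object-key convention keeps versioned snapshots findable, and older datasets can be routed to a cheaper storage class. The sketch below is purely illustrative: the bucket layout, dataset name, and 90-day threshold are hypothetical, not taken from the project's actual S3 configuration.

```python
from datetime import date

def s3_archive_key(project: str, dataset: str, snapshot: date) -> str:
    """Build a dated object key for the archival bucket.
    The <project>/<year>/<date>/<dataset> layout is a hypothetical
    convention, not the project's documented scheme."""
    return f"{project}/{snapshot.year}/{snapshot:%Y-%m-%d}/{dataset}"

def storage_class(age_days: int, archive_after_days: int = 90) -> str:
    """Route recent datasets to STANDARD and older ones to GLACIER.
    The 90-day threshold is illustrative, not project policy."""
    return "GLACIER" if age_days >= archive_after_days else "STANDARD"

key = s3_archive_key("ai-oceania", "reef-imagery.tar.zst", date(2025, 4, 1))
print(key)                 # ai-oceania/2025/2025-04-01/reef-imagery.tar.zst
print(storage_class(120))  # GLACIER
```

With versioning enabled on the bucket, re-uploading under the same key preserves prior snapshots automatically.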

Monitoring and Logging

Comprehensive monitoring and logging are essential for maintaining system stability and performance. We use:

  • Prometheus: For collecting and storing metrics.
  • Grafana: For visualizing metrics and creating dashboards. See Special:Search/Grafana Dashboards for examples.
  • ELK Stack (Elasticsearch, Logstash, Kibana): For centralized log management.
  • Nagios: For alerting on critical system events.
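Prometheus scrapes targets over HTTP in a plain-text exposition format: a # HELP line, a # TYPE line, then one sample per label set. In practice the prometheus_client library renders this for you; the dependency-free sketch below just makes the format concrete, and the metric name and labels are illustrative:

```python
def render_gauge(name: str, help_text: str, samples: dict) -> str:
    """Render one gauge metric in Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to the sample value,
    e.g. {(("server", "server-05"),): 1.0}.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Illustrative metric: per-server GPU memory usage.
text = render_gauge(
    "gpu_memory_used_bytes",
    "GPU memory currently allocated.",
    {(("server", "server-05"), ("gpu", "0")): 42.5e9},
)
print(text)
```

Grafana dashboards then query whatever Prometheus has scraped in this format.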

Future Considerations

Planned upgrades include transitioning to newer GPU architectures (NVIDIA H100) and expanding the object storage capacity. We are also evaluating the implementation of a more robust disaster recovery plan. Further information on project goals can be found on Project:AI in Oceania.


Intel-Based Server Configurations

Configuration | Specifications | Benchmark
Core i7-6700K/7700 Server | 64 GB DDR4, 2 x 512 GB NVMe SSD | CPU Benchmark: 8046
Core i7-8700 Server | 64 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 13124
Core i9-9900K Server | 128 GB DDR4, 2 x 1 TB NVMe SSD | CPU Benchmark: 49969
Core i9-13900 Server (64GB) | 64 GB RAM, 2 x 2 TB NVMe SSD |
Core i9-13900 Server (128GB) | 128 GB RAM, 2 x 2 TB NVMe SSD |
Core i5-13500 Server (64GB) | 64 GB RAM, 2 x 500 GB NVMe SSD |
Core i5-13500 Server (128GB) | 128 GB RAM, 2 x 500 GB NVMe SSD |
Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 x NVMe SSD, NVIDIA RTX 4000 |

AMD-Based Server Configurations

Configuration | Specifications | Benchmark
Ryzen 5 3600 Server | 64 GB RAM, 2 x 480 GB NVMe SSD | CPU Benchmark: 17849
Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2 x 1 TB NVMe SSD | CPU Benchmark: 35224
Ryzen 9 5950X Server | 128 GB RAM, 2 x 4 TB NVMe SSD | CPU Benchmark: 46045
Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2 x 2 TB NVMe SSD | CPU Benchmark: 63561
EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe SSD | CPU Benchmark: 48021
EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe SSD | CPU Benchmark: 48021
EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2 x 2 TB NVMe SSD | CPU Benchmark: 48021
EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe SSD | CPU Benchmark: 48021
EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2 x 2 TB NVMe SSD | CPU Benchmark: 48021
EPYC 9454P Server | 256 GB RAM, 2 x 2 TB NVMe SSD |

Order Your Dedicated Server

Configure and order the server that fits your workload.

Need Assistance?

Note: All benchmark scores are approximate and may vary based on configuration. Server availability is subject to stock.