# AI Standards - Server Configuration

This document defines the server configuration standards for all servers dedicated to Artificial Intelligence (AI) workloads within our infrastructure. These standards ensure consistency, performance, and maintainability across all AI-related deployments. The guide is intended for newcomers to the server environment and covers hardware, software, and networking. Please consult the System Administrators team before making any changes to server configurations.

## Hardware Specifications

AI workloads demand significant computational resources. The following table outlines the minimum and recommended hardware specifications for AI servers. These specifications are subject to change as AI model complexity increases. Please refer to the Hardware Revision History for the latest updates.

| Component | Minimum Specification | Recommended Specification |
|---|---|---|
| CPU | Intel Xeon Silver 4310 (12 Cores) | Intel Xeon Gold 6338 (32 Cores) |
| RAM | 128 GB DDR4 ECC 3200MHz | 512 GB DDR4 ECC 3200MHz |
| GPU | NVIDIA Tesla T4 (16GB VRAM) | NVIDIA A100 (80GB VRAM) |
| Storage (OS) | 500GB NVMe SSD | 1TB NVMe SSD |
| Storage (Data) | 4TB HDD (RAID 1) | 16TB HDD (RAID 6) or 8TB NVMe SSD (RAID 1) |
| Network Interface | 10GbE | 40GbE or 100GbE |

These specifications are designed to support a wide range of AI tasks, including Machine Learning, Deep Learning, and Natural Language Processing.
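As a rough sketch, the minimum specifications above can be checked during server intake with a small preflight script. The thresholds come from the table; the helper names are illustrative, and note that `os.cpu_count()` reports logical CPUs, so it only approximates a physical-core minimum.

```python
import os
import shutil

# Minimum thresholds taken from the hardware table above.
MIN_CPU_CORES = 12
MIN_OS_DISK_GB = 500

def meets_cpu_minimum(min_cores: int = MIN_CPU_CORES) -> bool:
    """Return True if the host exposes at least min_cores logical CPUs."""
    return (os.cpu_count() or 0) >= min_cores

def meets_os_disk_minimum(path: str = "/", min_gb: int = MIN_OS_DISK_GB) -> bool:
    """Return True if the filesystem backing `path` is at least min_gb in size."""
    total_bytes = shutil.disk_usage(path).total
    return total_bytes >= min_gb * 10**9

if __name__ == "__main__":
    print(f"CPU cores OK:    {meets_cpu_minimum()}")
    print(f"OS disk size OK: {meets_os_disk_minimum()}")
```

GPU and RAM checks would typically shell out to `nvidia-smi` and read `/proc/meminfo`, which is left out here to keep the sketch portable.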

## Software Stack

All AI servers use a standardized software stack to ensure compatibility and ease of deployment. The Software Deployment Pipeline details the automated deployment process.

| Software Component | Version | Purpose |
|---|---|---|
| Operating System | Ubuntu Server 22.04 LTS | Provides the base operating environment. See OS Hardening Guide for security best practices. |
| CUDA Toolkit | 12.3 | NVIDIA's parallel computing platform and programming model. |
| cuDNN | 8.9.2 | NVIDIA CUDA Deep Neural Network library. |
| Python | 3.10 | Primary programming language for AI development. |
| TensorFlow | 2.12 | Open-source machine learning framework. |
| PyTorch | 2.0.1 | Open-source machine learning framework. |
| Docker | 24.0.5 | Containerization platform for application deployment. Refer to Docker Best Practices. |

All software packages must be installed using approved package managers and repositories. Direct compilation from source is discouraged unless explicitly approved by the Security Team. Regularly check for software updates and apply them promptly following the Patch Management Policy.

## Networking Configuration

Proper network configuration is crucial for efficient data transfer and communication between AI servers.

| Network Parameter | Value |
|---|---|
| IP Addressing | Static IP addresses assigned via DHCP reservation. See IP Address Allocation. |
| DNS | Internal DNS servers managed by the Network Engineering team. |
| Firewall | Enabled with a restrictive default policy. Refer to the Firewall Ruleset. |
| VLAN | Dedicated VLAN for AI workloads (VLAN 100). |
| SSH Access | Limited to authorized personnel via key-based authentication. See SSH Security Guidelines. |
| Monitoring | Servers are monitored by Nagios and Prometheus for performance and availability. |
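On Ubuntu Server 22.04, a netplan configuration consistent with these parameters might look like the following. This is an illustrative sketch: the interface name (`eno1`) and file name are assumptions, and the actual addressing comes from the IP Address Allocation document. Note that "static via DHCP reservation" means the server still runs DHCP; the fixed address is reserved against its MAC on the DHCP server.

```yaml
# /etc/netplan/01-ai-vlan.yaml (illustrative sketch)
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: false
  vlans:
    vlan100:
      id: 100          # dedicated AI VLAN per the table above
      link: eno1
      dhcp4: true      # address is fixed via DHCP reservation on the MAC
```

Apply with `netplan try` rather than `netplan apply` when working remotely, so a mistake rolls back instead of cutting off access.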

All AI servers are required to be part of the dedicated AI VLAN to ensure network isolation and security. Network traffic should be monitored regularly to identify any anomalies. Consider using Network Performance Monitoring tools for detailed analysis.
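The key-based SSH access requirement above typically translates into an `sshd_config` along these lines. The `ai-admins` group name is an assumption; see the SSH Security Guidelines for the authoritative settings.

```
# /etc/ssh/sshd_config (excerpt, illustrative)
PubkeyAuthentication yes
PasswordAuthentication no
PermitRootLogin no
AllowGroups ai-admins
```

Validate the file with `sshd -t` before restarting the service, and keep an existing session open while testing so a bad change cannot lock you out.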

## Security Considerations

AI workloads often involve sensitive data, making security a paramount concern.
