AI in Aruba: Server Configuration & Deployment
This article details the server configuration used to support Artificial Intelligence (AI) workloads within the Aruba Networks environment. It is intended as a guide for new systems administrators and engineers deploying or maintaining AI-related infrastructure.
Overview
The “AI in Aruba” initiative leverages a hybrid server architecture, combining on-premise hardware for low-latency processing with cloud-based resources for model training and large dataset storage. This allows for real-time insights from network data while maintaining scalability and cost-effectiveness. This document focuses on the on-premise server configuration, which forms the core of the real-time analytics pipeline. We utilize a combination of high-performance compute servers and dedicated storage arrays. Understanding the interplay between Network Infrastructure and server performance is crucial.
Hardware Specifications
The core processing is handled by dedicated GPU servers. The following table outlines the key specifications for these servers:
Component | Specification | Quantity per Server
---|---|---
CPU | Intel Xeon Gold 6338 (32 cores / 64 threads per CPU) | 2
RAM | 512 GB DDR4 ECC Registered 3200 MHz | 1
GPU | NVIDIA A100 80 GB PCIe 4.0 | 4
Storage (OS) | 500 GB NVMe PCIe Gen4 SSD | 1
Storage (Data) | 4 TB NVMe PCIe Gen4 SSD (RAID 0) | 1
Network Interface | Dual-port 100GbE QSFP28 | 1
Power Supply | 2000 W Redundant Platinum | 2
These servers are housed in a dedicated, climate-controlled server room with redundant power and cooling. Proper Server Room Management is essential for maintaining stability. See also Power Distribution Units for details on power infrastructure.
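As a quick sanity check after provisioning, the minimal sketch below verifies that all four A100 GPUs are visible to the framework. It assumes TensorFlow 2.13 (per the software stack later in this article) is installed; the expected GPU count simply mirrors the table above:

```python
# gpu_check.py — minimal sketch: confirm all expected GPUs are visible.
# Assumes TensorFlow 2.13 is installed; EXPECTED_GPUS mirrors the
# per-server A100 count from the hardware table above.
import tensorflow as tf

EXPECTED_GPUS = 4

gpus = tf.config.list_physical_devices("GPU")
print(f"Visible GPUs: {len(gpus)}")
for gpu in gpus:
    print(f"  {gpu.name}")

if len(gpus) != EXPECTED_GPUS:
    raise SystemExit(f"Expected {EXPECTED_GPUS} GPUs, found {len(gpus)}")
```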
Storage Configuration
Data storage is a critical component, requiring both high capacity and low latency. We employ a tiered storage approach: raw data is stored in a cloud-based object storage solution (AWS S3 or Google Cloud Storage), while frequently accessed data is cached on-premise.
The on-premise caching layer utilizes a dedicated NVMe storage array. Details are below:
Parameter | Value
---|---
Vendor | Pure Storage
Model | FlashArray//X70
Capacity | 100 TB raw
Usable Capacity | 50 TB (after RAID and deduplication)
RAID Level | RAID-TP (Triple Parity)
Connectivity | 100GbE iSCSI
Data Reduction | Inline deduplication & compression
This array provides the necessary I/O performance for real-time data analysis. Understanding Storage Area Networks is crucial for managing this infrastructure. We also employ Data Backup Strategies to protect against data loss.
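To illustrate the tiered read path, a cache-or-fetch pattern might look like the sketch below. This is illustrative only: the bucket name, cache mount point, and the choice of boto3 are assumptions, not the production implementation.

```python
# cached_fetch.py — minimal sketch of the tiered-storage read path:
# serve from the on-premise NVMe cache on a hit, otherwise pull the
# object from cloud storage and cache it locally.
# Assumptions: boto3 installed, AWS credentials configured; the bucket
# name and cache mount below are placeholders, not production values.
from pathlib import Path
import boto3

CACHE_ROOT = Path("/mnt/flasharray/cache")   # hypothetical NVMe cache mount
BUCKET = "ai-in-aruba-raw-data"              # hypothetical bucket name

s3 = boto3.client("s3")

def fetch(key: str) -> Path:
    """Return a local path for `key`, downloading from S3 on a cache miss."""
    local = CACHE_ROOT / key
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(BUCKET, key, str(local))  # cache miss: pull from S3
    return local
```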
Software Stack
The software stack is built on a Linux foundation, providing flexibility and control.
Component | Version
---|---
Operating System | Ubuntu Server 22.04 LTS
Containerization | Docker 24.0.5
Orchestration | Kubernetes 1.27
AI Framework | TensorFlow 2.13.0
Programming Language | Python 3.10
Data Processing | Apache Spark 3.4.1
Monitoring | Prometheus & Grafana
All AI models are containerized using Docker and orchestrated with Kubernetes for scalability and resilience. Kubernetes Deployment procedures should be followed carefully. We utilize Continuous Integration/Continuous Deployment (CI/CD) pipelines for automated model updates. Refer to Security Best Practices for securing the software stack. The Networking Configuration must allow for communication between components.
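Because models run as containers under Kubernetes, each serving process should expose a health endpoint for liveness and readiness probes. The sketch below is a minimal, framework-agnostic example using only the Python standard library; the port and path are illustrative choices, not values mandated by this deployment:

```python
# health_endpoint.py — minimal sketch of a health endpoint that a
# Kubernetes liveness/readiness probe could poll. Port 8080 and the
# /healthz path are illustrative assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Serve until killed; Kubernetes restarts the pod if probes fail.
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

In the pod spec, an `httpGet` liveness probe pointed at this port then gates container restarts.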
Network Considerations
High bandwidth and low latency are essential for AI workloads. The servers connect to the Aruba Networks core network via dual 100GbE uplinks. VLAN Segmentation isolates AI traffic from other network traffic, and Quality of Service (QoS) policies prioritize AI-related data flows. The Firewall Configuration permits only necessary traffic to and from the AI servers, and Load Balancing distributes traffic across multiple servers. A simple latency spot-check is shown below.
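One rough way to spot-check latency between an AI server and a peer (for example, the storage array's iSCSI portal) is to time TCP connection setup. The target address below is a placeholder, and this measures connect time only, not application-level I/O latency:

```python
# tcp_latency.py — rough sketch: estimate network latency by timing
# TCP connection setup to a peer. TARGET is a placeholder address;
# connect time is a proxy, not a measure of storage I/O latency.
import socket
import time

TARGET = ("10.0.0.10", 3260)  # hypothetical iSCSI portal address
SAMPLES = 5

for _ in range(SAMPLES):
    start = time.perf_counter()
    with socket.create_connection(TARGET, timeout=2):
        elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"connect: {elapsed_ms:.2f} ms")
```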
Future Expansion
Planned future expansion includes:
- Adding more GPU servers to increase processing capacity.
- Implementing a dedicated NVMe-oF (NVMe over Fabrics) network for even lower latency storage access.
- Integrating with additional data sources.
- Exploring the use of more advanced AI frameworks like PyTorch.
Refer to Scalability Planning for long-term infrastructure growth, and to Server Maintenance, which is critical to long-term stability.