# AI in Jersey: Server Configuration Documentation

This document describes the server configuration for the "AI in Jersey" project and serves as a guide for system administrators and developers. The project runs large language models (LLMs) for natural language processing tasks in the Jersey data center. The guide assumes a basic understanding of Linux server administration and networking concepts. Because it is intended for newcomers to the wiki, the explanations are thorough.

## Overview

The "AI in Jersey" infrastructure is built around a cluster of high-performance servers dedicated to model training and inference. The core components are GPU servers, CPU servers for pre- and post-processing, a high-bandwidth network interconnect, and a shared storage system. This setup handles large datasets and complex model architectures efficiently. A distributed computing framework parallelizes workloads across the cluster. See Distributed Computing for more details.
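To illustrate what parallelizing a workload means in practice, the following is a minimal sketch that splits a dataset into near-equal shards and processes them concurrently. It uses a local thread pool from the Python standard library as a stand-in for the cluster's framework, which this page does not name. The functions `shard`, `preprocess`, and `run_parallel` are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(items, num_workers):
    """Split items into num_workers contiguous shards of near-equal size."""
    base, extra = divmod(len(items), num_workers)
    shards, start = [], 0
    for i in range(num_workers):
        # The first `extra` shards take one additional item each.
        size = base + (1 if i < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


def preprocess(batch):
    """Hypothetical pre-processing step: trim and lowercase each text."""
    return [text.strip().lower() for text in batch]


def run_parallel(items, num_workers=4):
    """Process every shard in a worker pool and merge results in order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(preprocess, shard(items, num_workers)))
    return [row for part in parts for row in part]
```

On a real cluster the shards would be dispatched to separate servers rather than to local threads, but the split, map, and merge pattern is the same.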

## Hardware Specifications

The following tables outline the hardware specifications for each server type within the cluster.

### GPU Servers

These servers are the workhorses of the AI cluster, responsible for the computationally intensive tasks of model training and inference. Each server is equipped with multiple high-end GPUs.

| Component | Specification |
|---|---|
| Server Model | Dell PowerEdge R760xa |
| CPU | 2 x AMD EPYC 7763 (64-core) |
| GPU | 8 x NVIDIA A100 80GB |
| RAM | 512 GB DDR4 ECC REG |
| Storage | 2 x 4TB NVMe SSD (RAID 1) |
| Network Interface | 2 x 200Gbps InfiniBand |
| Power Supply | Redundant 3000W Platinum |
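As a rough sizing aid, the sketch below estimates whether a model's weights fit in the combined GPU memory of one server (8 x 80 GB = 640 GB). It counts weights only and ignores activations, KV cache, and framework overhead, so a real deployment needs extra headroom. The function names and the default of 2 bytes per parameter (16-bit weights) are assumptions made for illustration.

```python
GPUS_PER_SERVER = 8   # from the GPU server specification
GPU_MEMORY_GB = 80    # nominal memory per A100, in decimal GB


def weights_gb(num_params, bytes_per_param=2):
    """Approximate size of the model weights in decimal GB."""
    return num_params * bytes_per_param / 1e9


def fits_on_server(num_params, bytes_per_param=2):
    """True if the weights alone fit in one server's total GPU memory."""
    total_gb = GPUS_PER_SERVER * GPU_MEMORY_GB
    return weights_gb(num_params, bytes_per_param) <= total_gb


# A 70-billion-parameter model at 16 bits per weight needs about 140 GB.
print(weights_gb(70e9), fits_on_server(70e9))
```

A model whose weights exceed 640 GB at the chosen precision would have to be split across several GPU servers or stored at lower precision.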

### CPU Servers

These servers handle data pre-processing and post-processing, and orchestrate the overall workflow.

| Component | Specification |
|---|---|
| Server Model | HP ProLiant DL380 Gen10 |
| CPU | 2 x Intel Xeon Gold 6338 (32-core) |
| RAM | 256 GB DDR4 ECC REG |
| Storage | 4 x 8TB SAS HDD (RAID 5) |
| Network Interface | 2 x 100Gbps Ethernet |
| Power Supply | Redundant 800W Platinum |

### Storage Server

This server provides shared storage for the entire cluster, accessible via a high-speed network.

| Component | Specification |
|---|---|
| Server Model | NetApp FAS2750 |
| CPU | 2 x Intel Xeon Gold 6248R (24-core) |
| RAM | 128 GB DDR4 ECC REG |
| Storage | 16 x 18TB SAS HDD (RAID-DP), 288TB raw capacity |
| Network Interface | 4 x 100Gbps Ethernet |
| Connectivity | Fibre Channel over Ethernet (FCoE) |
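The RAID levels listed in the hardware tables give up different amounts of raw capacity to redundancy. The sketch below works through that arithmetic for the three storage layouts on this page. It assumes a single RAID group per server, no hot spares, and no filesystem overhead, so actual usable figures will be lower. The function name `usable_capacity_tb` is hypothetical.

```python
def usable_capacity_tb(level, disks, disk_tb):
    """Usable capacity in TB before spares and filesystem overhead."""
    if level == "RAID 1":
        # Every disk holds a full copy, so capacity equals one disk.
        return disk_tb
    if level == "RAID 5":
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_tb
    if level == "RAID-DP":
        # Double parity: two disks' worth of space goes to parity.
        return (disks - 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")


print(usable_capacity_tb("RAID 1", 2, 4))     # GPU server: 4
print(usable_capacity_tb("RAID 5", 4, 8))     # CPU server: 24
print(usable_capacity_tb("RAID-DP", 16, 18))  # storage server: 252
```

For the storage server, 16 x 18 TB is 288 TB of raw capacity, of which roughly 252 TB remains after double parity under these assumptions.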

## Software Stack

The software stack is carefully chosen to maximize performance and scalability.
