AI Research Server Configuration

Welcome to the documentation for the AI Research server. This article details the hardware and software configuration of our dedicated AI research environment. It is intended for newcomers to the system and provides a comprehensive overview of the server's capabilities and specifications; understanding these details is essential for using the server effectively and troubleshooting potential issues. Please refer to our Server Access Guide before attempting to connect.

Overview

The AI Research server is a high-performance computing (HPC) platform designed to support demanding machine learning and deep learning workloads. It is equipped with powerful GPUs, a large amount of RAM, and fast storage to facilitate rapid experimentation and model training. The server runs a customized Linux distribution optimized for AI tasks. This document will cover the hardware components, software environment, and key configuration details.

Hardware Specifications

The following table outlines the core hardware specifications of the AI Research server:

| Component | Specification |
|---|---|
| CPU | Dual Intel Xeon Gold 6338 (32 cores per CPU, 64 total) |
| RAM | 512GB DDR4 ECC Registered Memory |
| GPU | 4 x NVIDIA A100 80GB GPUs |
| Storage (OS) | 500GB NVMe SSD |
| Storage (Data) | 4 x 8TB SAS HDD (RAID 0) |
| Network Interface | Dual 100GbE Network Adapters |
| Power Supply | 2 x 2000W Redundant Power Supplies |

Software Environment

The server utilizes a customized Linux environment built on Ubuntu Server 22.04 LTS. Several key software packages are pre-installed and configured for AI research. Please see the Software Installation Guide for details on additional packages.

Core Packages

The following table lists the core software packages installed on the server:

| Package | Version |
|---|---|
| CUDA Toolkit | 12.2 |
| cuDNN | 8.9.2 |
| Python | 3.10 |
| TensorFlow | 2.13.0 |
| PyTorch | 2.0.1 |
| Jupyter Notebook | 6.4.5 |
| Docker | 24.0.5 |
| NVIDIA Driver | 535.104.05 |
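To confirm that a given session (for example, inside a container) actually matches the versions above, a quick check can be run with the standard library. This is a minimal sketch: it only reports what is importable in the current environment and does not assume every framework is installed.

```python
import platform
from importlib import metadata

def installed_versions(packages):
    """Return a mapping of package name -> installed version (None if absent)."""
    found = {"python": platform.python_version()}
    for pkg in packages:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None  # not installed in this environment
    return found

if __name__ == "__main__":
    # Distribution names corresponding to the frameworks in the table above.
    for name, ver in installed_versions(["torch", "tensorflow", "notebook"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Note that `metadata.version` takes the *distribution* name (e.g. `torch`, `notebook`), which can differ from the import name.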

Containerization

We heavily utilize Docker for managing dependencies and ensuring reproducibility. Pre-built Docker images with common AI frameworks are available on our internal Docker Registry. Using containers is strongly recommended for all research projects. This ensures a consistent environment across different users and prevents conflicts between software versions. Refer to the Docker Tutorial for more information.
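As a sketch of how a container might be launched with GPU and `/data` access, the helper below composes a `docker run` invocation without executing it (the image name `registry.internal/ai-base:latest` is hypothetical; consult the internal Docker Registry for real image tags):

```python
import shlex

def build_docker_cmd(image, gpus="all", data_mount="/data"):
    """Compose a `docker run` command granting GPU and shared-storage access.

    Returns the argument list; pass it to subprocess.run() to actually launch.
    """
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", gpus,                      # e.g. "all" or "device=0,1"
        "-v", f"{data_mount}:{data_mount}",  # expose the shared data array
        image,
    ]

if __name__ == "__main__":
    # Hypothetical image name -- check the internal registry for actual tags.
    cmd = build_docker_cmd("registry.internal/ai-base:latest", gpus="device=0,1")
    print(shlex.join(cmd))
```

The `--gpus` flag requires the NVIDIA Container Toolkit on the host; building the command as a list (rather than one string) avoids shell-quoting issues when it is eventually passed to `subprocess.run`.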

Server Configuration Details

The AI Research server is configured with several specific settings to optimize performance. These configurations are managed by our System Administration Team.

Network Configuration

The server is accessible through two 100GbE network interfaces. The primary interface is used for general network access, while the secondary interface is dedicated to storage traffic. Users can access the server via SSH using the address `ai-research.example.com`. Please consult the Network Security Policy for information on acceptable use.

Storage Configuration

The data storage is configured in a RAID 0 array for maximum performance. While this provides fast read/write speeds, it also means there is no redundancy. Therefore, it is *critical* to regularly back up your data using our provided Backup System. The data storage is mounted at `/data`.
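Because RAID 0 offers no redundancy, periodically copying important results out of `/data` is essential. The sketch below mirrors a directory tree using only the standard library; the `/backup` destination is a hypothetical mount point, and in practice you should use the provided Backup System rather than this script:

```python
import shutil
from pathlib import Path

def backup_tree(src, dst):
    """Copy every file under src into dst, preserving the directory layout.

    Returns the number of files copied; existing files are overwritten.
    """
    src_root, dst_root = Path(src), Path(dst)
    copied = 0
    for path in src_root.rglob("*"):
        if path.is_file():
            target = dst_root / path.relative_to(src_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 preserves timestamps
            copied += 1
    return copied

if __name__ == "__main__":
    # Illustrative paths; /backup is a hypothetical mount point.
    n = backup_tree("/data/my-project/results", "/backup/my-project/results")
    print(f"copied {n} file(s)")
```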

GPU Configuration

The NVIDIA A100 GPUs are configured to maximize memory utilization and computational throughput. Users can select the desired GPU using environment variables within their Docker containers. For example, `CUDA_VISIBLE_DEVICES=0,1` will make GPUs 0 and 1 available to the container. See the GPU Usage Guidelines for best practices.
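The effect of `CUDA_VISIBLE_DEVICES` can be illustrated with a small helper that parses the variable the way the CUDA runtime reads it at startup (so it must be set before the framework initializes). This is an illustrative sketch for numeric indices only; the variable can also hold GPU UUIDs, which this helper does not handle:

```python
import os

def visible_gpus():
    """Return the GPU indices exposed by CUDA_VISIBLE_DEVICES.

    None means the variable is unset (all GPUs visible); an empty list
    means every GPU is hidden from the process.
    """
    value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Numeric indices only; UUID entries would need separate handling.
    return [int(tok) for tok in value.split(",") if tok.strip()]

if __name__ == "__main__":
    # Restrict this process (and any framework imported afterwards) to GPUs 0 and 1.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
    print(visible_gpus())  # -> [0, 1]
```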

User Accounts

User accounts are managed through LDAP authentication. New users can request an account through the Account Request Form. Access to specific directories and resources is granted based on group membership.

Troubleshooting

If you encounter issues while using the AI Research server, consult the guides referenced throughout this document (such as the Server Access Guide, the Docker Tutorial, and the GPU Usage Guidelines) or contact the System Administration Team.
