
AI in Catalonia

```wiki

AI in Catalonia: Server Configuration Overview

This article describes the server infrastructure supporting Artificial Intelligence (AI) initiatives in Catalonia. It is intended for new system administrators and developers contributing to these projects, since understanding the underlying hardware and software is essential for effective development and maintenance. It covers hardware specifications, the software stack, networking, and security considerations.

Hardware Infrastructure

The core of the AI infrastructure resides within a dedicated data center located near Barcelona. This data center houses a cluster of servers designed for high-performance computing and machine learning tasks. Redundancy and scalability are key design principles.

Server Role | Model | CPU | RAM | Storage | GPU
Compute Nodes 1-10 | Dell PowerEdge R750xa | 2 x AMD EPYC 7763 (64 cores/128 threads each) | 512 GB DDR4 ECC REG | 8 x 4 TB NVMe SSD (RAID 0) | 4 x NVIDIA A100 (80 GB)
Storage Nodes 1-3 | HPE ProLiant DL380 Gen10 | 2 x Intel Xeon Gold 6338 (32 cores/64 threads each) | 256 GB DDR4 ECC REG | 24 x 16 TB SAS HDD (RAID 6) | N/A
Management Node | Supermicro SuperServer 1U | 2 x Intel Xeon Silver 4310 (12 cores/24 threads each) | 64 GB DDR4 ECC REG | 2 x 1 TB NVMe SSD (RAID 1) | N/A
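
The hardware table lends itself to a quick capacity tally. The sketch below sums physical cores, RAM and GPUs across server roles and estimates usable disk space from each role's RAID level. The `NodeSpec` type and the helper functions are hypothetical; the RAID arithmetic is the textbook approximation (RAID 6 gives up two drives' capacity, RAID 1 half of it) and ignores hot spares, filesystem overhead and the TB/TiB difference. A single management node is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """One row of the hardware table (hypothetical helper type)."""
    count: int             # number of nodes in this role
    sockets: int           # CPU sockets per node
    cores_per_socket: int  # physical cores per CPU
    ram_gb: int            # RAM per node, GB
    drives: int            # data drives per node
    drive_tb: float        # capacity per drive, TB (decimal)
    raid: str              # RAID level: "0", "1" or "6"
    gpus: int = 0          # GPUs per node
    gpu_mem_gb: int = 0    # memory per GPU, GB


def usable_tb(spec: NodeSpec) -> float:
    """Usable array capacity of ONE node, ignoring filesystem overhead and spares."""
    if spec.raid == "0":
        data_drives = spec.drives        # striping only, no redundancy
    elif spec.raid == "1":
        data_drives = spec.drives / 2    # every drive is mirrored
    elif spec.raid == "6":
        data_drives = spec.drives - 2    # two drives' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {spec.raid}")
    return data_drives * spec.drive_tb


def cluster_totals(specs: dict[str, NodeSpec]) -> dict[str, int]:
    """Sum physical cores, RAM, GPUs and GPU memory across all roles."""
    totals = {"cores": 0, "ram_gb": 0, "gpus": 0, "gpu_mem_gb": 0}
    for spec in specs.values():
        totals["cores"] += spec.count * spec.sockets * spec.cores_per_socket
        totals["ram_gb"] += spec.count * spec.ram_gb
        totals["gpus"] += spec.count * spec.gpus
        totals["gpu_mem_gb"] += spec.count * spec.gpus * spec.gpu_mem_gb
    return totals


# Figures copied from the table (count, sockets, cores/socket, RAM, drives, TB/drive, RAID).
CLUSTER = {
    "compute": NodeSpec(10, 2, 64, 512, 8, 4, "0", gpus=4, gpu_mem_gb=80),
    "storage": NodeSpec(3, 2, 32, 256, 24, 16, "6"),
    "management": NodeSpec(1, 2, 12, 64, 2, 1, "1"),
}

if __name__ == "__main__":
    print(cluster_totals(CLUSTER))
    for role, spec in CLUSTER.items():
        print(role, spec.count * usable_tb(spec), "TB usable")
```

With the table's figures this yields 1,496 physical cores, 5,952 GB of RAM and 40 A100 GPUs with 3,200 GB of combined GPU memory. Each storage node's RAID 6 array provides about 352 TB usable out of 384 TB raw (about 1,056 TB across the three nodes), while the compute nodes' RAID 0 scratch arrays (32 TB per node) have no redundancy. If Ceph is layered on top with its own replication, usable shared capacity would be lower still.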

The nodes are connected by a 100GbE backbone for high-speed data transfer. A dedicated InfiniBand network is also available for particularly demanding workloads that need lower latency. See Network Topology for detailed diagrams.
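
To put the 100GbE figure in perspective, the time to move a dataset is its size in bits divided by the achievable link rate. The helper below is a back-of-the-envelope sketch in decimal units; the `efficiency` factor is an assumed fraction of line rate, not a measured value for this cluster, and InfiniBand is left out because its link speed is not given here.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link rated at link_gbps (1 Gb/s = 10**9 bit/s).

    efficiency is the assumed fraction of line rate actually achieved after
    protocol overhead and contention; 1.0 means perfect line rate.
    """
    if size_bytes < 0 or link_gbps <= 0:
        raise ValueError("size must be non-negative and link speed positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    one_tb = 10**12  # 1 TB, decimal
    print(f"100GbE, line rate: {transfer_seconds(one_tb, 100):.0f} s")  # 80 s
    print(f"100GbE, 70% efficiency: {transfer_seconds(one_tb, 100, 0.7):.0f} s")
    print(f"10GbE for comparison: {transfer_seconds(one_tb, 10):.0f} s")  # 800 s
```

At full line rate, 1 TB crosses a 100GbE link in 80 s, against 800 s on 10GbE; real transfers are usually slower because of protocol overhead, storage throughput and competing traffic.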

Software Stack

All servers run Ubuntu Server 22.04 LTS, which provides a stable, well-supported base for the AI software stack. Applications are packaged as Docker containers and deployed and scaled with Kubernetes.
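
On a Kubernetes cluster like this one, a workload reaches the A100s by requesting them in its Pod spec. The following is a minimal sketch that builds a Pod manifest in Python and prints it as JSON, which `kubectl apply -f -` accepts on standard input. It assumes the NVIDIA device plugin (or NVIDIA GPU Operator) is deployed so that nodes advertise the `nvidia.com/gpu` resource; this article does not say how GPUs are exposed, and the pod name, image path and command are hypothetical.

```python
from __future__ import annotations

import json


def gpu_pod_manifest(name: str, image: str, gpus: int, command: list[str]) -> dict:
    """Build a minimal Pod manifest requesting `gpus` NVIDIA GPUs.

    Assumes nodes expose the extended resource "nvidia.com/gpu".
    """
    # Each compute node carries 4 x A100, so one pod can use at most 4 GPUs.
    if not 1 <= gpus <= 4:
        raise ValueError("gpus must be between 1 and 4")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # run-to-completion training job
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": command,
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = gpu_pod_manifest(
        name="train-demo",                                   # hypothetical
        image="registry.example.internal/ai/trainer:latest",  # hypothetical
        gpus=2,
        command=["python", "train.py"],                      # hypothetical
    )
    print(json.dumps(pod, indent=2))  # e.g. pipe into `kubectl apply -f -`
```

For extended resources such as `nvidia.com/gpu`, Kubernetes uses the limit as the request when only a limit is given, and the two may not differ because extended resources cannot be overcommitted.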

Component | Software and Version | Purpose
Operating System | Ubuntu Server 22.04 LTS | Base operating system
Container Runtime | Docker 24.0.5 | Containerization platform
Orchestration | Kubernetes 1.27 | Container orchestration
Machine Learning Frameworks | TensorFlow 2.12, PyTorch 2.0, scikit-learn 1.2 | Core AI/ML libraries
Data Storage | Ceph Octopus | Distributed storage system
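
A quick way to confirm that a node or container matches the software table is to compare installed package versions against the documented major.minor releases. This sketch uses the standard-library `importlib.metadata`; the pip distribution names (`tensorflow`, `torch`, `scikit-learn`) are an assumption about how the frameworks were installed, and Docker, Kubernetes and Ceph are outside its scope.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, Optional

# Expected major.minor releases from the software table; keys are assumed
# pip distribution names.
EXPECTED = {"tensorflow": "2.12", "torch": "2.0", "scikit-learn": "1.2"}


def installed_versions(packages: Iterable[str]) -> dict[str, Optional[str]]:
    """Look up installed versions; None marks a missing distribution."""
    found: dict[str, Optional[str]] = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


def check_versions(expected: dict[str, str], found: dict[str, Optional[str]]) -> list[str]:
    """Return one message per package whose version is not <major.minor>.*."""
    problems = []
    for pkg, want in expected.items():
        have = found.get(pkg)
        if have is None:
            problems.append(f"{pkg}: not installed (expected {want}.x)")
        # Prefix match on "2.12." so that 2.120.0 is NOT treated as 2.12.x.
        elif have != want and not have.startswith(want + "."):
            problems.append(f"{pkg}: found {have}, expected {want}.x")
    return problems


if __name__ == "__main__":
    issues = check_versions(EXPECTED, installed_versions(EXPECTED))
    if issues:
        for line in issues:
            print(line)
    else:
        print("all frameworks match the documented versions")
```

A TensorFlow build installed under a different distribution name (for example `tensorflow-cpu`) would be reported as missing, so `EXPECTED` should be adjusted to the package names actually used on the cluster.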

Cluster access is provided through SSH and a web-based dashboard built with Flask. Code is version-controlled with Git on a private GitLab instance, and all changes go through the Code Review process.

Networking Configuration

The network is segmented into several VLANs to enhance security and isolate different workloads. The main VLANs are:
