AI in the North America Rainforest: Server Configuration

This article details the server configuration used to support the “AI in the North America Rainforest” project. This project utilizes machine learning algorithms to analyze data collected from sensor networks deployed within several rainforest ecosystems across North America. This document is intended for new engineers onboarding to the project and assumes a basic understanding of Linux server administration and networking.

Project Overview

The “AI in the North America Rainforest” project aims to predict and mitigate the effects of climate change on these fragile ecosystems. We collect data on temperature, humidity, soil moisture, animal activity, and plant health. This data is processed using a combination of edge computing and centralized server infrastructure. The data pipeline involves real-time analysis at the sensor level, followed by more complex modeling and analysis on our central servers. Data Pipeline Overview provides a more detailed explanation of the whole process.
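The two-stage flow described above can be sketched in a few lines of Python. This is an illustrative sketch only; the field names, sensor IDs, and function names are placeholders, not the project's actual code. Edge nodes collapse raw readings into summaries, and the central servers validate those summaries before they reach the models.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical sensor reading; the real field set comes from the deployed network.
@dataclass
class Reading:
    sensor_id: str
    temperature_c: float
    humidity_pct: float
    soil_moisture: float

def edge_summarize(readings: list[Reading]) -> dict:
    """Edge-side step: collapse a window of raw readings into one summary."""
    return {
        "sensor_id": readings[0].sensor_id,
        "mean_temperature_c": mean(r.temperature_c for r in readings),
        "mean_humidity_pct": mean(r.humidity_pct for r in readings),
        "mean_soil_moisture": mean(r.soil_moisture for r in readings),
        "samples": len(readings),
    }

def central_ingest(summaries: list[dict]) -> list[dict]:
    """Central-server step: keep only non-empty summaries for modeling."""
    return [s for s in summaries if s["samples"] > 0]

window = [
    Reading("or-canopy-07", 18.2, 91.0, 0.42),
    Reading("or-canopy-07", 18.6, 90.5, 0.41),
]
batch = central_ingest([edge_summarize(window)])
print(round(batch[0]["mean_temperature_c"], 1))  # 18.4
```

In a real deployment the summaries would travel over the network rather than a function call, but the division of labor is the same: cheap aggregation at the sensor, validation and modeling on the central servers.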

Server Infrastructure

Our central server infrastructure is hosted in a secure data center in Oregon. It consists of a cluster of high-performance servers, utilizing both physical and virtualized instances. We employ a hybrid cloud architecture, leveraging both on-premise hardware and cloud resources for scalability and redundancy. Hybrid Cloud Architecture details the benefits of this design. The server cluster is managed using Kubernetes for container orchestration. Kubernetes Documentation is a valuable resource for learning more about this.
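To illustrate how a workload lands on the Kubernetes cluster, a Deployment manifest for a preprocessing service might look like the following. This is a hypothetical sketch: the name, image, replica count, and resource figures are placeholders, not the project's actual manifests.

```yaml
# Hypothetical manifest — names, image, and resource requests are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sensor-preprocessor
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sensor-preprocessor
  template:
    metadata:
      labels:
        app: sensor-preprocessor
    spec:
      containers:
        - name: preprocessor
          image: registry.example.com/rainforest/preprocessor:latest
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```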

Physical Server Specifications

The core of our processing power resides in a cluster of eight physical servers. Their specifications are detailed below:

Intel-based nodes:
  CPU: 2 x Intel Xeon Gold 6248R (24 cores each)
  Memory: 512 GB DDR4 ECC Registered RAM
  Storage: 16 TB NVMe SSD (RAID 10)
  Network Interface: 100 Gbps Ethernet

AMD-based nodes:
  CPU: 2 x AMD EPYC 7763 (64 cores each)
  Memory: 1 TB DDR4 ECC Registered RAM
  Storage: 32 TB NVMe SSD (RAID 10)
  Network Interface: 200 Gbps Ethernet

These servers are running Ubuntu Server 22.04 LTS. We utilize a custom kernel optimized for machine learning workloads. Kernel Optimization Guide provides details on the kernel configuration.
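The Kernel Optimization Guide documents the configuration actually deployed; as a rough illustration only, tunings for memory-hungry machine learning workloads are commonly applied through /etc/sysctl.d/. The values below are generic examples, not the project's settings.

```
# /etc/sysctl.d/99-ml-tuning.conf — illustrative values only; see the
# Kernel Optimization Guide for the settings actually in use.
vm.swappiness = 10
vm.max_map_count = 262144
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
```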

Virtual Machine Specifications

In addition to the physical servers, we also utilize virtual machines for less demanding tasks, such as data ingestion and preprocessing. These VMs are hosted on a VMware vSphere cluster.

Data Ingestion 1:
  vCPUs: 8
  Memory: 64 GB
  Storage: 2 TB SSD
  Operating System: Ubuntu Server 22.04 LTS

Data Preprocessing 1:
  vCPUs: 16
  Memory: 128 GB
  Storage: 4 TB SSD
  Operating System: Ubuntu Server 22.04 LTS

Database Server 1:
  vCPUs: 16
  Memory: 256 GB
  Storage: 8 TB SSD
  Operating System: CentOS 7

These VMs are regularly backed up using Veeam Backup & Replication. Veeam Documentation is available for backup procedures.

Database Server Details

The primary database server is a dedicated VM running PostgreSQL 14. It stores all the collected sensor data, metadata, and model outputs.

Database Engine: PostgreSQL 14
Maximum Connections: 500
WAL Level: replica
Shared Buffers: 128 GB
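Expressed in postgresql.conf syntax, the parameters above read as follows. This is a fragment, not the full file; the live configuration contains many more settings.

```
# postgresql.conf — the parameters listed above, in configuration syntax.
max_connections = 500
wal_level = replica
shared_buffers = 128GB
```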

The database is regularly monitored using Prometheus and Grafana. Prometheus Monitoring Guide will help with setting up monitoring. Regular database maintenance, including vacuuming and analyzing, is performed weekly. PostgreSQL Maintenance Guide provides detailed instructions.
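A weekly vacuum-and-analyze pass can be scheduled with cron and PostgreSQL's vacuumdb utility. The entry below is a sketch under assumed defaults (postgres role, local connection); the actual schedule and flags are documented in the PostgreSQL Maintenance Guide.

```
# Hypothetical crontab entry for the postgres user: every Sunday at 03:00,
# vacuum and analyze all databases (vacuumdb ships with PostgreSQL).
0 3 * * 0  vacuumdb --all --analyze --quiet
```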

Software Stack

The software stack used for the project includes:

Ubuntu Server 22.04 LTS (physical servers and most VMs)
VMware vSphere (virtualization)
Kubernetes (container orchestration)
PostgreSQL 14 (primary database)
Prometheus and Grafana (monitoring)
Veeam Backup & Replication (backups)
