# Monitoring System

This article describes the configuration and operation of the monitoring system that keeps our MediaWiki installation stable and performant. It is intended for new system administrators and for anyone who wants a deeper understanding of our infrastructure. The system lets us identify and resolve issues proactively, which minimizes downtime for users.

## Overview

Our monitoring system consists of several interconnected components that together provide a comprehensive view of server health. They collect data, analyze it, and alert administrators when predefined thresholds are breached. The core components are Nagios Core, a suite of plugins for data collection, and a custom script that sends alerts by email. We also use system logs aggregated by rsyslog for detailed analysis.
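As a rough illustration of the alerting step, the sketch below builds an email notification from a single check result. The function name `build_alert_email`, the addresses, and the message layout are assumptions made for this example; the actual alerting script may differ. It uses only Python's standard `email.message` module and does not send anything.

```python
from email.message import EmailMessage


def build_alert_email(host, service, state, output,
                      sender="nagios@example.org",
                      recipient="admins@example.org"):
    """Build (but do not send) an alert email for one check result.

    All names and addresses here are hypothetical placeholders.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Put the state first so alerts are easy to filter in a mail client.
    msg["Subject"] = f"[{state}] {service} on {host}"
    msg.set_content(
        f"Host: {host}\n"
        f"Service: {service}\n"
        f"State: {state}\n"
        f"Plugin output: {output}\n"
    )
    return msg


if __name__ == "__main__":
    alert = build_alert_email("wiki01", "HTTP", "CRITICAL", "Connection refused")
    print(alert["Subject"])
```

A message built this way could then be handed to `smtplib.SMTP(...).send_message(msg)`, assuming a mail relay is reachable from the monitoring host.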

## System Architecture

The monitoring system is designed with redundancy in mind, although it is not yet redundant: a single Nagios Core instance is currently active, and a passive failover instance that uses DRBD for data replication is planned. Data collection is distributed across all servers, which minimizes the load on any single machine. The system follows a "check-then-act" approach: a check is performed first, and an alert is generated only if a threshold is exceeded. This prevents unnecessary alerts and focuses attention on genuine problems. Firewalls must be configured to allow monitoring traffic.
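The "check-then-act" flow can be sketched as follows. This is a minimal example, not the configuration used on our servers: the function names and the threshold values are assumptions. The numeric states follow the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING",
               CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Check step: map a measured value to a Nagios state."""
    if value is None:
        # No measurement available, so the state cannot be determined.
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def should_alert(state):
    """Act step: only non-OK states generate an alert."""
    return state != OK


def run_check(label, value, warn, crit):
    """Return (state, status line) in the usual plugin output style."""
    state = evaluate(value, warn, crit)
    return state, f"{label} {STATE_NAMES[state]} - value: {value}"


if __name__ == "__main__":
    # Hypothetical load-average measurement and thresholds.
    state, line = run_check("LOAD", 4.2, warn=4.0, crit=8.0)
    print(line)
    print("alert" if should_alert(state) else "no alert")
```

A real plugin would print the status line and exit with the state as its process exit code, leaving Nagios to decide whether to notify.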

## Components

The key components are described below:
