Monitoring System

This article details the configuration and operation of the monitoring system used to ensure the stability and performance of our MediaWiki installation. It is intended as a guide for new system administrators and those seeking a deeper understanding of our infrastructure. This system is critical for proactive identification and resolution of issues, minimizing downtime and maintaining a positive user experience.

Overview

Our monitoring system consists of several interconnected components that together provide a comprehensive view of server health. These components collect data, analyze it, and alert administrators when predefined thresholds are breached. The core components are Nagios Core, a suite of plugins for data collection, and a custom script that sends alerts by email. We also use system logs aggregated by rsyslog for detailed analysis.

System Architecture

The monitoring system is designed with redundancy in mind, although only a single Nagios Core instance is currently active. A passive failover instance that uses DRBD for data replication is planned. Data collection is distributed across all servers, which minimizes the load on any single machine. The system follows a "check-then-act" approach: a check is performed first, and an alert is generated only if a threshold is exceeded. This avoids unnecessary alerts and focuses attention on genuine problems. Firewalls must be configured to allow monitoring traffic.
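The "check-then-act" approach can be illustrated with a minimal Python sketch. It is not the production logic: the function names, the sample CPU values, and the thresholds are assumptions chosen for illustration. The numeric codes follow the standard Nagios plugin return values (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from enum import IntEnum


class Status(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def evaluate(value, warn, crit):
    """Check step: map a measured value to a status using two thresholds."""
    if value is None:
        return Status.UNKNOWN  # the measurement itself failed
    if value >= crit:
        return Status.CRITICAL
    if value >= warn:
        return Status.WARNING
    return Status.OK


def act(status, notify):
    """Act step: call notify only when the status is not OK."""
    if status is not Status.OK:
        notify(status)
        return True
    return False


if __name__ == "__main__":
    alerts = []
    for cpu_percent in (35.0, 85.0, 97.0):  # hypothetical samples
        act(evaluate(cpu_percent, warn=80.0, crit=95.0), alerts.append)
    print([s.name for s in alerts])
```

Keeping the two steps separate means a value inside its thresholds never reaches the notification path.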

Components

The key components are described below:

Nagios Core: The central monitoring engine. It schedules checks, processes results, and manages alerts.
Nagios Plugins: Scripts that perform specific checks (CPU usage, disk space, service status, etc.). These are often written in Bash or Perl.
Alerting Script: A custom script that receives notifications from Nagios and formats them into email messages. This script is critical for ensuring timely alerts.
rsyslog: Collects and forwards system logs to a central log server for analysis. This is integrated with Nagios for correlation of events.
Network Monitoring (via Ping): Basic network connectivity checks using ping.
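To show what a plugin looks like in practice, here is a minimal disk-usage check. It is written in Python for illustration only, since the plugins described above are usually Bash or Perl. The function name `check_disk` and the default thresholds are assumptions. The exit codes and the `label=value;warn;crit` performance data after the `|` follow the general Nagios plugin conventions.

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for the filesystem that holds path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        code = CRITICAL
    elif used_pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Text after "|" is performance data in the form label=value;warn;crit
    line = (f"DISK {LABELS[code]} - {path} is {used_pct:.1f}% full"
            f" | used_pct={used_pct:.1f}%;{warn_pct};{crit_pct}")
    return code, line


def main(argv):
    """Print one status line and return the plugin exit code."""
    target = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(target)
    print(line)
    return code

# When installed as a plugin, end the file with: sys.exit(main(sys.argv))
```

Nagios reads the first line of output as the status text and takes the process exit code as the check result.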

Server Specifications

The following table details the specifications of the dedicated monitoring server.

Hostname	CPU	Memory	Disk Space	Operating System
monitoring.example.com	Intel Xeon E3-1220 v3	16 GB DDR3	500 GB SSD	CentOS 7

The application servers also contribute to monitoring by running Nagios plugins. Their specifications are as follows:

Hostname	CPU	Memory	Disk Space	Operating System
wiki1.example.com	Intel Xeon E5-2680 v4	64 GB DDR4	1 TB SSD	CentOS 7
wiki2.example.com	Intel Xeon E5-2680 v4	64 GB DDR4	1 TB SSD	CentOS 7

The database server also has monitoring agents running:

Hostname	CPU	Memory	Disk Space	Operating System
db.example.com	Intel Xeon E5-2690 v4	128 GB DDR4	2 TB SSD (RAID 1)	CentOS 7

Monitored Metrics

We monitor a wide range of metrics to ensure comprehensive coverage. These can be categorized as follows:

System Resources: CPU usage, memory usage, disk space utilization, network bandwidth. These are monitored on all servers. See System administration for details.
Service Status: Status of critical services, such as the Apache web server, MySQL database, and Memcached.
Application-Specific Metrics: These are monitored using custom Nagios plugins and include:

MySQL query

Log File Monitoring: Key log files are watched for error messages and warnings. The logs are rotated with logrotate and filtered with grep.
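The grep-style filtering used for log file monitoring can be sketched as follows. This is an illustrative Python stand-in, not the filter in use here: the patterns and the sample log lines are hypothetical and would need tuning to the actual log formats.

```python
import re

# Hypothetical patterns, listed from most to least severe.
PATTERNS = {
    "error": re.compile(r"\b(error|fatal|crit(ical)?)\b", re.IGNORECASE),
    "warning": re.compile(r"\bwarn(ing)?\b", re.IGNORECASE),
}


def scan_lines(lines):
    """Return a dict of matching lines, keyed by 'error' and 'warning'."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
                break  # file each line once, under its most severe match
    return hits


if __name__ == "__main__":
    sample = [
        "[core:error] request failed\n",
        "PHP Warning: undefined index\n",
        "GET /wiki/Main_Page 200\n",
    ]
    for level, matched in scan_lines(sample).items():
        print(level, len(matched))
```

A Nagios check built on a filter like this could report WARNING or CRITICAL depending on which lists are non-empty.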

Configuration Details

Nagios configuration files are located in `/etc/nagios/`. Key files include:

commands.cfg: Defines the commands to execute for each check.
hosts.cfg: Defines the hosts to be monitored.
services.cfg: Defines the services to be monitored on each host.
resource.cfg: Defines global resources and settings, such as user-defined macros.

Changes to these files require careful planning and testing to avoid disrupting the monitoring system. Always back up configuration files before making changes. See Nagios documentation for more details on configuration.
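The backup step can be scripted. The sketch below copies every `.cfg` file from a configuration directory into a timestamped folder. The function name, the folder naming scheme, and the example paths are assumptions, not an existing tool.

```python
import shutil
import time
from pathlib import Path


def backup_configs(config_dir, backup_root):
    """Copy all *.cfg files from config_dir into a new timestamped folder.

    Returns the destination path and the list of copied file names.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"nagios-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(config_dir).glob("*.cfg")):
        if src.is_file():
            shutil.copy2(src, dest / src.name)  # copy2 keeps timestamps and mode
            copied.append(src.name)
    return dest, copied


if __name__ == "__main__":
    # Hypothetical paths; adjust to the local layout.
    destination, files = backup_configs("/etc/nagios", "/var/backups")
    print(destination, files)
```

After editing, the configuration can be verified before reloading with Nagios's built-in check, `nagios -v` followed by the path to the main configuration file (assumed here to be `/etc/nagios/nagios.cfg`).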

Alerting System

The alerting system sends email notifications to a dedicated support group when critical thresholds are breached. Each alert email includes the following information:

Hostname
Service Name
Problem Description
Current Status
Timestamp

Email alerts are categorized by severity level (Critical, Warning, Unknown) to prioritize responses. Alert escalation policies are defined in the Incident Management documentation.
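A minimal sketch of how an alert email with these fields might be assembled is shown below. It is not the custom alerting script itself. The function names, the sender and recipient addresses, and the subject layout are assumptions for illustration. Only Python's standard `email` and `smtplib` modules are used.

```python
import smtplib
from email.message import EmailMessage


def build_alert(hostname, service, description, status, timestamp,
                sender="nagios@example.com", recipient="support@example.com"):
    """Build an alert email; the severity tag leads the subject line."""
    severity = status.upper()
    msg = EmailMessage()
    msg["From"] = sender          # hypothetical address
    msg["To"] = recipient         # hypothetical address
    msg["Subject"] = f"[{severity}] {service} on {hostname}"
    msg.set_content(
        f"Hostname: {hostname}\n"
        f"Service Name: {service}\n"
        f"Problem Description: {description}\n"
        f"Current Status: {severity}\n"
        f"Timestamp: {timestamp}\n"
    )
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (assumed to listen on smtp_host)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    alert = build_alert("wiki1.example.com", "Apache", "Connection refused",
                        "critical", "2024-01-01 00:00:00")  # sample values
    print(alert["Subject"])
```

Putting the severity first in the subject lets recipients sort or filter their inbox by Critical, Warning, and Unknown.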

Future Enhancements

Future enhancements to the monitoring system include:

Integration with a centralized logging platform (ELK Stack): This will provide more advanced log analysis and visualization capabilities.
Automated remediation: Implementing automated actions to resolve common issues.
Dashboarding: Creating a graphical dashboard to visualize key metrics.
Predictive Analysis: Utilizing machine learning to predict potential issues before they occur.
