Monitoring System
This article details the configuration and operation of the monitoring system used to ensure the stability and performance of our MediaWiki installation. It is intended as a guide for new system administrators and those seeking a deeper understanding of our infrastructure. This system is critical for proactive identification and resolution of issues, minimizing downtime and maintaining a positive user experience.
Overview
Our monitoring system consists of several interconnected components that together provide a comprehensive view of server health. These components collect data, analyze it, and alert administrators when predefined thresholds are breached. The core components are Nagios Core, a suite of plugins for data collection, and a custom alerting script that sends notifications by email. We also use system logs aggregated by rsyslog for detailed analysis.
System Architecture
The monitoring system is designed with redundancy in mind, although only a single Nagios Core instance is currently active. A passive failover instance, using DRBD for data replication, is planned. Data collection is distributed across all servers, which minimizes the load on any single machine. The system follows a "check-then-act" approach: a check is performed first, and an alert is generated only if a threshold is exceeded. This prevents unnecessary alerts and focuses attention on genuine problems. Firewalls must be configured to allow monitoring traffic between the monitoring server and the monitored hosts.
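The check-then-act flow can be illustrated with a minimal sketch of a Nagios-style plugin. It is not one of the plugins deployed here: the function names and the 80/90 percent thresholds are assumptions chosen for illustration. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Check step: map a measured value onto a Nagios state."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for disk usage on `path`.

    The 80/90 percent thresholds are illustrative, not the site's real values.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself failed, so the state is UNKNOWN rather than CRITICAL.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    percent = 100.0 * usage.used / usage.total
    state = evaluate(percent, warn, crit)
    # Text after "|" is performance data in the Nagios plugin output format.
    line = (
        f"DISK {STATE_NAMES[state]} - {percent:.1f}% used on {path} "
        f"| used_pct={percent:.1f}%;{warn};{crit}"
    )
    return state, line
```

A real plugin would print the status line and pass the code to `sys.exit()`. Nagios then performs the "act" step, generating an alert only when the returned state is not OK.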
Components
The key components are described below:
- Nagios Core: The central monitoring engine. It schedules checks, processes results, and manages alerts.
- Nagios Plugins: Scripts that perform specific checks (CPU usage, disk space, service status, etc.). These are often written in Bash scripting or Perl.
- Alerting Script: A custom script that receives notifications from Nagios and formats them into email messages. This script is critical for ensuring timely alerts.
- rsyslog: Collects and forwards system logs to a central log server for analysis. This is integrated with Nagios for correlation of events.
- Network Monitoring (via Ping): Basic network connectivity checks using ping.
Server Specifications
The following table details the specifications of the dedicated monitoring server.
Hostname | CPU | Memory | Disk Space | Operating System |
---|---|---|---|---|
monitoring.example.com | Intel Xeon E3-1220 v3 | 16 GB DDR3 | 500 GB SSD | CentOS 7 |
The application servers also contribute to monitoring by running Nagios plugins. Their specifications are as follows:
Hostname | CPU | Memory | Disk Space | Operating System |
---|---|---|---|---|
wiki1.example.com | Intel Xeon E5-2680 v4 | 64 GB DDR4 | 1 TB SSD | CentOS 7 |
wiki2.example.com | Intel Xeon E5-2680 v4 | 64 GB DDR4 | 1 TB SSD | CentOS 7 |
The database server also has monitoring agents running:
Hostname | CPU | Memory | Disk Space | Operating System |
---|---|---|---|---|
db.example.com | Intel Xeon E5-2690 v4 | 128 GB DDR4 | 2 TB SSD (RAID 1) | CentOS 7 |
Monitored Metrics
We monitor a wide range of metrics to ensure comprehensive coverage. These can be categorized as follows:
- System Resources: CPU usage, memory usage, disk space utilization, network bandwidth. These are monitored on all servers. See System administration for details.
- Service Status: Status of critical services, such as the Apache web server, MySQL database, and Memcached.
- Application Specific Metrics: These are monitored using custom Nagios plugins and include:
  * Number of active users (via MySQL query analysis)
  * Average page load time
  * Database query performance
  * Cache hit ratio
- Log File Monitoring: Monitoring of key log files for error messages and warnings. Log files are rotated with logrotate and filtered with grep to extract matching lines.
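The grep-style log filtering described above can be sketched in Python as follows. The pattern and the warning/critical counts are hypothetical, since this article does not give the actual expressions or thresholds in use.

```python
import re

# Hypothetical pattern; the real grep expressions are not given in this article.
ERROR_PATTERN = re.compile(r"\b(error|warning|fatal)\b", re.IGNORECASE)


def scan_log(lines, pattern=ERROR_PATTERN):
    """Return the lines that match `pattern`, similar to `grep -i -E`."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def summarize(matches, warn=1, crit=10):
    """Turn a match count into a Nagios-style (exit_code, message) pair."""
    count = len(matches)
    if count >= crit:
        return 2, f"LOG CRITICAL - {count} matching lines"
    if count >= warn:
        return 1, f"LOG WARNING - {count} matching lines"
    return 0, "LOG OK - no matching lines"
```

`scan_log` accepts any iterable of lines, so it can be given an open file object as well as a list. Separating the scan from the summary keeps the matching lines available for inclusion in an alert message.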
Configuration Details
Nagios configuration files are located in `/etc/nagios/`. Key files include:
- commands.cfg: Defines the commands to execute for each check.
- hosts.cfg: Defines the hosts to be monitored.
- services.cfg: Defines the services to be monitored on each host.
- resource.cfg: Defines global resources and settings, such as the user macros referenced by command definitions.
Changes to these files require careful planning and testing to avoid disrupting the monitoring system. Always back up configuration files before making changes. See Nagios documentation for more details on configuration.
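As one way to follow the backup advice, the sketch below copies the configuration directory into a timestamped folder before any edit. The backup location `/var/backups/nagios` and the function name are assumptions for illustration, not part of the existing setup.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_dir="/etc/nagios", backup_root="/var/backups/nagios"):
    """Copy `config_dir` into a new timestamped directory and return its path.

    `backup_root` is a hypothetical location; choose one that suits the host.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"nagios-{stamp}"
    # copytree creates `target` (and missing parents) and fails if it already exists.
    shutil.copytree(config_dir, target)
    return target
```

After editing, the configuration can be validated before Nagios is reloaded by running `nagios -v` against the main configuration file (assumed here to be `/etc/nagios/nagios.cfg`).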
Alerting System
The alerting system is configured to send email notifications to a dedicated support group when critical thresholds are breached. The alerting script includes the following information:
- Hostname
- Service Name
- Problem Description
- Current Status
- Timestamp
Email alerts are categorized by severity level (Critical, Warning, Unknown) to prioritize responses. Alert escalation policies are defined in the Incident Management documentation.
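A minimal sketch of how such an alert email might be assembled is shown below. It is not the production alerting script: the function name, the subject layout, and the sender and recipient addresses are placeholders.

```python
from email.message import EmailMessage


def build_alert(hostname, service, description, status, timestamp,
                sender="nagios@example.com", recipient="support@example.com"):
    """Build an alert email containing the five fields listed above.

    The sender and recipient addresses are placeholders.
    """
    severity = status.upper()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Putting the severity first lets recipients filter and prioritize by subject.
    msg["Subject"] = f"[{severity}] {hostname} / {service}"
    msg.set_content(
        f"Hostname: {hostname}\n"
        f"Service Name: {service}\n"
        f"Problem Description: {description}\n"
        f"Current Status: {severity}\n"
        f"Timestamp: {timestamp}\n"
    )
    return msg
```

The resulting message could be handed to a local mail relay with `smtplib.SMTP("localhost").send_message(msg)`, assuming one is listening on the monitoring server.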
Future Enhancements
Future enhancements to the monitoring system include:
- Integration with a centralized logging platform (ELK Stack): This will provide more advanced log analysis and visualization capabilities.
- Automated remediation: Implementing automated actions to resolve common issues.
- Dashboarding: Creating a graphical dashboard to visualize key metrics.
- Predictive Analysis: Utilizing machine learning to predict potential issues before they occur.