Monitoring System

From Server rental store
Revision as of 16:57, 15 April 2025 by Admin (talk | contribs) (Automated server configuration article)

This article details the configuration and operation of the monitoring system used to ensure the stability and performance of our MediaWiki installation. It is intended as a guide for new system administrators and those seeking a deeper understanding of our infrastructure. This system is critical for proactive identification and resolution of issues, minimizing downtime and maintaining a positive user experience.

Overview

The monitoring system consists of several interconnected components that together provide a comprehensive view of server health. These components collect data, analyze it, and alert administrators when predefined thresholds are breached. The core components are Nagios Core, a suite of plugins for data collection, and a custom alerting script that sends notifications by email. System logs aggregated by rsyslog are also used for detailed analysis.

System Architecture

The monitoring system is designed with redundancy in mind, although only a single Nagios Core instance is currently active. A passive failover instance that uses DRBD for data replication is planned. Data collection is distributed across all servers, which minimizes the load on any single machine. The system follows a "check-then-act" approach: a check is performed first, and an alert is generated only if a threshold is exceeded. This prevents unnecessary alerts and focuses attention on genuine problems. Firewalls between the monitoring server and the monitored hosts must be configured to allow monitoring traffic.
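
The check-then-act flow described above can be sketched in a few lines of Python. This is an illustration only, not the code running on our servers: the function names `evaluate` and `check_then_act` and the threshold values are hypothetical. The numeric states follow the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from typing import Callable, Optional

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: Optional[float], warn: float, crit: float) -> int:
    """Check step: map a measured value to a state (a higher value is worse)."""
    if value is None:
        return UNKNOWN  # the measurement itself failed
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_then_act(name: str, measure: Callable[[], Optional[float]],
                   warn: float, crit: float,
                   alert: Callable[[str], None]) -> int:
    """Run the check, then act (alert) only if the state is not OK."""
    state = evaluate(measure(), warn, crit)
    if state != OK:
        alert(f"{name} is {STATE_NAMES[state]}")
    return state


if __name__ == "__main__":
    # Hypothetical readings: 42% stays silent, 97% produces an alert.
    check_then_act("cpu", lambda: 42.0, warn=80, crit=95, alert=print)
    check_then_act("cpu", lambda: 97.0, warn=80, crit=95, alert=print)
```

Keeping the check separate from the action means the alert callback is never invoked for a healthy reading, which is how the approach avoids unnecessary alerts.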

Components

The key components are described below:

  • Nagios Core: The central monitoring engine. It schedules checks, processes results, and manages alerts.
  • Nagios Plugins: Scripts that perform specific checks (CPU usage, disk space, service status, etc.). These are often written in Bash or Perl.
  • Alerting Script: A custom script that receives notifications from Nagios and formats them into email messages. This script is critical for ensuring timely alerts.
  • rsyslog: Collects and forwards system logs to a central log server for analysis. This is integrated with Nagios for correlation of events.
  • Network Monitoring (via Ping): Basic network connectivity checks using ping.
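
To make the role of a plugin concrete, here is a minimal sketch of a disk space check written in Python. Our production plugins are typically Bash or Perl, so this is an illustration under assumptions rather than one of the deployed scripts. The names `disk_status` and `main` and the 80/90 percent thresholds are hypothetical. What it does follow is the general Nagios plugin convention: one status line on standard output, optional performance data after a `|`, and a return code of 0, 1, 2, or 3.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def disk_status(used_pct: float, warn: float = 80.0, crit: float = 90.0):
    """Return (exit_code, status_line) for a disk usage percentage."""
    if used_pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after '|' is performance data in the form label=value;warn;crit
    line = (f"DISK {label} - {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn:.0f};{crit:.0f}")
    return code, line


def main(path: str = "/") -> int:
    """Measure the filesystem holding `path` and print the plugin status line."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    code, line = disk_status(100.0 * usage.used / usage.total)
    print(line)
    return code
```

When installed as a script, the value returned by `main()` would be passed to `sys.exit()` so that Nagios Core receives it as the process exit status.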

Server Specifications

The following table details the specifications of the dedicated monitoring server.

| Hostname | CPU | Memory | Disk Space | Operating System |
|---|---|---|---|---|
| monitoring.example.com | Intel Xeon E3-1220 v3 | 16 GB DDR3 | 500 GB SSD | CentOS 7 |

The application servers also contribute to monitoring by running Nagios plugins. Their specifications are as follows:

| Hostname | CPU | Memory | Disk Space | Operating System |
|---|---|---|---|---|
| wiki1.example.com | Intel Xeon E5-2680 v4 | 64 GB DDR4 | 1 TB SSD | CentOS 7 |
| wiki2.example.com | Intel Xeon E5-2680 v4 | 64 GB DDR4 | 1 TB SSD | CentOS 7 |

The database server also has monitoring agents running:

| Hostname | CPU | Memory | Disk Space | Operating System |
|---|---|---|---|---|
| db.example.com | Intel Xeon E5-2690 v4 | 128 GB DDR4 | 2 TB SSD (RAID 1) | CentOS 7 |

Monitored Metrics

We monitor a wide range of metrics to ensure comprehensive coverage. These can be categorized as follows:

  • System Resources: CPU usage, memory usage, disk space utilization, and network bandwidth. These are monitored on all servers. See System administration for details.
  • Service Status: Status of critical services, such as the Apache web server, MySQL database, and Memcached.
  • Application-Specific Metrics: These are monitored using custom Nagios plugins and include:
      • Number of active users (via MySQL query analysis)
      • Average page load time
      • Database query performance
      • Cache hit ratio
  • Log File Monitoring: Key log files are monitored for error messages and warnings. The logs are rotated by logrotate and scanned with grep filters.
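
The grep-style log filtering mentioned in the last item can be approximated as follows. This is a hedged sketch, not the filter we run in production: the pattern `SEVERITY_RE`, the function name `count_severities`, and the sample lines are all invented for illustration, and a real deployment would tune the pattern to each log format.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern: whole-word ERROR or WARNING, case-insensitive.
SEVERITY_RE = re.compile(r"\b(ERROR|WARNING)\b", re.IGNORECASE)


def count_severities(lines: Iterable[str]) -> Counter:
    """Count lines by the first ERROR/WARNING keyword found on each line."""
    counts = Counter()
    for line in lines:
        match = SEVERITY_RE.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, only to show the call.
    sample = [
        "[error] database connection lost",
        "[notice] cache warmed",
        "[warning] slow query detected",
    ]
    print(count_severities(sample))
```

The resulting counts could then be compared against warning and critical thresholds in the same way as any other metric.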

Configuration Details

Nagios configuration files are located in `/etc/nagios/`. Key files include:

  • commands.cfg: Defines the commands to execute for each check.
  • hosts.cfg: Defines the hosts to be monitored.
  • services.cfg: Defines the services to be monitored on each host.
  • resources.cfg: Defines global resources and settings.

Changes to these files require careful planning and testing to avoid disrupting the monitoring system. Always back up configuration files before making changes. See Nagios documentation for more details on configuration.
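
As one way to follow the backup advice, the sketch below copies a configuration directory into a timestamped folder before any edits are made. It is an assumption-laden example: the function name `backup_config` and the backup location are hypothetical, and only the `/etc/nagios/` path comes from this article.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_dir: str, backup_root: str) -> Path:
    """Copy config_dir into a new timestamped folder under backup_root."""
    src = Path(config_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

A call such as `backup_config("/etc/nagios", "/var/backups")` would produce a folder like `/var/backups/nagios-20250415-165700`. After editing, Nagios Core's `-v` option can verify the configuration before a restart, for example `nagios -v /etc/nagios/nagios.cfg` (the main configuration file name is assumed here).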

Alerting System

The alerting system is configured to send email notifications to a dedicated support group when critical thresholds are breached. Each notification produced by the alerting script includes the following information:

  • Hostname
  • Service Name
  • Problem Description
  • Current Status
  • Timestamp

Email alerts are categorized by severity level (Critical, Warning, Unknown) to prioritize responses. Alert escalation policies are defined in the Incident Management documentation.
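
The formatting step of such an alerting script might look like the sketch below. It is not the actual script: the function name `build_alert`, the subject layout, and any addresses passed to it are assumptions. It uses only the Python standard library and includes the five fields listed above, with the severity placed in the subject line.

```python
from email.message import EmailMessage


def build_alert(host: str, service: str, state: str, output: str,
                timestamp: str, sender: str, recipient: str) -> EmailMessage:
    """Format one notification as an email carrying the five listed fields."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Severity comes first so mail filters can prioritize on the subject.
    msg["Subject"] = f"[{state}] {service} on {host}"
    msg.set_content(
        f"Hostname: {host}\n"
        f"Service Name: {service}\n"
        f"Problem Description: {output}\n"
        f"Current Status: {state}\n"
        f"Timestamp: {timestamp}\n"
    )
    return msg
```

In a Nagios notification command, the arguments would typically be filled from macros such as `$HOSTNAME$`, `$SERVICEDESC$`, `$SERVICESTATE$`, `$SERVICEOUTPUT$`, and `$LONGDATETIME$`, and the message could be handed to a local mail relay with `smtplib.SMTP("localhost").send_message(msg)`.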

Future Enhancements

Future enhancements to the monitoring system include:

  • Integration with a centralized logging platform (ELK Stack): This will provide more advanced log analysis and visualization capabilities.
  • Automated remediation: Implementing automated actions to resolve common issues.
  • Dashboarding: Creating a graphical dashboard to visualize key metrics.
  • Predictive Analysis: Utilizing machine learning to predict potential issues before they occur.




