Cassandra Server Configuration
This article details the configuration of a Cassandra server for use with a MediaWiki installation. Cassandra is a highly scalable, distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. This guide is targeted towards system administrators new to Cassandra.
Introduction to Cassandra
Cassandra is well-suited for MediaWiki environments experiencing high traffic and needing to store large volumes of revision history, logs, and other data. Unlike traditional relational databases, Cassandra excels at write operations and can scale horizontally to accommodate growing data needs. It uses a decentralized architecture, meaning data is replicated across multiple nodes, ensuring durability and fault tolerance. Understanding the core concepts of Cassandra – nodes, keyspaces, column families (now called tables), and replication – is crucial for effective configuration. See Data Storage for more details on MediaWiki's data storage options.
Hardware Requirements
The hardware requirements for a Cassandra cluster depend heavily on the expected workload. However, the following provides a general guideline.
Component | Minimum Specification | Recommended Specification |
---|---|---|
CPU | 2 cores | 4+ cores |
RAM | 4 GB | 8+ GB |
Disk | 100 GB SSD | 500 GB+ SSD (RAID recommended) |
Network | 1 Gbps | 10 Gbps |
It's important to note that disk I/O is a critical performance factor for Cassandra. Solid State Drives (SSDs) are *highly* recommended. Consider using RAID configurations for redundancy and improved performance. Refer to Server Hardware Considerations for more information on hardware choices.
Software Requirements
- Operating System: Linux (CentOS, Ubuntu, Debian recommended)
- Java Development Kit (JDK): OpenJDK 8 is the safe choice for Cassandra 3.11; Cassandra 4.x also runs on OpenJDK 11. Ensure the `JAVA_HOME` environment variable is correctly set. See System Requirements for MediaWiki.
- Cassandra: Version 3.11 or later is recommended for compatibility.
- Python: Required for some Cassandra management tools.
- `cqlsh`: Cassandra Query Language Shell – used for interacting with the database.
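Since the Java release matters for compatibility, it can be useful to check it before installing. The following is a minimal sketch, not part of any Cassandra tooling: `java_major_version` is a hypothetical helper that assumes it is given the version string printed by `java -version` (for example `1.8.0_292` or `11.0.11`).

```python
import re


def java_major_version(version_string: str) -> int:
    """Return the major Java release from a version string.

    Handles the legacy scheme ("1.8.0_292" means Java 8) and the
    scheme used since Java 9 ("11.0.11" means Java 11).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognised Java version: {version_string!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy versions are reported as 1.<major>.
    if first == 1 and second is not None:
        return int(second)
    return first


if __name__ == "__main__":
    # Illustrative version strings, not output captured from a real host.
    for sample in ("1.8.0_292", "11.0.11"):
        print(sample, "-> Java", java_major_version(sample))
```

A deployment script could refuse to continue when the returned value is not one of the releases supported by the Cassandra version being installed.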
Installation and Configuration
The installation process varies depending on your Linux distribution. Here's a general outline using `apt` (Debian/Ubuntu):
1. Add the Cassandra repository to your system's package manager.
2. Install Cassandra using `apt-get install cassandra`.
3. Start the Cassandra service using `systemctl start cassandra`.
4. Verify the service status using `systemctl status cassandra`.
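A service that systemd reports as active may still need some time before it accepts client connections. As a rough sketch, the check below polls the CQL native transport port until it answers. It assumes the default port 9042 is unchanged; `wait_for_port` is a hypothetical helper name, and a successful TCP connection only shows that the port is open, not that the node has finished joining the cluster.

```python
import socket
import time


def wait_for_port(host: str, port: int = 9042, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

For example, `wait_for_port("127.0.0.1")` could be called right after `systemctl start cassandra` in a provisioning script, before any `cqlsh` commands are run.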
The primary Cassandra configuration file is `cassandra.yaml`, located in `/etc/cassandra/`. Several parameters require careful consideration.
Core Configuration Parameters
Parameter | Description | Default Value | Recommended Value (Example) |
---|---|---|---|
`cluster_name` | A unique name for your Cassandra cluster. | 'Test Cluster' | 'MediaWikiCluster' |
`listen_address` | The IP address Cassandra binds to for communication with the other nodes in the cluster. | localhost | Server's IP address (reachable by the other nodes) |
`rpc_address` | The IP address Cassandra listens on for client connections (CQL native protocol, and Thrift in older versions). | localhost | Server's IP address (reachable by clients) |
`seed_provider` | A list of seed nodes for the cluster. | `class_name: org.apache.cassandra.locator.SimpleSeedProvider\nparameters:\n - seeds: "127.0.0.1"` | `class_name: org.apache.cassandra.locator.SimpleSeedProvider\nparameters:\n - seeds: "node1_ip,node2_ip,node3_ip"` |
`data_file_directories` | Directories where Cassandra stores data. | `/var/lib/cassandra/data` | `/mnt/ssd1/cassandra_data` (if using a separate SSD) |
Adjust these parameters according to your environment. Always restart the Cassandra service after making changes to `cassandra.yaml`. Consult the Configuration Files article for a detailed explanation of all available settings.
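To show how the parameters in the table look inside `cassandra.yaml`, here is a small sketch that renders them as a YAML fragment. `render_overrides` is a hypothetical helper, the IP addresses in the usage example are placeholders, and the layout follows the structure of the stock configuration file; the output is meant to be reviewed and merged by hand, not written over an existing file.

```python
def render_overrides(cluster_name, listen_address, rpc_address, seeds, data_dirs):
    """Return a cassandra.yaml fragment for the core parameters."""
    seed_list = ",".join(seeds)  # seeds are a single comma-separated string
    lines = [
        f"cluster_name: '{cluster_name}'",
        f"listen_address: {listen_address}",
        f"rpc_address: {rpc_address}",
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{seed_list}"',
        "data_file_directories:",
    ]
    lines += [f"    - {path}" for path in data_dirs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; replace with the real node IPs.
    print(render_overrides(
        "MediaWikiCluster",
        "10.0.0.1",
        "10.0.0.1",
        ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
        ["/mnt/ssd1/cassandra_data"],
    ))
```

Every node in a cluster would use the same `cluster_name` and seed list, while `listen_address` and `rpc_address` differ per node.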
Keyspace Creation
Once Cassandra is running, you need to create a keyspace to store your MediaWiki data. Use `cqlsh` to connect to your Cassandra instance:
```bash
cqlsh <your_cassandra_ip>
```
Then, execute the following CQL command:
```cql
CREATE KEYSPACE mediawiki_keyspace
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
```
This creates a keyspace named `mediawiki_keyspace` with a replication factor of 3 in the `datacenter1` datacenter. Adjust the replication factor and datacenter name according to your cluster topology; the datacenter name must match the one reported by `nodetool status`. See Database Schemas for information on extending schemas.
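When choosing a replication factor, it helps to see how many replicas a quorum needs and how many replica failures it tolerates. The sketch below builds the statement for an arbitrary topology and does that arithmetic. Both function names are hypothetical, and `IF NOT EXISTS` is an addition not present in the statement above, included so that the command can be re-run safely.

```python
def create_keyspace_cql(keyspace: str, replication: dict) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    `replication` maps datacenter names to replication factors.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in replication.items()]
    body = ", ".join(options)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH REPLICATION = {{ {body} }};"
    )


def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    rf = 3
    print(create_keyspace_cql("mediawiki_keyspace", {"datacenter1": rf}))
    print("quorum:", quorum(rf), "tolerated replica failures:", rf - quorum(rf))
```

With a replication factor of 3, a quorum is 2 replicas, so quorum operations keep working while one replica is unavailable.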
Monitoring and Maintenance
Regular monitoring is crucial for maintaining a healthy Cassandra cluster. Tools like `nodetool` provide valuable insights into node status, data distribution, and performance metrics.
Command | Description |
---|---|
`nodetool status` | Displays the status of each node in the cluster. |
`nodetool compactionstats` | Shows the status of ongoing compaction processes. |
`nodetool tablestats` | Provides statistics for tables (replaces the deprecated `nodetool cfstats`). |
`nodetool repair` | Repairs data inconsistencies across replicas. |
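Run `nodetool repair` on each node at least once within every `gc_grace_seconds` window (10 days by default) to keep replicas consistent and prevent deleted data from reappearing. Consider implementing a monitoring solution such as Prometheus with Grafana dashboards for long-term performance tracking. Refer to Server Monitoring Best Practices for more details.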
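The output of `nodetool status` can also be checked automatically. The sketch below scans the text for nodes whose two-letter code is anything other than `UN` (up and normal). `unhealthy_nodes` is a hypothetical helper, and `SAMPLE` is an illustrative output with placeholder addresses and host IDs, not a capture from a real cluster; column layout may differ slightly between Cassandra versions.

```python
import re

# First letter: U (up) or D (down).
# Second letter: N, L, J or M (normal, leaving, joining, moving).
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def unhealthy_nodes(status_output: str) -> list:
    """Return (address, code) pairs for nodes not reported as UN."""
    problems = []
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match is None:
            continue  # header, separator or blank line
        code = match.group(1) + match.group(2)
        if code != "UN":
            problems.append((match.group(3), code))
    return problems


# Illustrative sample only; addresses and host IDs are placeholders.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.1   103.5 KiB  256     33.3%             aaaa     rack1
DN  10.0.0.2   98.1 KiB   256     33.3%             bbbb     rack1
UN  10.0.0.3   101.2 KiB  256     33.4%             cccc     rack1
"""

if __name__ == "__main__":
    print(unhealthy_nodes(SAMPLE))
```

In practice the input would come from running `nodetool status` on a node, for instance through `subprocess.run`, and a non-empty result could trigger an alert.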
Integration with MediaWiki
Integrating Cassandra with MediaWiki requires specific extensions and configuration changes within MediaWiki itself. This is beyond the scope of this introductory article but generally involves configuring MediaWiki to use a Cassandra-based data storage backend. See Extending MediaWiki for information on available extensions.