Cassandra

From Server rental store
Cassandra Server Configuration

This article details the configuration of a Cassandra server for use with a MediaWiki installation. Cassandra is a highly scalable, distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. This guide is targeted towards system administrators new to Cassandra.

Introduction to Cassandra

Cassandra is well-suited for MediaWiki environments experiencing high traffic and needing to store large volumes of revision history, logs, and other data. Unlike traditional relational databases, Cassandra excels at write operations and can scale horizontally to accommodate growing data needs. It uses a decentralized architecture, meaning data is replicated across multiple nodes, ensuring durability and fault tolerance. Understanding the core concepts of Cassandra – nodes, keyspaces, column families (now called tables), and replication – is crucial for effective configuration. See Data Storage for more details on MediaWiki's data storage options.

Hardware Requirements

The hardware requirements for a Cassandra cluster depend heavily on the expected workload. However, the following provides a general guideline.

| Component | Minimum Specification | Recommended Specification |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 4 GB | 8+ GB |
| Disk | 100 GB SSD | 500 GB+ SSD (RAID recommended) |
| Network | 1 Gbps | 10 Gbps |

It's important to note that disk I/O is a critical performance factor for Cassandra. Solid State Drives (SSDs) are *highly* recommended. Consider using RAID configurations for redundancy and improved performance. Refer to Server Hardware Considerations for more information on hardware choices.

Software Requirements

  • Operating System: Linux (CentOS, Ubuntu, Debian recommended)
  • Java Development Kit (JDK): OpenJDK 8 is required for Cassandra 3.11; Cassandra 4.x also runs on OpenJDK 11. Ensure the `JAVA_HOME` environment variable is correctly set. See System Requirements for MediaWiki.
  • Cassandra: Version 3.11 or later is recommended for compatibility.
  • Python: Required by `cqlsh` and some other Cassandra management tools.
  • `cqlsh`: Cassandra Query Language Shell – used for interacting with the database.
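Before installing, it can help to confirm which Java release is active on the node. The following is a minimal sketch, not an official tool: the function names `java_major_version` and `check_java` are hypothetical, and the set of supported versions is passed in as an assumption that should match the Cassandra release being deployed.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_292").
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_java(version_output: str, supported=(8, 11)) -> bool:
    """Return True if the detected major version is in the supported set (an assumption)."""
    return java_major_version(version_output) in supported
```

Note that `java -version` writes to standard error, so the text would typically be captured with something like `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr` before being passed to `check_java`.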

Installation and Configuration

The installation process varies depending on your Linux distribution. Here's a general outline using `apt` (Debian/Ubuntu):

1. Add the Cassandra repository to your system's package manager.
2. Install Cassandra using `apt-get install cassandra`.
3. Start the Cassandra service using `systemctl start cassandra`.
4. Verify the service status using `systemctl status cassandra`.

The primary Cassandra configuration file is `cassandra.yaml`, located in `/etc/cassandra/`. Several parameters require careful consideration.

Core Configuration Parameters

| Parameter | Description | Default Value | Recommended Value (Example) |
|---|---|---|---|
| `cluster_name` | A unique name for your Cassandra cluster. All nodes in the cluster must use the same name. | 'Test Cluster' | 'MediaWikiCluster' |
| `listen_address` | The IP address other Cassandra nodes use to connect to this node (inter-node communication). | localhost | Server's Public IP Address |
| `rpc_address` | The IP address Cassandra listens on for client connections (CQL native transport and, in older versions, Thrift). | localhost | Server's Public IP Address |
| `seed_provider` | A list of seed nodes for the cluster. | `class_name: org.apache.cassandra.locator.SimpleSeedProvider` with `seeds: "127.0.0.1"` | `class_name: org.apache.cassandra.locator.SimpleSeedProvider` with `seeds: "node1_ip,node2_ip,node3_ip"` |
| `data_file_directories` | Directories where Cassandra stores data. | `/var/lib/cassandra/data` | `/mnt/ssd1/cassandra_data` (if using a separate SSD) |

Adjust these parameters according to your environment. Always restart the Cassandra service after making changes to `cassandra.yaml`. Consult the Configuration Files article for a detailed explanation of all available settings.
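To show how these settings nest inside `cassandra.yaml`, here is a small sketch that renders the core parameters as YAML text for one node. The helper name `render_overrides` and the example IP addresses are hypothetical; the output is meant to be compared against the existing file by hand, not written over it.

```python
def render_overrides(cluster_name, node_ip, seeds, data_dir):
    """Return a cassandra.yaml fragment (as text) with the core settings for one node."""
    seed_list = ",".join(seeds)  # seeds are given as one comma-separated string
    lines = [
        f"cluster_name: '{cluster_name}'",
        f"listen_address: {node_ip}",
        f"rpc_address: {node_ip}",
        "seed_provider:",
        "  - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "    parameters:",
        f'      - seeds: "{seed_list}"',
        "data_file_directories:",
        f"  - {data_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; replace with the addresses of your own nodes.
    print(render_overrides(
        "MediaWikiCluster",
        "10.0.0.1",
        ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
        "/mnt/ssd1/cassandra_data",
    ))
```

Every node in the cluster would use the same `cluster_name` and seed list, while `listen_address` and `rpc_address` differ per node.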

Keyspace Creation

Once Cassandra is running, you need to create a keyspace to store your MediaWiki data. Use `cqlsh` to connect to your Cassandra instance:

```bash
cqlsh <your_cassandra_ip>
```

Then, execute the following CQL command:

```cql
CREATE KEYSPACE mediawiki_keyspace
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
```

This creates a keyspace named `mediawiki_keyspace` with a replication factor of 3 in the `datacenter1` datacenter. Adjust the replication factor and datacenter name according to your cluster topology. See Database Schemas for information on extending schemas.
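When adjusting the replication factor, it is useful to know how many replicas must respond at the `QUORUM` consistency level, which is a majority of the replicas: floor(RF / 2) + 1. The sketch below computes that value and builds the `CREATE KEYSPACE` statement for a given topology. The function names are hypothetical, and the `IF NOT EXISTS` clause is an addition not present in the statement above.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1


def create_keyspace_cql(keyspace: str, dc_replicas: dict) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    dc_replicas maps datacenter names to replication factors,
    for example {"datacenter1": 3}.
    """
    if not dc_replicas:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(f"'{dc}' : {rf}" for dc, rf in dc_replicas.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH REPLICATION = "
        f"{{ 'class' : 'NetworkTopologyStrategy', {options} }};"
    )


if __name__ == "__main__":
    rf = 3
    print(create_keyspace_cql("mediawiki_keyspace", {"datacenter1": rf}))
    # With RF = 3, QUORUM needs 2 replicas, so one replica can be down.
    print("quorum:", quorum(rf), "tolerated replica failures:", rf - quorum(rf))
```

With a replication factor of 3, `QUORUM` operations need 2 replicas, so the keyspace stays available for quorum reads and writes while one replica is down.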

Monitoring and Maintenance

Regular monitoring is crucial for maintaining a healthy Cassandra cluster. Tools like `nodetool` provide valuable insights into node status, data distribution, and performance metrics.

| Command | Description |
|---|---|
| `nodetool status` | Displays the status of each node in the cluster. |
| `nodetool compactionstats` | Shows the status of ongoing compaction processes. |
| `nodetool cfstats` | Provides statistics for column families (tables). Newer releases name this command `nodetool tablestats` and keep `cfstats` as a deprecated alias. |
| `nodetool repair` | Repairs data inconsistencies across replicas. |

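Run `nodetool repair` on every node at least once within each table's `gc_grace_seconds` window (10 days by default) to keep replicas consistent and prevent deleted data from reappearing. Consider implementing a monitoring solution such as Prometheus with Grafana for long-term performance tracking. Refer to Server Monitoring Best Practices for more details.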
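As one way to automate a basic health check, the sketch below scans the text printed by `nodetool status` and reports every node that is not in the `UN` (Up/Normal) state. The function name `unhealthy_nodes` is hypothetical, and the sample output is illustrative: the column layout follows the usual `nodetool status` format, but the addresses, loads, and host IDs are made up.

```python
import re

# A node line starts with a two-letter code: Up/Down, then Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def unhealthy_nodes(status_output: str) -> list:
    """Return (state, address) pairs for nodes that are not Up/Normal."""
    problems = []
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match and match.group(1) != "UN":
            problems.append((match.group(1), match.group(2)))
    return problems


# Illustrative sample only; real output comes from running `nodetool status`.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.1   1.21 GiB    256     100.0%            aaaa     rack1
DN  10.0.0.2   1.19 GiB    256     100.0%            bbbb     rack1
UJ  10.0.0.3   512.3 MiB   256     ?                 cccc     rack1
"""

if __name__ == "__main__":
    for state, address in unhealthy_nodes(SAMPLE):
        print(f"{address} is in state {state}")
```

A script like this could be run from cron or a monitoring agent, with the real command output captured through `subprocess` in place of `SAMPLE`.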

Integration with MediaWiki

Integrating Cassandra with MediaWiki requires specific extensions and configuration changes within MediaWiki itself. This is beyond the scope of this introductory article but generally involves configuring MediaWiki to use a Cassandra-based data storage backend. See Extending MediaWiki for information on available extensions.
