# Data Ingestion: Server Configuration

This article details the server configuration required for efficient data ingestion into our MediaWiki 1.40 environment. Proper configuration is crucial for maintaining performance and data integrity. This guide is intended for newcomers to the server administration team. It covers hardware specifications, software prerequisites, and key configuration parameters.

## Understanding the Data Ingestion Pipeline

Our data ingestion pipeline handles various data sources, including database dumps, API feeds, and direct file uploads. The process broadly consists of three stages: receiving the data, transforming it into a suitable format for MediaWiki, and loading it into the database. Each stage relies on specific server resources and software components. Special:MyLanguage/Help:Contents provides general guidance on MediaWiki operation.
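The three stages can be sketched as a chain of small functions. This is only a minimal illustration, not the production pipeline: the JSON-lines input format, the `name` and `body` field names, and the in-memory dictionary standing in for the database are all assumptions made for the example.

```python
import json
from typing import Iterable, Iterator


def receive(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: receive raw input (assumed here to be JSON lines) and parse it."""
    for line in raw_lines:
        line = line.strip()
        if line:  # skip blank lines in the feed
            yield json.loads(line)


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: turn each record into a page title plus wikitext body."""
    for rec in records:
        name = rec["name"].strip()  # hypothetical field name
        yield {
            "title": name.replace(" ", "_"),
            "text": f"== {name} ==\n{rec['body']}",  # hypothetical field name
        }


def load(pages: Iterable[dict], store: dict) -> int:
    """Stage 3: load pages into a store (a dict standing in for the database)."""
    count = 0
    for page in pages:
        store[page["title"]] = page["text"]
        count += 1
    return count


sample = [
    '{"name": "Rack A1", "body": "Primary ingest node."}',
    "",
    '{"name": "Rack B2", "body": "Standby node."}',
]
store: dict = {}
loaded = load(transform(receive(sample)), store)
```

Because each stage is a generator, records stream through one at a time, so a large dump does not have to fit in memory before the load stage starts.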

## Hardware Specifications

The data ingestion server requires robust hardware to handle large datasets efficiently. The following table outlines the recommended specifications:

| Component | Specification |
| --- | --- |
| CPU | Intel Xeon Gold 6248R (24 cores) or equivalent AMD EPYC processor |
| RAM | 128 GB DDR4 ECC Registered RAM |
| Storage (OS) | 500 GB NVMe SSD |
| Storage (Data) | 4 TB RAID 10 SSD array |
| Network Interface | 10 Gigabit Ethernet |
| Power Supply | Redundant 800 W power supplies |

These specifications are a baseline and may need adjustment based on the volume and velocity of ingested data. See Special:MyLanguage/Manual:Configuration settings for more details on server requirements.
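To judge whether the baseline fits a given data volume, a rough transfer-time estimate can help. The sketch below assumes the network link is the only bottleneck and that about 80% of the nominal 10 Gbit/s is usable. Both the `efficiency` value and the function name are illustrative assumptions, not measured figures.

```python
def transfer_seconds(dataset_gb: float,
                     link_gbit_per_s: float = 10.0,
                     efficiency: float = 0.8) -> float:
    """Estimate the seconds needed to move dataset_gb gigabytes over the link.

    efficiency is an assumed fraction of nominal bandwidth that is usable
    after protocol overhead; measure the real value on the actual network.
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    gigabits = dataset_gb * 8  # 1 byte = 8 bits
    return gigabits / (link_gbit_per_s * efficiency)


# A 500 GB dump at an assumed 80% of 10 Gbit/s
estimate = transfer_seconds(500)
```

If the estimate exceeds the ingestion window, the network interface or the ingestion schedule is the first thing to revisit. Disk and database write speed may impose a lower limit than this estimate suggests.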

## Software Prerequisites

Several software packages are essential for the data ingestion process. These include:
