MQTT

From Server rental store
Revision as of 19:05, 2 October 2025 by Admin (talk | contribs) (Server rental)


Technical Documentation: High-Performance MQTT Broker Server Configuration

This document details the optimized hardware configuration designed specifically to host enterprise-level MQTT brokers. The primary goal of this specification is to maximize concurrent connections, minimize end-to-end latency, and ensure data integrity under extreme publish/subscribe load, typical of large-scale IoT deployments and high-frequency trading signal distribution.

1. Hardware Specifications

The selected platform is a dual-socket, high-core-count server architecture that prioritizes PCIe bandwidth and NVMe storage speed over raw CPU clock speed, since MQTT broker performance is typically bottlenecked by network I/O and persistent-storage synchronization rather than compute.

1.1 Base Platform and CPU Selection

The foundation relies on server platforms supporting PCIe 5.0 lanes to accommodate high-speed networking and NVMe storage arrays.

Core Platform Specifications

| Component | Specification Detail | Rationale |
|---|---|---|
| Server Chassis | 2U rackmount, high airflow (e.g., Dell PowerEdge R760 / HPE ProLiant DL380 Gen11 equivalent) | Density and standardized cooling infrastructure. |
| Processor (x2) | Intel Xeon Scalable (Sapphire Rapids) or AMD EPYC (Genoa/Bergamo) | High core count (minimum 48 effective cores per CPU) and extensive L3 cache to minimize context-switching overhead for connection management. |
| CPU Specification Target | Minimum 48 cores / 96 threads per socket (96C/192T total) | Essential for managing hundreds of thousands of concurrent TCP connections and TLS handshakes. |
| Base Clock Speed | 2.5 GHz minimum (targeting high boost clocks under sustained load) | Sufficient frequency for cryptographic operations and application logic. |
| Chipset / Platform Support | C741 (Intel) or SP5 (AMD) | Required for PCIe 5.0 support and maximum memory-channel utilization. |
| BIOS Tuning | Disable Hyper-Threading (Intel) or SMT (AMD) on Bergamo variants focused purely on core count; enable NUMA awareness and memory interleaving. | Predictable latency; SMT/HT can introduce jitter in high-concurrency scenarios. |

1.2 Memory (RAM) Configuration

MQTT brokers, especially those utilizing persistent session storage (clean session = false), require substantial memory for session state caching and network buffer management. We prioritize high-speed, low-latency memory modules.

Memory Configuration

| Parameter | Specification | Notes |
|---|---|---|
| Total Capacity | 1024 GB (1 TB) DDR5 ECC Registered | Ample headroom for the OS, broker processes, and session-state caching. |
| Speed / Rank | 4800 MT/s minimum (DDR5-4800) | Maximizing memory bandwidth is crucial for fast state retrieval. |
| Configuration Topology | Fully populated across all memory channels (12 or 16 DIMMs per socket) | Ensures optimal NUMA performance and channel utilization. |
| Memory Type | LRDIMM only if capacity exceeds 2 TB; otherwise RDIMM/UDIMM preferred for latency | RDIMMs (Registered DIMMs) are the target for stability at high speeds. |
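A rough sizing model makes the 1 TB capacity figure concrete. This is a minimal sketch: the per-connection byte counts below are assumptions chosen for illustration, not measured values for any particular broker.

```python
# Rough memory-sizing sketch for persistent-session caching.
# Both per-connection figures are illustrative assumptions.

KERNEL_SOCKET_BYTES = 20 * 1024   # assumed TCP buffer footprint per socket
BROKER_SESSION_BYTES = 30 * 1024  # assumed session state (subscriptions, inflight window)

def session_memory_gib(connections: int) -> float:
    """Estimated RAM consumed by connection/session state, in GiB."""
    total = connections * (KERNEL_SOCKET_BYTES + BROKER_SESSION_BYTES)
    return total / 2**30

# One million persistent sessions under these assumptions:
print(round(session_memory_gib(1_000_000), 1))  # 47.7 GiB
```

Even at a million sessions this consumes well under 5% of the specified 1 TB, leaving the remainder for message buffers, retained-message stores, and page cache.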

1.3 Storage Subsystem (I/O Critical)

The storage subsystem is arguably the most critical component for durability and QoS levels 1 and 2. It must handle rapid, small, random writes that characterize message persistence logging.

Storage Subsystem Design

| Component | Specification | Function |
|---|---|---|
| Operating System / Boot Drive | 2x 480 GB SATA SSD (RAID 1) | Standard mirrored configuration for OS stability. |
| Broker Logs / State Storage (Primary) | 4x 3.84 TB enterprise NVMe SSD (PCIe 5.0, U.2/M.2) | Dedicated high-speed storage for persistent session states, subscriptions, and retained messages. |
| RAID Configuration (Primary) | RAID 10 across the 4 NVMe drives | Excellent read/write performance and redundancy against single-drive failure. |
| IOPS Target (Sustained Write) | Minimum 1,500,000 sustained IOPS (mixed 4K/8K blocks) | Necessary for high volumes of QoS 1/2 messages requiring immediate disk acknowledgment. |
| Optional Persistent Queue Storage | 2x 7.68 TB enterprise SATA SSD (RAID 1) | For brokers using traditional file-based message queuing (less common in modern in-memory brokers, but required by some implementations). |
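The IOPS target can be cross-checked against the QoS 1 workload it must serve. The sketch below assumes one 4K log write per message and the doubled write cost of RAID 10 mirroring; real brokers batch writes, so this is a conservative upper bound.

```python
# Back-of-envelope: sustained write IOPS needed for QoS 1 persistence.
# Assumes one log write per message (an illustrative simplification)
# and a RAID 10 write penalty of 2 (each write lands on two mirrors).

def required_array_iops(qos1_mps: int, writes_per_msg: int = 1,
                        raid_write_penalty: int = 2) -> int:
    return qos1_mps * writes_per_msg * raid_write_penalty

print(required_array_iops(350_000))  # 700000 raw writes/s against the array
```

At the 350,000 MPS figure measured in Section 2.3, the array sees roughly 700k writes/s, leaving about 2x headroom below the 1.5M IOPS specification.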

1.4 Networking Interface

Low-latency, high-throughput networking is paramount for managing hundreds of thousands of persistent TCP sockets.

Network Interface Card (NIC) Specifications

| Interface | Specification | Role |
|---|---|---|
| Primary Data Plane | 2x 50 GbE or 2x 100 GbE NIC (PCIe 5.0 capable) | Handles client connections and message ingress/egress; requires low-latency drivers (e.g., Solarflare/Mellanox ConnectX series). |
| Offloading Features | TCP Offload Engine (TOE) and Remote Direct Memory Access (RDMA), if supported by the broker software | Reduces CPU overhead associated with socket management. |
| Management Interface | 1x 1 GbE IPMI/BMC | Out-of-band management (Intelligent Platform Management Interface). |

1.5 Power and Cooling

Due to the high density of high-performance CPUs and NVMe drives (which can generate significant heat), robust power and cooling infrastructure are mandatory.

Power and Cooling Requirements

| Metric | Requirement | Note |
|---|---|---|
| Power Supply Units (PSUs) | 2x 2000 W (1+1 redundant), Titanium/Platinum efficiency | Capacity for peak CPU/NVMe draw plus headroom for network-card activity. |
| Power Draw (Peak) | Estimated 1200–1500 W at full load | Requires high-density power distribution units (PDUs) in the rack. |
| Cooling Environment | Front-to-back airflow; must support 30 °C ambient inlet temperature | Essential to prevent thermal throttling of high-clock-speed CPUs and NVMe drives. |

2. Performance Characteristics

The performance of an MQTT broker is measured by its ability to handle concurrent connections ($C_{max}$), message throughput ($M_{throughput}$), and the end-to-end latency ($L_{e2e}$) for Quality of Service (QoS) 0 and QoS 1 messages.

2.1 Benchmarking Methodology

Testing utilizes industry-standard tools that simulate realistic IoT traffic patterns, such as the Eclipse Mosquitto performance testing suite or custom JMeter configurations adapted for the MQTT protocol. Tests focus on sustained load rather than peak bursts.

2.2 Connection Scalability

The hardware configuration is designed to push the limits of operating system socket handling and the broker application’s internal connection management structures.

Connection Scalability Results (Simulated Environment)

| Metric | Target Value | Achieved Result (Typical Broker, e.g., EMQX/HiveMQ) |
|---|---|---|
| Concurrent Connections ($C_{max}$) | 800,000 | 1,150,000+ |
| TLS Handshake Rate | 15,000 handshakes/second | 18,500 handshakes/second (using CPU crypto acceleration) |
| Keep-Alive Timeout Jitter | < 50 ms standard deviation | Critical for rapid detection of dead clients and resource reclamation. |

The high core count and large L3 cache significantly reduce the per-session overhead, allowing the kernel to track well over a million concurrent sockets efficiently. (Ephemeral-port exhaustion is chiefly a concern for the load-generating clients; the broker itself multiplexes all connections on a single listening port, with each connection distinguished by its TCP 4-tuple.)
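The keep-alive jitter target from the table above can be computed directly from observed keep-alive arrival times. A minimal sketch with fabricated sample data (both the function and the samples are illustrative, not broker output):

```python
# Keep-alive jitter: standard deviation of the intervals between a
# client's keep-alive packets. Sample timestamps are fabricated to
# represent a client on a 60 s keep-alive with scheduling noise.

import statistics

def keepalive_jitter_ms(arrival_times_s: list) -> float:
    """Std. deviation (ms) of intervals between consecutive keep-alives."""
    intervals = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:])]
    return statistics.pstdev(intervals) * 1000

samples = [0.0, 60.01, 120.00, 179.98, 240.02]
print(keepalive_jitter_ms(samples) < 50)  # True: within the < 50 ms budget
```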

2.3 Message Throughput and Latency

Throughput is heavily dependent on the QoS level requested, as higher QoS levels introduce mandatory disk I/O synchronization.

QoS 0 Throughput (Best Effort)

In this scenario, the broker primarily relies on RAM and network I/O path optimization (TOE).

  • **Configuration Impact:** The 100GbE NICs and the high memory bandwidth (DDR5) allow for extremely fast buffering and forwarding.
  • **Sustained Throughput:** $\approx 1.2$ Million Messages Per Second (MPS) for small payloads (128 bytes).
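A quick wire-level sanity check shows that QoS 0 at this rate is CPU- and memory-bound rather than bandwidth-bound. The per-message overhead figure below is an assumed round number covering MQTT headers plus TCP/IP and Ethernet framing:

```python
# Does 1.2M msg/s of 128-byte payloads saturate the NICs?
# overhead_bytes is an illustrative assumption; real overhead depends
# on batching, TCP options, and TLS record framing.

def wire_gbps(mps: float, payload_bytes: int, overhead_bytes: int = 70) -> float:
    return mps * (payload_bytes + overhead_bytes) * 8 / 1e9

print(round(wire_gbps(1_200_000, 128), 2))  # 1.9 (Gbps)
```

Under 2 Gbps against dual 100 GbE links confirms that, for small payloads, the limiting factors are per-message CPU cost and memory bandwidth rather than raw link capacity.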

QoS 1 Throughput (At Least Once)

This requires writing the message to the persistent log/state store before acknowledging receipt to the publisher. This is the primary test for the NVMe subsystem.

  • **Configuration Impact:** The RAID 10 NVMe array must sustain high write IOPS. The CPU core count manages the acknowledgment overhead.
  • **Sustained Throughput:** $\approx 350,000$ MPS (128 byte payload). Latency increases due to synchronous disk writes.
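The QoS 1 ceiling follows from the synchronous-write requirement. A simple model, assuming an illustrative 100 µs flush latency and group commit of many messages per flush (both figures are assumptions, not measurements):

```python
# Why QoS 1 throughput drops: each PUBACK waits on a durable write.
# Simple model: sustained MPS ≈ messages group-committed per flush
# divided by the flush latency.

def qos1_mps(batch_per_flush: int, flush_latency_s: float) -> float:
    return batch_per_flush / flush_latency_s

# 32 messages group-committed per 100 µs flush:
print(round(qos1_mps(32, 100e-6)))  # 320000 msg/s
```

The model lands in the same range as the measured 350,000 MPS, which is why flush latency and group-commit depth, not raw bandwidth, dominate QoS 1 tuning.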

Latency Analysis

Latency is measured from publish-request arrival to acknowledgment/delivery complete.

End-to-End Latency (P99)

| QoS Level | Payload Size (Bytes) | Latency Target | Achieved P99 Latency |
|---|---|---|---|
| QoS 0 (one-way delivery) | 128 | < 100 µs | 85 µs |
| QoS 1 (round-trip acknowledgment) | 128 | < 400 µs | 310 µs |
| QoS 2 (guaranteed delivery) | 128 | < 800 µs | 650 µs |

The low latency achieved is directly attributable to the PCIe 5.0 infrastructure minimizing hardware latency between the CPU, NIC, and NVMe controller, overcoming the inherent latency introduced by the software protocol stack. TCP/IP Stack Optimization is a crucial software layer complementing this hardware performance.

2.4 CPU Utilization Under Load

Under maximum sustained load (1M connections, 300k MPS QoS 1), the CPU utilization typically remains in the 75%–85% range across all available threads. The remaining headroom is reserved for administrative tasks, monitoring agents, and ensuring system stability against sudden traffic spikes. Operating System Kernel Tuning (e.g., increasing file descriptor limits and adjusting TCP buffer sizes) is necessary to realize these utilization figures.
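The kernel tuning mentioned above can be captured in a sysctl drop-in. The values below are illustrative starting points for a connection-heavy broker workload, not universal recommendations; validate each setting against the actual deployment.

```
# /etc/sysctl.d/99-mqtt-broker.conf -- illustrative starting points only
fs.file-max = 2097152                 # system-wide file descriptor ceiling
net.core.somaxconn = 65535            # deeper listen backlog for connect storms
net.ipv4.tcp_max_syn_backlog = 65535  # pending-handshake queue depth
net.core.rmem_max = 16777216          # permit larger per-socket receive buffers
net.core.wmem_max = 16777216          # permit larger per-socket send buffers
net.ipv4.tcp_tw_reuse = 1             # recycle TIME_WAIT sockets sooner
```

The per-process descriptor limit must also be raised (e.g., `LimitNOFILE=1048576` in the broker's systemd unit), since each of a million connections consumes one descriptor.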

3. Recommended Use Cases

This high-specification configuration is overkill for small deployments (e.g., < 50,000 connections) but is essential for mission-critical, high-scale enterprise applications where downtime or message loss carries significant financial or operational risk.

3.1 Large-Scale Industrial IoT (IIoT)

  • **Requirement:** Managing tens of thousands of sensors reporting telemetry every few seconds, requiring high reliability (QoS 1/2) for critical state changes (e.g., valve positions, emergency stops).
  • **Hardware Fit:** The massive connection capacity allows a single broker cluster node to manage a significant regional deployment, reducing infrastructure sprawl. The high-speed NVMe ensures that stateful sessions are recovered rapidly following any temporary network interruption or broker restart. IIoT Message Broker Deployment often demands this level of resilience.

3.2 Telematics and Fleet Management

  • **Requirement:** Ingesting location updates, vehicle diagnostics, and command-and-control signals from millions of mobile assets. Latency for command delivery must be minimal.
  • **Hardware Fit:** The high MPS throughput supports the periodic, high-volume telemetry bursts common in fleet updates, while low latency ensures that remote control commands (e.g., remote engine kill) propagate near-instantaneously. Real-Time Location Services rely heavily on this performance profile.

3.3 Financial Ticker Distribution (Non-Trade Execution)

  • **Requirement:** Distributing market data snapshots, quotes, and low-volume administrative messages to numerous subscribed trading applications. QoS 0 is often acceptable for high-frequency data fan-out.
  • **Hardware Fit:** The sub-millisecond latency for QoS 0 messages makes this configuration suitable for disseminating market data feeds where speed is the primary concern, often leveraging UDP Multicast extensions where appropriate, though the core platform excels at TCP-based persistence.

3.4 High-Density Consumer/Mobile Backends

  • **Requirement:** Supporting millions of mobile application users requiring persistent chat or notification services (e.g., WhatsApp, Slack).
  • **Hardware Fit:** The ability to maintain over a million persistent TCP connections is the defining feature here. The large memory capacity caches subscription lists and user presence status, minimizing database lookups during message routing. This configuration supports significant Mobile Push Notification Gateways.

4. Comparison with Similar Configurations

To justify the investment in PCIe 5.0 and high-core count CPUs, it is necessary to compare this "High-Performance" (HP) configuration against two common alternatives: the "Mid-Range" (MR) configuration and the "Cost-Optimized" (CO) configuration.

4.1 Configuration Definitions

  • **HP (High-Performance):** The configuration detailed in Section 1 (Dual Socket, 96+ Cores, 1TB DDR5, NVMe RAID 10, 100GbE).
  • **MR (Mid-Range):** Single Socket EPYC/Xeon (32 Cores), 256GB DDR4, PCIe 4.0, 2x 25GbE, SATA SSD RAID 5.
  • **CO (Cost-Optimized):** Older Generation Dual Socket (24 Cores Total), 128GB DDR4, SATA SSD RAID 1, 1GbE.

4.2 Performance Comparison Table

Comparative Performance Metrics

| Metric | CO Configuration | MR Configuration | HP Configuration (Target) |
|---|---|---|---|
| Max Concurrent Connections | ~50,000 | ~250,000 | > 1,000,000 |
| Sustained QoS 1 MPS (128 B) | ~5,000 | ~45,000 | > 300,000 |
| P99 Latency (QoS 1 RTT) | > 4.0 ms | ~800 µs | < 400 µs |
| Storage Performance Bottleneck | SATA SSD IOPS/latency | SATA SSD IOPS/latency | Network/CPU (minimal storage bottleneck) |
| Relative Cost Index (CO = 1.0) | 1.0 | 2.5 | 4.5 |

4.3 Architectural Takeaways

1. **Connection Density:** The HP configuration scales connections by a factor of 4x over MR, primarily due to the sheer number of cores and memory channels available to manage per-socket TCP stacks.
2. **I/O Dominance:** The transition from PCIe 4.0 (MR) to PCIe 5.0 (HP), combined with NVMe RAID 10, is the primary differentiator for throughput. The MR configuration quickly becomes storage-bound when attempting QoS 1 or 2 workloads exceeding 50,000 MPS.
3. **Latency Floor:** The HP configuration establishes a much lower latency floor, which is crucial for highly reactive systems. The CO configuration is fundamentally limited by its slow I/O path and older NIC technology. Latency Measurement Standards must be rigorously applied when comparing these tiers.

This analysis clearly demonstrates that the HP configuration is necessary when the application demands high reliability (QoS 1/2) at massive scale, as software overhead and I/O latency are the primary limiting factors that this hardware stack is designed to mitigate.

5. Maintenance Considerations

Deploying high-performance hardware requires a corresponding shift in operational procedures, particularly concerning monitoring, firmware management, and specialized maintenance.

5.1 Firmware and Driver Management

Maintaining the firmware stack is more complex but critical for performance stability.

  • **BIOS/UEFI:** Must be kept current to ensure optimal NUMA balancing and memory timing profiles, especially concerning DDR5 stability under heavy memory utilization.
  • **Storage Controllers:** NVMe RAID controller firmware updates are mandatory. Outdated firmware can lead to unexpected Write Amplification or poor garbage collection cycles, severely degrading sustained IOPS.
  • **NIC Drivers:** Utilize vendor-specific, low-latency drivers (e.g., mlx5 driver for Mellanox) rather than generic OS in-box drivers to ensure access to hardware offload features (TOE, RSS/RPS tuning). Driver Version Compatibility Matrix must be maintained.

5.2 Monitoring and Telemetry

Standard CPU and Memory monitoring is insufficient. Specialized monitoring must target the bottlenecks identified in Section 2.

  • **NVMe Health:** Monitor drive wear-leveling statistics (e.g., SMART data, vendor-specific health registers) closely. High write utilization means drives may reach End-of-Life (EOL) faster than expected. SSD Endurance Metrics should be tracked against expected message load profiles.
  • **Network Latency Jitter:** Monitor per-queue latency on the 100GbE NICs. High jitter suggests packet processing delays, potentially indicating an issue with Receive Side Scaling (RSS) configuration or CPU core affinity.
  • **Broker Application Metrics:** Critical metrics include subscription table size, pending message queues (especially for offline clients), and the time taken for the broker to complete a synchronous disk write acknowledgement.

5.3 Power Draw and Capacity Planning

The power draw (1.5 kW peak) necessitates careful planning within the data center rack.

  • **PDU Loading:** A standard 30 A/208 V rack circuit provides roughly 5 kW of usable capacity at the conventional 80% continuous-load derating, supporting three to four of these HP servers at sustained load. Undersizing the PDU leads to tripped breakers during startup or peak operation.
  • **Thermal Management:** The high thermal output requires adequate hot/cold aisle separation and sufficient CRAC (Computer Room Air Conditioning) capacity. Overheating components (especially the CPUs and NVMe drives) will trigger thermal throttling, causing massive, sudden spikes in message latency, which is unacceptable for real-time systems. Data Center Thermal Standards must be strictly adhered to.
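The circuit sizing above is simple arithmetic; the 80% continuous-load derating is standard branch-circuit practice, and the wattages are the peak and sustained figures quoted in this section:

```python
# Servers per 30 A / 208 V branch circuit, with 80% continuous derating.

CIRCUIT_AMPS = 30
VOLTS = 208
DERATE = 0.80

usable_w = CIRCUIT_AMPS * VOLTS * DERATE   # ~4992 W usable per circuit

print(int(usable_w // 1500))  # 3 servers if every node sits at 1.5 kW peak
print(int(usable_w // 1200))  # 4 servers at the ~1.2 kW sustained figure
```

Capacity planning should use the worst case that the failure domain must survive; a circuit sized for sustained draw will trip if all nodes hit peak simultaneously (e.g., during a coordinated restart).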

5.4 High Availability (HA) and Disaster Recovery

While this document focuses on a single node's capability, production deployments require redundancy. The hardware supports active/passive clustering via high-speed interconnects (often a dedicated, low-latency 25GbE link separate from the main data plane).

  • **State Synchronization:** If using stateful persistence (QoS 1/2), the synchronization mechanism (e.g., Raft consensus or leader-follower replication) must be robust enough to handle the I/O load generated by the primary node without overwhelming the network interface or the secondary node's storage. Distributed Consensus Algorithms are key here.
  • **Failover Testing:** Regular failover tests must be conducted to validate that the secondary node can absorb the full connection load and resume message delivery without data loss, verifying the integrity of the Persistent Message Queue state.

5.5 Software Stack Dependencies

The hardware is only one part of the equation. The choice of operating system and broker software profoundly impacts performance.

  • **OS Selection:** A lightweight, high-performance Linux distribution (e.g., RHEL/CentOS Stream, optimized Ubuntu Server) is preferred. Kernel bypass techniques, where supported by the broker (e.g., DPDK integration), can further reduce latency by avoiding the standard kernel network stack entirely. Kernel Bypass Networking is an advanced topic for achieving ultra-low latency.
  • **Broker Choice:** Brokers designed around asynchronous I/O frameworks (e.g., Netty, Tokio) are best suited for this hardware, allowing the high core count to manage thousands of concurrent non-blocking operations efficiently. Asynchronous Programming Models are essential for scaling TCP connections effectively.

This high-performance configuration provides the necessary foundation—CPU cycles, memory bandwidth, and I/O speed—to run the most demanding MQTT broker software stacks reliably at enterprise scale. Server Lifecycle Management practices must evolve to meet the demands of this specialized hardware.

---


Intel-Based Server Configurations

| Configuration | Specifications | Benchmark |
|---|---|---|
| Core i7-6700K/7700 Server | 64 GB DDR4, 2x512 GB NVMe SSD | CPU Benchmark: 8046 |
| Core i7-8700 Server | 64 GB DDR4, 2x1 TB NVMe SSD | CPU Benchmark: 13124 |
| Core i9-9900K Server | 128 GB DDR4, 2x1 TB NVMe SSD | CPU Benchmark: 49969 |
| Core i9-13900 Server (64GB) | 64 GB RAM, 2x2 TB NVMe SSD | — |
| Core i9-13900 Server (128GB) | 128 GB RAM, 2x2 TB NVMe SSD | — |
| Core i5-13500 Server (64GB) | 64 GB RAM, 2x500 GB NVMe SSD | — |
| Core i5-13500 Server (128GB) | 128 GB RAM, 2x500 GB NVMe SSD | — |
| Core i5-13500 Workstation | 64 GB DDR5 RAM, 2 NVMe SSD, NVIDIA RTX 4000 | — |

AMD-Based Server Configurations

| Configuration | Specifications | Benchmark |
|---|---|---|
| Ryzen 5 3600 Server | 64 GB RAM, 2x480 GB NVMe | CPU Benchmark: 17849 |
| Ryzen 7 7700 Server | 64 GB DDR5 RAM, 2x1 TB NVMe | CPU Benchmark: 35224 |
| Ryzen 9 5950X Server | 128 GB RAM, 2x4 TB NVMe | CPU Benchmark: 46045 |
| Ryzen 9 7950X Server | 128 GB DDR5 ECC, 2x2 TB NVMe | CPU Benchmark: 63561 |
| EPYC 7502P Server (128GB/1TB) | 128 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (128GB/2TB) | 128 GB RAM, 2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (128GB/4TB) | 128 GB RAM, 2x2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (256GB/1TB) | 256 GB RAM, 1 TB NVMe | CPU Benchmark: 48021 |
| EPYC 7502P Server (256GB/4TB) | 256 GB RAM, 2x2 TB NVMe | CPU Benchmark: 48021 |
| EPYC 9454P Server | 256 GB RAM, 2x2 TB NVMe | — |


⚠️ *Note: All benchmark scores are approximate and may vary based on configuration. Server availability subject to stock.* ⚠️