<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://serverrental.store/index.php?action=history&amp;feed=atom&amp;title=NVIDIA_H100_NVL_Server</id>
	<title>NVIDIA H100 NVL Server - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://serverrental.store/index.php?action=history&amp;feed=atom&amp;title=NVIDIA_H100_NVL_Server"/>
	<link rel="alternate" type="text/html" href="https://serverrental.store/index.php?title=NVIDIA_H100_NVL_Server&amp;action=history"/>
	<updated>2026-04-14T19:56:42Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.36.1</generator>
	<entry>
		<id>https://serverrental.store/index.php?title=NVIDIA_H100_NVL_Server&amp;diff=5701&amp;oldid=prev</id>
		<title>Admin: New server config article</title>
		<link rel="alternate" type="text/html" href="https://serverrental.store/index.php?title=NVIDIA_H100_NVL_Server&amp;diff=5701&amp;oldid=prev"/>
		<updated>2026-04-12T15:39:28Z</updated>

		<summary type="html">&lt;p&gt;New server config article&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;'''NVIDIA H100 NVL Server''' is a high-end GPU cloud server available from [https://en.immers.cloud/signup/r/20241007-8310688-334/ Immers Cloud]. The H100 NVL variant features 94 GB of HBM3 memory, positioned between the standard H100 (80 GB) and H200 (141 GB) in terms of memory capacity.&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Component !! Specification&lt;br /&gt;
|-&lt;br /&gt;
| '''GPU''' || NVIDIA H100 NVL (Hopper architecture)&lt;br /&gt;
|-&lt;br /&gt;
| '''VRAM''' || 94 GB HBM3&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory Bandwidth''' || ~3.9 TB/s&lt;br /&gt;
|-&lt;br /&gt;
| '''FP16 Performance''' || ~989 TFLOPS&lt;br /&gt;
|-&lt;br /&gt;
| '''FP8 Performance''' || ~1,979 TFLOPS&lt;br /&gt;
|-&lt;br /&gt;
| '''Interconnect''' || NVLink (high-bandwidth bridge)&lt;br /&gt;
|-&lt;br /&gt;
| '''Starting Price''' || From $4.11/hr&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Performance ==&lt;br /&gt;
The H100 NVL sits in a unique position in the Hopper lineup:&lt;br /&gt;
* '''94 GB HBM3''' — 17.5% more VRAM than the standard H100's 80 GB&lt;br /&gt;
* '''HBM3 memory''' — higher bandwidth than the HBM2e used in the H100 PCIe&lt;br /&gt;
* '''NVL bridge''' — optimized for paired NVL configurations with high-bandwidth GPU-to-GPU communication (a quick peer-access check follows below)&lt;br /&gt;
&lt;br /&gt;
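A minimal sketch (assuming PyTorch on a two-GPU NVL instance; not specific to any provider) to confirm the paired GPUs can reach each other directly:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Verify direct peer-to-peer access between the paired GPUs&lt;br /&gt;
# (carried over NVLink when the bridge is present).&lt;br /&gt;
import torch&lt;br /&gt;
&lt;br /&gt;
if torch.cuda.device_count() &amp;gt;= 2:&lt;br /&gt;
    peer_ok = torch.cuda.can_device_access_peer(0, 1)&lt;br /&gt;
    print(f'GPU 0 to GPU 1 peer access: {peer_ok}')&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;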
The NVL variant was designed specifically for large language model inference, where the extra 14 GB of VRAM per GPU can be the difference between fitting a model on a given number of GPUs and needing an additional one, as sketched below.&lt;br /&gt;
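&lt;br /&gt;
As a back-of-the-envelope illustration (the 38B parameter count, KV-cache size, and overhead factor are assumptions for the example, not measured values):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Rough VRAM estimate for LLM inference: weights + KV cache + overhead.&lt;br /&gt;
def estimate_gb(params_billion, bytes_per_param, kv_cache_gb, overhead=1.1):&lt;br /&gt;
    # weights (GB) = params in billions * bytes per parameter&lt;br /&gt;
    return (params_billion * bytes_per_param + kv_cache_gb) * overhead&lt;br /&gt;
&lt;br /&gt;
needed = estimate_gb(38, 2.0, 5.0)  # hypothetical 38B model served in FP16&lt;br /&gt;
for vram in (80, 94):&lt;br /&gt;
    print(f'{vram} GB card: need ~{needed:.0f} GB, fits: {needed &amp;lt;= vram}')&lt;br /&gt;
# ~89 GB needed: does not fit in 80 GB, fits in the NVL's 94 GB.&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;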
&lt;br /&gt;
For training workloads, performance is comparable to the standard [[NVIDIA H100 Server]], with the memory advantage allowing larger batch sizes and micro-batch configurations.&lt;br /&gt;
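&lt;br /&gt;
For instance, holding the effective batch size fixed, a larger per-GPU micro-batch means fewer gradient-accumulation steps (the micro-batch sizes below are assumptions; actual limits depend on the model and optimizer):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Fewer gradient-accumulation steps for the same effective batch size.&lt;br /&gt;
def accum_steps(effective_batch, micro_batch, world_size):&lt;br /&gt;
    return effective_batch // (micro_batch * world_size)&lt;br /&gt;
&lt;br /&gt;
print(accum_steps(512, 2, 8))  # 32 steps if 80 GB allows micro_batch=2&lt;br /&gt;
print(accum_steps(512, 4, 8))  # 16 steps if 94 GB allows micro_batch=4&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;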
&lt;br /&gt;
== Best Use Cases ==&lt;br /&gt;
* LLM inference serving (fitting larger models per GPU)&lt;br /&gt;
* Fine-tuning large foundation models&lt;br /&gt;
* Paired NVL inference clusters for production AI&lt;br /&gt;
* AI model serving with high concurrency&lt;br /&gt;
* Research requiring slightly more VRAM than 80 GB&lt;br /&gt;
* Multi-modal inference (vision + language models)&lt;br /&gt;
&lt;br /&gt;
== Pros and Cons ==&lt;br /&gt;
=== Advantages ===&lt;br /&gt;
* 94 GB HBM3 — 17.5% more VRAM than standard H100&lt;br /&gt;
* HBM3 provides higher memory bandwidth than HBM2e&lt;br /&gt;
* Optimized NVL bridge for paired configurations&lt;br /&gt;
* Full Hopper architecture with FP8 tensor cores&lt;br /&gt;
* Better price-per-GB of VRAM than H200&lt;br /&gt;
&lt;br /&gt;
=== Limitations ===&lt;br /&gt;
* Only 14 GB more VRAM than standard H100&lt;br /&gt;
* Higher cost than standard H100 ($4.11 vs $3.83/hr)&lt;br /&gt;
* Less VRAM than H200 (94 vs 141 GB)&lt;br /&gt;
* NVL benefits only fully realized in paired configurations&lt;br /&gt;
&lt;br /&gt;
== Pricing ==&lt;br /&gt;
Available from [https://en.immers.cloud/signup/r/20241007-8310688-334/ Immers Cloud] starting at '''$4.11/hr''', approximately 7% more than the standard H100's $3.83/hr for 17.5% more VRAM.&lt;br /&gt;
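&lt;br /&gt;
Worked out per gigabyte, using only the two hourly rates quoted on this page, the NVL is the cheaper card per GB of VRAM:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# Price per GB of VRAM per hour, from the rates quoted on this page.&lt;br /&gt;
cards = {'H100 (80 GB)': (3.83, 80), 'H100 NVL (94 GB)': (4.11, 94)}&lt;br /&gt;
for name, (price, vram) in cards.items():&lt;br /&gt;
    print(f'{name}: ${price / vram:.4f} per GB-hour')&lt;br /&gt;
# H100: $0.0479 per GB-hour; H100 NVL: $0.0437 per GB-hour.&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;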
&lt;br /&gt;
== Recommendation ==&lt;br /&gt;
Choose the '''H100 NVL''' when 80 GB of VRAM falls just short of your model or batch size. The extra 14 GB can eliminate the need for model parallelism in some cases, simplifying deployment and improving throughput. If you need significantly more VRAM, step up to the [[NVIDIA H200 Server]]. If 80 GB is sufficient, save with the [[NVIDIA H100 Server]].&lt;br /&gt;
&lt;br /&gt;
== See Also ==&lt;br /&gt;
* [[NVIDIA H100 Server]]&lt;br /&gt;
* [[NVIDIA H200 Server]]&lt;br /&gt;
* [[NVIDIA A100 Server]]&lt;br /&gt;
&lt;br /&gt;
[[Category:GPU Servers]]&lt;br /&gt;
[[Category:AI Training]]&lt;br /&gt;
[[Category:Data Center GPU]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
</feed>