Inside NVIDIA’s Blackwell generation, AI builders face one clean decision: the RTX 5090 or the RTX 5080. The 5090 costs roughly twice as much. It also has twice the VRAM. For AI, that second fact is the one that matters.
The short answer: the 5080 is plenty for mainstream local AI; the 5090 exists for people who need to run large models.
Key takeaways
- The RTX 5090 has 32 GB GDDR7; the RTX 5080 has 16 GB — a 2x capacity gap.
- The 5090 is also ~1.7–1.9x faster thanks to far more CUDA cores and bandwidth.
- Only the 5090 can hold Llama 3 70B in VRAM, and only at aggressive (Q3/IQ-class) quantization; the 5080 cannot hold it at all.
- The 5090 draws 575 W and demands a 1000 W PSU; the 5080’s 360 W is far easier to build around.
- Buy the 5080 for 8B–13B models and image generation; buy the 5090 if you need 70B-class models or maximum speed.
At a glance
| Spec | RTX 5090 | RTX 5080 |
|---|---|---|
| Architecture | Blackwell GB202 | Blackwell GB203 |
| CUDA cores | 21,760 | 10,752 |
| VRAM | 32 GB GDDR7 | 16 GB GDDR7 |
| Memory bandwidth | 1,792 GB/s | ~960 GB/s |
| FP16 tensor (dense) | ~419 TFLOPS | ~450 TFLOPS* |
| TDP | 575 W | 360 W |
| MSRP | $1,999 | $999 |
*Peak tensor TFLOPS figures vary by clock and sparsity mode; the 5090’s far larger core count makes it decisively faster in real workloads.
VRAM decides the whole comparison
For local AI, the question is never “how fast” before “does it fit.” Here the two cards split cleanly:
- RTX 5080 (16 GB): runs Llama 3 8B at 8-bit, 13B-class models at 4-bit, Stable Diffusion XL and Flux.1, and LoRA fine-tuning of 7B–8B models. It cannot hold a 70B model.
- RTX 5090 (32 GB): does everything the 5080 does, plus Llama 3 70B at aggressive quantization (see the clarification below), much longer context windows, larger fine-tunes, and big image and video models with room to spare.
A clarification on 70B: a 70B model at Q4_K_M needs roughly 40 GB, which exceeds even the 5090’s 32 GB. The 5090 can, however, run 70B fully in VRAM at more aggressive quantization (Q3/IQ-class), and it can run higher-precision quantizations such as Q4_K_M with only light offload to system RAM. The 5080, at 16 GB, is not in that conversation at all. For anything approaching 70B, the 5090 is the only consumer option.
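The fit question above can be approximated with simple arithmetic. The following is a minimal sketch, not a measurement: it assumes weight size is parameter count times bits per weight, and it adds a flat 2 GB allowance for KV cache and runtime buffers. The bits-per-weight values (about 4.5 for a 4-bit quantization, about 3.0 for a Q3/IQ-class one) and the overhead figure are illustrative assumptions; real usage varies with the quantization format and context length.

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead_gb=2.0):
    """Rough VRAM need in GB: quantized weights plus a flat allowance.

    overhead_gb is an assumed allowance for KV cache and runtime buffers.
    """
    # 1e9 parameters * bits / 8 bits-per-byte = gigabytes of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion, bits_per_weight, vram_gb, overhead_gb=2.0):
    """True if the estimated footprint is within the card's VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight, overhead_gb) <= vram_gb


CARDS = {"RTX 5080": 16, "RTX 5090": 32}

# (label, parameters in billions, assumed bits per weight)
SCENARIOS = [
    ("8B @ 8-bit", 8, 8.0),
    ("13B @ ~4.5-bit", 13, 4.5),
    ("70B @ ~4.5-bit", 70, 4.5),
    ("70B @ ~3-bit", 70, 3.0),
]

if __name__ == "__main__":
    for label, params, bits in SCENARIOS:
        need = estimate_vram_gb(params, bits)
        verdict = ", ".join(
            f"{card}: {'fits' if need <= vram else 'does not fit'}"
            for card, vram in CARDS.items()
        )
        print(f"{label:16s} ~{need:5.1f} GB  {verdict}")
```

Under these assumptions the 70B model at roughly 4.5 bits per weight lands near 40 GB and fits neither card, while at roughly 3 bits it fits the 5090 only, which matches the split described above.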
Speed: the 5090 is also simply faster
Capacity aside, the 5090 has roughly double the CUDA cores and nearly double the memory bandwidth. That makes it much faster even on models that fit comfortably on both cards:
| Workload | RTX 5090 | RTX 5080 |
|---|---|---|
| Llama 3 8B Q4_K_M | ~180 tok/s | ~125 tok/s |
| 13B-class Q4 | ~120 tok/s | ~78 tok/s |
| SDXL 1024×1024 (30 steps) | ~25 it/s | ~14 it/s |
| Llama 3 70B (quantized) | Runs in VRAM | Does not fit |
Across these workloads the 5090 lands between roughly 1.4x and 1.8x the 5080’s throughput, approaching the 1.7–1.9x range on image generation. On large models the comparison stops being about speed and becomes about possibility.
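To see where each workload sits relative to the hardware gap, the ratios can be computed directly from the figures quoted in this article. This is a small sketch that only divides the numbers from the two tables; the dictionary names and the `ratios` helper are illustrative, and the throughput values are the approximate ones given above, not new measurements.

```python
# (RTX 5090 value, RTX 5080 value), copied from the tables in this article
SPECS = {
    "CUDA cores": (21760, 10752),
    "Memory bandwidth (GB/s)": (1792, 960),
}

THROUGHPUT = {
    "Llama 3 8B Q4_K_M (tok/s)": (180, 125),
    "13B-class Q4 (tok/s)": (120, 78),
    "SDXL 1024x1024, 30 steps (it/s)": (25, 14),
}


def ratios(table):
    """Return the 5090-to-5080 ratio for each row, rounded to two decimals."""
    return {name: round(a / b, 2) for name, (a, b) in table.items()}


if __name__ == "__main__":
    for title, table in (("Hardware", SPECS), ("Throughput", THROUGHPUT)):
        print(title)
        for name, ratio in ratios(table).items():
            print(f"  {name:34s} {ratio:.2f}x")
```

On these figures the hardware ratios are about 2.0x (cores) and 1.9x (bandwidth), while the quoted throughput ratios run from about 1.4x on the 8B model to about 1.8x on SDXL.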
Power and build cost
The performance comes at a real-world price beyond the MSRP. The 5090 draws 575 W, demands a 1000 W PSU, generates serious heat, and needs a case with genuine airflow. The 5080’s 360 W is far gentler — an 850 W PSU and a normal mid-tower handle it easily. When you budget the 5090, budget the platform around it too.
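As a quick budgeting aid, the power figures above translate into how much of each PSU is left for the CPU, drives, and fans. This is a back-of-the-envelope sketch that subtracts the GPU’s rated draw from the PSU rating; it ignores transient spikes and PSU efficiency, and the `BUILDS` table and helper names are illustrative.

```python
# GPU rated draw and PSU size for each build, as quoted in this article
BUILDS = {
    "RTX 5090": {"gpu_w": 575, "psu_w": 1000},
    "RTX 5080": {"gpu_w": 360, "psu_w": 850},
}


def headroom_w(gpu_w, psu_w):
    """Watts of PSU rating left after the GPU's rated draw."""
    return psu_w - gpu_w


def gpu_share(gpu_w, psu_w):
    """Fraction of the PSU rating consumed by the GPU alone."""
    return gpu_w / psu_w


if __name__ == "__main__":
    for card, build in BUILDS.items():
        left = headroom_w(build["gpu_w"], build["psu_w"])
        share = gpu_share(build["gpu_w"], build["psu_w"])
        print(f"{card}: {left} W left for the rest of the system "
              f"(GPU uses {share:.0%} of the PSU rating)")
```

By this measure the 5090 build leaves 425 W of a 1000 W unit and the 5080 build leaves 490 W of an 850 W unit, so the smaller PSU still offers more slack for the rest of the platform.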
Choose the RTX 5090 if
- You need to run 70B-class models locally
- You want maximum speed for image and video generation
- You do larger fine-tunes or need long context windows
Choose the RTX 5080 if
- Your models are 8B–13B — the large majority of local AI
- You want a cooler, quieter, cheaper-to-build machine
- You would rather spend the $1,000 saved elsewhere
Who should actually buy the 5090?
Be honest about your workloads. If you run 8B and 13B models and do Stable Diffusion, the 5080 handles all of it well — paying double for the 5090 buys speed you will enjoy but do not need. The 5090 earns its price for a specific user: someone who genuinely needs 70B-class models, long contexts, or the fastest possible iteration on heavy generative work. For that person, no other consumer card competes. For everyone else, the 5080 is the rational pick.
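One way to put numbers on that trade-off is to compare cost per gigabyte of VRAM and cost per unit of throughput. The sketch below uses only the MSRP, VRAM, and Llama 3 8B throughput figures quoted in this article; street prices and real throughput will differ, and the field and function names are illustrative.

```python
# MSRP, VRAM, and Llama 3 8B Q4_K_M throughput as quoted in this article
CARDS = {
    "RTX 5090": {"msrp_usd": 1999, "vram_gb": 32, "tok_s_8b": 180},
    "RTX 5080": {"msrp_usd": 999, "vram_gb": 16, "tok_s_8b": 125},
}


def usd_per_gb(card):
    """Dollars of MSRP per gigabyte of VRAM."""
    return card["msrp_usd"] / card["vram_gb"]


def usd_per_tok_s(card):
    """Dollars of MSRP per token/second on the 8B workload."""
    return card["msrp_usd"] / card["tok_s_8b"]


if __name__ == "__main__":
    for name, card in CARDS.items():
        print(f"{name}: ${usd_per_gb(card):.2f} per GB of VRAM, "
              f"${usd_per_tok_s(card):.2f} per tok/s on an 8B model")
```

At MSRP both cards cost about $62 per gigabyte of VRAM, so the 5090’s premium buys capacity at the same rate. Per unit of 8B throughput the 5080 is cheaper (about $8 versus about $11 per tok/s), which is the arithmetic behind calling it the rational pick for smaller models.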
FAQ
Is the RTX 5090 worth double the price of the 5080 for AI?
Only if you need its 32 GB of VRAM — for 70B-class models, long contexts, or big fine-tunes. If your work is 8B–13B models and image generation, the 5080 does it well and saves you $1,000.
Can the RTX 5080 run Llama 3 70B?
No. With 16 GB of VRAM it cannot hold a 70B model even heavily quantized. Running 70B locally requires the 32 GB RTX 5090 or a multi-GPU setup.
How much faster is the 5090 than the 5080?
Roughly 1.7–1.9x in real AI workloads, driven by nearly double the CUDA cores and memory bandwidth. On models too large for the 5080, the 5090 is not just faster — it is the only one that runs them.
Does the RTX 5090 need a special power supply?
Yes. It draws 575 W and NVIDIA recommends a 1000 W PSU. The 5080’s 360 W is satisfied by a standard 850 W unit, making it much simpler and cheaper to build around.
Verdict
The RTX 5090 is the most capable consumer AI GPU in existence: 32 GB of VRAM and class-leading speed make it the only card that brings 70B-class models within reach of a desktop. But it is a specialist’s tool. For the workloads most people actually run, the RTX 5080 delivers everything needed at half the price, with far lower power draw and build complexity. Buy the 5090 because you need its memory, not because it is the flagship.
