RTX 5080 Super & 5070 Super for AI: What the Leaked VRAM Upgrades Mean for Local LLMs (2026)

Updated August 1, 2026 · Originally published June 11, 2026

For gamers, the rumored RTX 50 Super refresh is about a few extra frames. For anyone running AI locally, it’s about the one number that actually limits you: VRAM. Leaks point to a big jump — 24GB on the RTX 5080 Super and 18GB on the RTX 5070 Super — and if accurate, that reshapes what models you can run on a consumer card. Here’s the honest, AI-focused breakdown — with a clear flag on what’s confirmed and what isn’t.

Key takeaways

Not official yet. NVIDIA hasn’t confirmed the RTX 50 Super refresh — these are leaks, rumored for later in 2026.
The leaked VRAM jumps: RTX 5080 Super → 24GB (from 16GB); RTX 5070 Super → 18GB (from 12GB).
Why it matters for AI: VRAM, not raw speed, decides how large a local LLM you can run. More VRAM = bigger models.
What 24GB unlocks: comfortable 4-bit inference of up to ~30B-class models — a real step up from today’s 16GB cards.
Should you wait? Maybe — but a 2026 memory crunch and uncertain timing mean “available and affordable” is not guaranteed.

Is the RTX 50 Super refresh even real?

Be clear-eyed here: NVIDIA has not officially announced an RTX 50 Super series. Everything below comes from hardware leakers, and the timeline has slipped repeatedly. As of mid-2026, reporting suggests the refresh is “back on track” for later in the year, with leaked specs pointing to meaningful VRAM upgrades — but nothing is confirmed, and launch timing (and especially pricing) could change.

So treat this as a rumor worth understanding, not a product to count on. With that caveat firmly in place, the leaked specs are genuinely interesting for AI users.

The leaked specs

Card (rumored)	VRAM	Notable leaked specs
RTX 5080 Super	24GB GDDR7	~10,752 CUDA cores, 32Gbps memory, ~450W board power, +9–16% performance vs RTX 5080
RTX 5070 Ti Super	~24GB GDDR7	Up from 16GB (specs less certain)
RTX 5070 Super	18GB GDDR7	6,400 CUDA cores, 192-bit bus, 28Gbps memory, 275W board power
RTX 5060 (Super?)	12GB	Entry tier, rumored to compete with AMD’s RX 9070 GRE

The pattern is consistent: NVIDIA is reportedly pushing more memory at each tier, which is exactly what the AI crowd has been asking for. The raw compute bumps (single-digit to mid-teens percentages) are modest; the VRAM bumps are the story.
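The leaked bus width and per-pin memory speed also imply a peak memory bandwidth, which can be worked out with simple arithmetic. The sketch below is a rough illustration, not a confirmed spec: the function name is ours, and the 256-bit bus for the RTX 5080 Super is an assumption carried over from the current RTX 5080, since the leak table does not list it.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Leaked figures from the table above; all of them are unconfirmed.
rtx_5070_super = memory_bandwidth_gbs(192, 28)

# ASSUMPTION: the leaks quoted here give no bus width for the 5080 Super,
# so 256-bit is borrowed from the current RTX 5080.
rtx_5080_super = memory_bandwidth_gbs(256, 32)

print(f"RTX 5070 Super (rumored): {rtx_5070_super:.0f} GB/s")
print(f"RTX 5080 Super (rumored, 256-bit assumed): {rtx_5080_super:.0f} GB/s")
```

Bandwidth matters for local LLMs because generating each token means reading the model weights from memory, so it influences speed once a model fits. It does not change whether the model fits in the first place.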

Why VRAM is the number that matters for local AI

For gaming, GPU performance is mostly about cores and clocks. For running large language models locally, the binding constraint is almost always VRAM — because the entire model (plus its context) has to fit in memory to run fast. Run out of VRAM and the model either won’t load or spills into system RAM, where it crawls.

That’s why a card’s memory capacity often matters more than its speed for AI. A faster GPU with too little VRAM simply can’t run a model that a slower, higher-memory card handles with ease. (For the full picture, see our guide to VRAM requirements for every major LLM.)
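To make the capacity argument concrete, here is a minimal sketch of the usual back-of-the-envelope estimate for model weights: parameter count times bits per weight, divided by eight. The function name and the list of model sizes are illustrative choices, and the result covers weights only, not context or runtime overhead.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# Illustrative model sizes; these are size classes, not specific models.
for params in (8, 14, 30, 70):
    fp16 = weight_memory_gb(params, 16)  # 16-bit, i.e. unquantized
    q4 = weight_memory_gb(params, 4)     # idealized 4-bit quantization
    print(f"{params:>3}B  16-bit: {fp16:6.1f} GB   4-bit: {q4:5.1f} GB")
```

Real 4-bit formats typically store somewhat more than 4 bits per weight once scaling factors are included, so treat the 4-bit column as a lower bound.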

What you could actually run

Here’s the practical payoff of the leaked memory tiers, using common 4-bit quantization:

24GB (RTX 5080 Super): comfortably runs up to ~30B-parameter models at 4-bit, with room for solid context — a genuine step up from the 16GB ceiling that forces today’s RTX 5080 owners to stop around 14B–20B. It also makes image and video generation far less cramped.
18GB (RTX 5070 Super): handles ~14B-class models comfortably and runs smaller models fast — a meaningful upgrade over 12GB cards that struggle past 8B.
12GB (RTX 5060): fine for 7B–8B models and lighter workloads.

To be clear about the ceiling: 24GB won’t fit a 70B model even at 4-bit (roughly 35GB of weights before any context), let alone unquantized at about 140GB in 16-bit. Those still need a high-memory workstation card, multiple GPUs, or a dedicated local-AI box. NVIDIA is steering serious >70B local work toward its 96GB Blackwell Pro cards and the DGX/RTX Spark line, not the consumer Super refresh. But for the models most people actually run, 18–24GB is the sweet spot. Pair one with the best local LLMs to run on Ollama and you have a capable home AI rig.
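The tier estimates above can be sanity-checked by adding the context (KV cache) and a runtime allowance on top of the weights. The sketch below assumes a hypothetical 30B-class model shape (48 layers, 8 key/value heads, head dimension 128), about 4.5 effective bits per weight, an 8,192-token context, and a flat 1.5GB overhead. None of these numbers come from the leaks or from a specific model; change them to match whatever you actually run.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Keys plus values for every layer and token, in GB (10^9 bytes)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9


def total_vram_gb(params_billions: float, bits_per_weight: float,
                  kv_gb: float, overhead_gb: float = 1.5) -> float:
    """Weights + KV cache + a flat allowance for runtime buffers (assumed)."""
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + kv_gb + overhead_gb


# HYPOTHETICAL 30B-class model shape, chosen only for illustration.
kv = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_tokens=8192)
need = total_vram_gb(params_billions=30, bits_per_weight=4.5, kv_gb=kv)

# VRAM tiers discussed in this article.
tiers = {"12GB": 12, "16GB": 16, "18GB": 18, "24GB": 24}
fits = {name: need <= capacity for name, capacity in tiers.items()}

print(f"Estimated need: {need:.1f} GB (KV cache {kv:.2f} GB)")
for name, ok in fits.items():
    print(f"{name}: {'fits' if ok else 'does not fit'}")
```

Under these assumptions the 30B-class model lands just under 20GB, which is why the 24GB tier is the first one with real headroom. Longer contexts grow the KV cache linearly, so the margin shrinks as you raise the context length.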

What about AMD and Intel?

The Super refresh wouldn’t exist in a vacuum. AMD has already shipped the Radeon RX 9070 GRE in 2026, and its next-generation RDNA 5 (UDNA) architecture isn’t expected until late 2027 or 2028 — so NVIDIA’s mid-cycle refresh would land against AMD’s current lineup, not a new one. Intel’s Arc continues to fight for the budget tier. For AI specifically, AMD remains a viable local-inference option, though NVIDIA’s CUDA ecosystem still dominates most local-LLM tooling (weigh our ROCm vs CUDA breakdown before going red-team).

The bigger force shaping all of this is the 2026 memory crunch: surging demand for the high-bandwidth memory (HBM) that AI accelerators consume has tightened supply and lifted prices across the GPU market, since HBM and the GDDR7 used on consumer cards compete for the same DRAM manufacturing capacity. That’s the same pressure reportedly complicating the Super refresh’s timing, and a reason not to assume these cards will arrive cheap or in volume.

A note on power and your PSU

One practical wrinkle from the leaks: the RTX 5080 Super’s rumored 450W board power (up from 360W on the 5080) is a meaningful jump. If you plan around one, budget for a strong power supply (roughly an 850W unit or better for a single-GPU AI workstation) plus adequate cooling. For always-on local inference, that higher draw also means higher running costs than the current 360W RTX 5080. It’s a reminder that “more VRAM” isn’t free: you pay for it in watts as well as dollars.
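For a rough sense of what the extra 90W costs, here is a small worked estimate. The usage pattern (8 hours a day at full board power) and the electricity price (0.30 per kWh, in whatever currency you pay) are assumptions for illustration only. Real inference workloads rarely hold a card at its full rated power, so treat the result as an upper-end figure.

```python
def annual_energy_cost(watts: float, hours_per_day: float,
                       price_per_kwh: float) -> float:
    """Yearly electricity cost for a constant load, in the currency of price_per_kwh."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh


RUMORED_5080_SUPER_W = 450  # leaked board power, unconfirmed
CURRENT_5080_W = 360        # board power cited above for the RTX 5080

# ASSUMPTIONS: 8 h/day at full board power, 0.30 per kWh.
HOURS_PER_DAY = 8
PRICE_PER_KWH = 0.30

extra_cost = annual_energy_cost(
    RUMORED_5080_SUPER_W - CURRENT_5080_W, HOURS_PER_DAY, PRICE_PER_KWH
)
print(f"Extra running cost per year: about {extra_cost:.2f}")
```

With these inputs the difference works out to roughly 79 per year. Swap in your own tariff and duty cycle before drawing conclusions.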

Should you wait for it?

Honestly, it depends on your timeline and tolerance for uncertainty:

If you can wait and you run local AI: the VRAM upgrade is worth watching closely — 24GB at (hopefully) mainstream pricing would be the best value local-AI card NVIDIA has offered in a while.
If you need a GPU now: don’t hold your breath. The refresh isn’t confirmed, timing keeps slipping, and 2026’s memory shortage and AI-accelerator demand are squeezing consumer GPU supply and prices. A bird in the hand — a current 16GB+ card for local LLMs — may beat waiting indefinitely for a rumor.
If you need >70B models: the Super refresh isn’t your answer regardless; look at high-VRAM workstation cards or a dedicated local-AI device.

RTX 50 Super vs current options (for AI)

Option	VRAM	Best for
RTX 5080 Super (rumored)	24GB	Up to ~30B local models, if it ships
RTX 5090 (available)	32GB	The current consumer VRAM king
RTX 5080 (available)	16GB	Up to ~14–20B today
RTX 5070 Super (rumored)	18GB	~14B local models, better value

The already-available RTX 5090 has 32GB, so if you need maximum consumer VRAM today and can afford it, that option exists now. The Super refresh’s appeal is bringing more VRAM to the mid tiers at (hopefully) lower prices.

FAQ

Is the RTX 5080 Super confirmed?

No. As of mid-2026, NVIDIA has not officially announced an RTX 50 Super series. The 24GB RTX 5080 Super and 18GB RTX 5070 Super come from hardware leaks, with a refresh rumored for later in 2026. Treat the specs and timing as unconfirmed.

How much VRAM does the RTX 5080 Super have?

According to leaks, 24GB of GDDR7 — up from 16GB on the standard RTX 5080. If accurate, that’s the single most important upgrade for AI users, since VRAM determines how large a local model you can run.

Is the RTX 5080 Super good for AI and local LLMs?

If the 24GB leak holds, yes. It would comfortably run up to roughly 30B-parameter models at 4-bit quantization, a clear step up from 16GB cards. It still won’t fit a 70B model, which needs roughly 35GB for weights alone even at 4-bit; that calls for high-VRAM workstation hardware.

Why does VRAM matter more than speed for local AI?

Because the entire model and its context must fit in GPU memory to run fast. If a model doesn’t fit in VRAM, it either won’t load or spills into system RAM and slows to a crawl. So memory capacity usually sets the hard limit on what you can run; compute and memory bandwidth only determine how many tokens per second you get once it fits.

Should I wait for the RTX 50 Super or buy now?

If you run local AI and can wait, it’s worth watching — 24GB at a mainstream price would be excellent value. But it’s unconfirmed, the timeline keeps slipping, and a 2026 memory crunch is squeezing GPU supply and pricing. If you need a card now, a current 16GB+ GPU (or the 32GB RTX 5090) is the safer bet.

Bottom line

The rumored RTX 50 Super refresh is the rare GPU leak that matters more to AI users than to gamers, because the headline change is VRAM, the one spec that decides how large a local LLM you can run. If the 24GB RTX 5080 Super and 18GB RTX 5070 Super ship as leaked, they’d be the most useful consumer cards for local AI that NVIDIA has offered in years.

The catch is everything around the specs: it’s unconfirmed, the timing has slipped repeatedly, and 2026’s memory shortage makes pricing and availability a real question. Watch it closely if you run AI at home — but don’t put your build on hold for a card NVIDIA hasn’t even acknowledged yet.

Written by Mustafa Ihsan

Mustafa Ihsan is the founder and editor of Convly.ai. He built and maintains the site's live AI models database, its price-performance index, and its free calculators for VRAM requirements, API costs and self-hosting economics. He writes about model pricing, benchmark results and the hardware needed to run AI models locally, and consistently prefers measured numbers to vendor claims.
