Is open-weight AI actually cheaper than the big proprietary APIs — and by how much? We took the API pricing for all 29 priced models in our models database, normalized each to a single blended cost per million tokens, and split them into open-weight versus proprietary. The gap is bigger — and far more consistent — than most people assume.
Key takeaways
- The 5 cheapest models in 2026 are all open-weight. The 5 most expensive are all proprietary.
- The typical (median) open model costs ~$0.15 per 1M blended tokens; the typical proprietary model costs ~$6.00, a ~39× gap.
- On average, proprietary models cost ~16× more than open ones.
- Across all 29 models, the full price spread is ~890× — from ~$0.02 to $20 per 1M blended tokens.
- And that ignores self-hosting, which removes per-token API fees entirely for open weights (you pay for hardware instead).
How we measured
- Scope: all 29 models in the Convly database with public API pricing.
- Blended cost: (3 × input price + output price) / 4, which assumes a 3:1 input-to-output token ratio typical of real API traffic, so models with cheap input but pricey output are directly comparable.
- Classification: “open-weight” = downloadable weights you can self-host (22 models); “proprietary” = API-only (7 models).
- Sources: published API pricing via OpenRouter and DeepInfra, June 2026.
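The blended-cost formula is simple enough to express directly. The sketch below is a minimal illustration, not Convly's actual code; the `input_weight` parameter is our own generalization (the article fixes it at 3), and the example prices are hypothetical.

```python
def blended_cost(input_price: float, output_price: float, input_weight: int = 3) -> float:
    """Blended $ per 1M tokens, weighting input `input_weight`:1 against output.

    With the default weight of 3 this is (3 × input + output) / 4.
    """
    return (input_weight * input_price + output_price) / (input_weight + 1)


# Hypothetical prices: $1.00 per 1M input tokens, $5.00 per 1M output tokens.
print(blended_cost(1.00, 5.00))  # 2.0
```

Because output counts only once in four, a model with cheap input but expensive output can land close to a model with flat pricing; `input_weight` is the knob to turn if your own traffic is not roughly 3:1.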
The gap, in one table
| Metric (blended $/1M) | Open-weight (22) | Proprietary (7) | Gap |
|---|---|---|---|
| Average | $0.50 | $8.16 | 16× |
| Median (typical model) | $0.15 | $6.00 | 39× |
| Cheapest in group | $0.02 (Llama 3.1 8B) | $2.00 (Claude Haiku 4.5) | — |
| Most expensive in group | $3.00 (Mistral Large 3) | $20.00 (Claude Fable 5) | — |
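The Average and Median rows compare the groups in two different ways, and the gap ratios are just one group's statistic divided by the other's. Here is a minimal sketch of that calculation, using made-up toy numbers rather than the 29-model dataset (`gap_ratios` is a hypothetical helper):

```python
from statistics import mean, median


def gap_ratios(open_costs: list[float], proprietary_costs: list[float]) -> dict[str, float]:
    """How many times more the proprietary group costs, by average and by median.

    Inputs are blended $ per 1M tokens, one value per model.
    """
    return {
        "average_gap": mean(proprietary_costs) / mean(open_costs),
        "median_gap": median(proprietary_costs) / median(open_costs),
    }


# Toy numbers (not the real dataset): one pricey open model drags up the open average.
toy_open = [0.25, 0.25, 2.5]
toy_proprietary = [4.0, 8.0, 12.0]
print(gap_ratios(toy_open, toy_proprietary))  # {'average_gap': 8.0, 'median_gap': 32.0}
```

The same skew shows up in the real table: the open-weight average ($0.50) is more than 3× its median ($0.15), while the proprietary average ($8.16) is only ~1.4× its median ($6.00). That is why the average gap (16×) is smaller than the median gap (39×), and why the median is the better picture of a typical model.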
The extremes tell the story
Sort all 29 models by blended cost and the pattern is stark — open weights own the bottom, proprietary owns the top:
| 5 cheapest (all open-weight) | Blended $/1M | 5 most expensive (all proprietary) | Blended $/1M |
|---|---|---|---|
| Llama 3.1 8B | $0.02 | Claude Fable 5 | $20.00 |
| Mistral 7B | $0.02 | GPT-5.5 | $11.25 |
| Mistral NeMo 12B | $0.03 | Claude Opus 4.8 | $10.00 |
| Gemma 3 4B | $0.06 | Claude Sonnet 4.6 | $6.00 |
| Qwen3 8B | $0.07 | Gemini 3.1 Pro | $4.50 |
No proprietary model falls in the cheapest third of the market, and the five most expensive models are all proprietary. The overlap zone is narrow: the cheapest proprietary model (Claude Haiku 4.5, $2.00) actually sits below the most expensive open one (Mistral Large 3, $3.00).
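The "extremes" and "overlap" checks come down to a sort plus a min/max comparison. A minimal sketch, assuming each model is recorded with a name, an open-weight flag and its blended cost (the `Model` class and helper names are hypothetical); the sample uses four of the figures quoted above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    name: str
    open_weight: bool  # True = downloadable weights, False = API-only
    blended: float     # blended $ per 1M tokens


def extremes(models: list[Model], k: int = 5) -> tuple[list[Model], list[Model]]:
    """Return the k cheapest (ascending) and k most expensive (descending) models."""
    ranked = sorted(models, key=lambda m: m.blended)
    return ranked[:k], ranked[::-1][:k]


def overlap_band(models: list[Model]) -> Optional[tuple[float, float]]:
    """Price band where the two groups overlap, or None if every proprietary
    model costs at least as much as every open-weight one."""
    cheapest_proprietary = min(m.blended for m in models if not m.open_weight)
    priciest_open = max(m.blended for m in models if m.open_weight)
    if cheapest_proprietary < priciest_open:
        return cheapest_proprietary, priciest_open
    return None


sample = [
    Model("Llama 3.1 8B", True, 0.02),
    Model("Mistral Large 3", True, 3.00),
    Model("Claude Haiku 4.5", False, 2.00),
    Model("Claude Fable 5", False, 20.00),
]
cheap, dear = extremes(sample, k=1)
print(cheap[0].name, "|", dear[0].name)  # Llama 3.1 8B | Claude Fable 5
print(overlap_band(sample))              # (2.0, 3.0)
```

Run over the full 29-model list with `k=5`, the same two functions would reproduce the extremes table and the $2.00 to $3.00 overlap band.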
Important nuance: this is cost, not capability
The priciest models still lead on the hardest reasoning and agentic tasks. In our companion AI Price-Performance Index we found the frontier premium buys the last points of intelligence, not proportional value. But for the majority of production workloads — classification, extraction, RAG, summarization, chat — the capability gap between a good open model and a frontier model is far smaller than the 39× price gap. You are often paying 39× for the last 10–20% of capability you may not need.
Why the gap is structural
This isn’t a temporary discount war. Intense open-weight competition — Qwen, Llama, Gemma, DeepSeek and Mistral all shipping strong models under permissive licenses — has driven the price floor toward zero. Meanwhile frontier labs price for peak capability and enterprise willingness-to-pay. The result is a market that is bifurcating: a race-to-zero floor and a premium ceiling, with a widening canyon between them.
Bottom line
For cost-sensitive production, an open or mid-tier model is the rational default in 2026, and self-hosting removes per-token API fees entirely (check what your GPU can run with our VRAM calculator). Reserve the proprietary frontier for the genuinely hardest tasks. Run your own usage through the API cost calculator to see your exact numbers.
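The calculator's internals aren't described here. As a rough stand-in, assuming you know your monthly input and output token counts, the exact bill can be computed per direction rather than from the blended figure. `monthly_bill` is a hypothetical helper and the prices are made up:

```python
def monthly_bill(input_tokens: float, output_tokens: float,
                 input_price: float, output_price: float) -> float:
    """Exact dollar cost for a volume of traffic; prices are $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical model: $1.00 input / $5.00 output per 1M tokens, i.e. $2.00 blended at 3:1.
in_price, out_price = 1.00, 5.00
blended = (3 * in_price + out_price) / 4

# A 3:1 workload (300M in, 100M out) matches the blended estimate exactly...
print(monthly_bill(300e6, 100e6, in_price, out_price), blended * 400)  # 800.0 800.0
# ...but an output-heavy 1:1 workload costs more than the blended figure suggests.
print(monthly_bill(100e6, 100e6, in_price, out_price), blended * 200)  # 600.0 400.0
```

Blended cost is a fair yardstick for comparing models, but for budgeting, your own input/output mix matters; self-hosting costs (hardware, power, ops) are outside this sketch entirely.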
Data: Convly AI models database (API pricing via OpenRouter and DeepInfra). Blended cost uses a 3:1 input:output ratio. Figures current as of June 2026.
