Is open-weight AI actually cheaper than the big proprietary APIs — and by how much? We took the API pricing for all 29 priced models in our models database, normalized each to a single blended cost per million tokens, and split them into open-weight versus proprietary. The gap is bigger — and far more consistent — than most people assume.
Key takeaways
- The 5 cheapest models in 2026 are all open-weight. The 5 most expensive are all proprietary.
- The typical (median) open model costs ~$0.15 per 1M blended tokens; the typical proprietary model costs ~$6.00, a 39× gap.
- On average, proprietary models cost ~16× more than open ones.
- Across all 29 models, the full price spread is ~890× — from ~$0.02 to $20 per 1M blended tokens.
- And that ignores self-hosting, which replaces per-token fees for open weights with the fixed cost of running your own hardware.
How we measured
- Scope: all 29 models in the Convly database with public API pricing.
- Blended cost: (3 × input + output) / 4, reflecting the 3:1 input-to-output ratio typical of real API traffic, so models with cheap input but pricey output are directly comparable.
- Classification: “open-weight” = downloadable weights you can self-host (22 models); “proprietary” = API-only (7 models).
- Sources: published API pricing via OpenRouter and DeepInfra, June 2026.
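The blended-cost formula is easy to apply to any price sheet. Below is a minimal Python sketch, assuming prices are quoted in dollars per 1M input tokens and per 1M output tokens. The function name `blended_cost` and the example prices are hypothetical, not Convly's code or data.

```python
def blended_cost(input_per_m: float, output_per_m: float,
                 input_weight: int = 3, output_weight: int = 1) -> float:
    """Blend input and output $/1M prices into a single $/1M figure.

    The 3:1 default mirrors the input:output traffic mix described above.
    """
    return (input_weight * input_per_m + output_weight * output_per_m) / (
        input_weight + output_weight
    )


# Hypothetical prices in $ per 1M tokens (input, output), not database values.
cheap_input = blended_cost(0.25, 6.00)  # cheap input, pricey output
balanced = blended_cost(1.50, 3.00)     # moderate on both sides
print(cheap_input, balanced)  # 1.6875 1.875
```

Because input is weighted three times as heavily as output, the first model blends cheaper even though its output price is twice as high. That is exactly the kind of trade-off a single blended number makes comparable.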
The gap, in one table
| Metric (blended $/1M) | Open-weight (22) | Proprietary (7) | Gap |
|---|---|---|---|
| Average | $0.50 | $8.16 | 16× |
| Median (typical model) | $0.15 | $6.00 | 39× |
| Cheapest in group | $0.02 (Llama 3.1 8B) | $2.00 (Claude Haiku 4.5) | — |
| Most expensive in group | $3.00 (Mistral Large 3) | $20.00 (Claude Fable 5) | — |
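To put the median gap in dollar terms, the sketch below prices a hypothetical workload at the two rounded medians from the table. The 500M-tokens-per-month volume and the `monthly_spend` helper are assumptions for illustration, not figures from the database. At these rounded prices the ratio works out to 40×; the table's 39× is likely computed from unrounded medians.

```python
# Rounded median blended prices from the table above ($ per 1M tokens).
MEDIAN_OPEN = 0.15
MEDIAN_PROPRIETARY = 6.00


def monthly_spend(tokens_per_month: int, blended_per_m: float) -> float:
    """Dollar cost of a month's traffic at a given blended $/1M price."""
    return tokens_per_month / 1_000_000 * blended_per_m


tokens = 500_000_000  # hypothetical workload: 500M blended tokens per month
open_cost = monthly_spend(tokens, MEDIAN_OPEN)
prop_cost = monthly_spend(tokens, MEDIAN_PROPRIETARY)
print(f"open: ${open_cost:,.2f}  proprietary: ${prop_cost:,.2f}")
```

This prints $75.00 for the typical open model against $3,000.00 for the typical proprietary one; swap in your own monthly token count to scale it.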
The extremes tell the story
Sort all 29 models by blended cost and the pattern is stark — open weights own the bottom, proprietary owns the top:
| 5 cheapest (all open-weight) | Blended cost ($/1M) | 5 most expensive (all proprietary) | Blended cost ($/1M) |
|---|---|---|---|
| Llama 3.1 8B | $0.02 | Claude Fable 5 | $20.00 |
| Mistral 7B | $0.02 | GPT-5.5 | $11.25 |
| Mistral NeMo 12B | $0.03 | Claude Opus 4.8 | $10.00 |
| Gemma 3 4B | $0.06 | Claude Sonnet 4.6 | $6.00 |
| Qwen3 8B | $0.07 | Gemini 3.1 Pro | $4.50 |
There is no proprietary model in the cheapest third of the market, and no open-weight model priced above $3.00. The only overlap is a narrow band between $2.00 and $3.00, where the cheapest proprietary model (Claude Haiku 4.5, $2.00) actually undercuts the most expensive open one (Mistral Large 3, $3.00).
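The ranking and the overlap check can be reproduced on the models this article names. This is a sketch, not the Convly pipeline: the `Model` tuple is illustrative, and the list covers only 12 of the 29 models (the seventh proprietary model is not named here), so only the extremes and the overlap band are meaningful.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    blended: float      # $ per 1M blended tokens
    open_weight: bool


# Only the models named in this article, not the full 29-model dataset.
MODELS = [
    Model("Llama 3.1 8B", 0.02, True),
    Model("Mistral 7B", 0.02, True),
    Model("Mistral NeMo 12B", 0.03, True),
    Model("Gemma 3 4B", 0.06, True),
    Model("Qwen3 8B", 0.07, True),
    Model("Mistral Large 3", 3.00, True),
    Model("Claude Haiku 4.5", 2.00, False),
    Model("Gemini 3.1 Pro", 4.50, False),
    Model("Claude Sonnet 4.6", 6.00, False),
    Model("Claude Opus 4.8", 10.00, False),
    Model("GPT-5.5", 11.25, False),
    Model("Claude Fable 5", 20.00, False),
]

# Stable sort by blended cost, cheapest first.
ranked = sorted(MODELS, key=lambda m: m.blended)
cheapest5 = ranked[:5]
priciest5 = ranked[-5:][::-1]  # most expensive first

# The groups overlap if the cheapest proprietary model undercuts the priciest open one.
cheapest_proprietary = min(m.blended for m in MODELS if not m.open_weight)
priciest_open = max(m.blended for m in MODELS if m.open_weight)
overlap = cheapest_proprietary < priciest_open

print([m.name for m in cheapest5])
print([m.name for m in priciest5])
print(f"overlap band: ${cheapest_proprietary:.2f} to ${priciest_open:.2f}")
```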
Important nuance: this is cost, not capability
The priciest models still lead on the hardest reasoning and agentic tasks. In our companion AI Price-Performance Index we found the frontier premium buys the last points of intelligence, not proportional value. But for the majority of production workloads — classification, extraction, RAG, summarization, chat — the capability gap between a good open model and a frontier model is far smaller than the 39× price gap. You are often paying 39× for the last 10–20% of capability you may not need.
Why the gap is structural
This isn’t a temporary discount war. Intense open-weight competition — Qwen, Llama, Gemma, DeepSeek and Mistral all shipping strong models under permissive licenses — has driven the price floor toward zero. Meanwhile frontier labs price for peak capability and enterprise willingness-to-pay. The result is a market that is bifurcating: a race-to-zero floor and a premium ceiling, with a widening canyon between them.
Bottom line
For cost-sensitive production, an open or mid-tier model is the rational default in 2026, and self-hosting swaps per-token API fees for the fixed cost of your own hardware (check what your GPU can run with our VRAM calculator). Reserve the proprietary frontier for the genuinely hardest tasks. Run your own usage through the API cost calculator to see your exact numbers.
Data: Convly AI models database (API pricing via OpenRouter and DeepInfra). Blended cost uses a 3:1 input:output ratio. Figures current as of June 2026.
