No company has done more to reset the economics of AI than DeepSeek. In early 2025 it shocked the industry by matching frontier models at a fraction of the cost; by mid-2026, with DeepSeek V4, it sits at the top of the open-weight leaderboards while charging a tenth of what US labs charge. This is the full picture: who DeepSeek is, what V4 actually delivers, and the honest case for and against building on it.
Punti chiave
- DeepSeek V4 (April 2026) is a 1.6T-parameter open-weight MoE model under the permissive MIT license.
- Best price-performance in AI: ~$0.44/$0.87 per million tokens — 5-10x cheaper than GPT-5.5 or Claude Opus 4.8.
- Frontier-class coding: 80.6% SWE-bench Verified, 93.5 LiveCodeBench — competitive with the best Western models.
- 1M-token context and genuinely strong long-context retrieval.
- Caveats: data routes through China on the hosted API; content moderation reflects Chinese regulations. Self-hosting the open weights avoids both.
- Who is DeepSeek
- What DeepSeek V4 actually is
- The pricing that reset the market
- Benchmarks — is it actually good?
- Where DeepSeek wins
- Where DeepSeek loses — the honest caveats
- DeepSeek vs the field
- Pros and cons
- How to access DeepSeek
- What it takes to self-host DeepSeek V4
- Domande frequenti
- Conclusione
- Articoli correlati
Who is DeepSeek
DeepSeek is a Hangzhou-based AI lab spun out of High-Flyer, a quantitative hedge fund founded by Liang Wenfeng. That heritage matters: High-Flyer had already stockpiled thousands of GPUs for quant trading, and DeepSeek inherited both the hardware and an engineering culture obsessed with efficiency. While US labs raced to spend more, DeepSeek’s defining trait became doing more with less — a philosophy that produced models trained for a reported fraction of competitors’ budgets.
DeepSeek’s 2025 releases (V3 and the R1 reasoning model) were the moment the West realized Chinese open-weight models were not years behind — they were months behind, and catching up. V4 is the continuation of that arc.
What DeepSeek V4 actually is
DeepSeek V4 launched on April 24, 2026 as two open-weight models:
- V4-Pro — a 1.6-trillion-parameter Mixture-of-Experts model with 49B parameters active per token. This is the flagship.
- V4-Flash — a leaner 284B MoE with 13B active, built for high-throughput, low-cost serving.
Both support a 1-million-token context window and ship under the MIT license, meaning you can download the weights from Hugging Face, fine-tune them, and deploy them commercially with no restrictions or royalties. That openness is the entire strategic point: DeepSeek gives the model away and competes on inference price and brand.
The pricing that reset the market
This is the headline. As of a permanent price change in May 2026, DeepSeek’s API costs roughly $0.44 per million input tokens and $0.87 per million output tokens for the Pro model. V4-Flash is an order of magnitude cheaper again (~$0.10/$0.20).
To put that in perspective:
| Modello | Input / 1M | Output / 1M | Pesi aperti |
|---|---|---|---|
| DeepSeek V4-Pro | ~$0.44 | ~$0.87 | Yes (MIT) |
| GPT-5.5 | ~$1.25 | ~$10 | No |
| Claude Opus 4.8 | ~$5 | ~$25 | No |
| Gemini 3.5 Flash | ~$0.30 | ~$2.50 | No |
For a team running millions of tokens a day through an agent, the difference between DeepSeek and Claude Opus is the difference between a hobby budget and a serious infrastructure line item.
Benchmarks — is it actually good?
Cheap doesn’t matter if the output is weak. DeepSeek V4 is not weak. Independent and vendor benchmarks put V4-Pro-Max at:
- 80.6% on SWE-bench Verified — real-world software engineering tasks, competitive with frontier Western models.
- 93.5 on LiveCodeBench — strong coding under contamination-resistant testing.
- 83.5 on MRCR 1M needle-in-a-haystack retrieval — actually surpassing Gemini 3.1 Pro on academic long-context benchmarks.
The pattern across 2026: DeepSeek is no longer “good for the price.” On coding and long-context, it’s simply good, and the price is a bonus.
Where DeepSeek wins
1. Cost per unit of intelligence
Nothing else comes close at this quality tier. If your workload is token-heavy — coding agents, document processing, RAG over large corpora — DeepSeek changes what’s economically possible.
2. Open weights under MIT
You can self-host. For regulated industries, air-gapped environments, or anyone uncomfortable sending data to a third party, the ability to run V4 on your own hardware (or a neutral cloud) is decisive. MIT is the most permissive license a frontier-class model has shipped under.
3. Long context that works
The 1M window isn’t a spec-sheet number. The MRCR retrieval scores show V4 actually uses its context, which matters for codebase-wide reasoning and long-document analysis.
Where DeepSeek loses — the honest caveats
1. Hosted API data residency
If you use DeepSeek’s own API (rather than a Western host or self-hosting), your prompts route through servers in China and are subject to Chinese law. For many startups this is irrelevant; for enterprises with data-sovereignty requirements, it’s a blocker. The workaround is real: because the weights are MIT-licensed, you can run V4 via a Western provider (Together, Fireworks, OpenRouter) or on your own infrastructure, keeping data out of China entirely.
2. Content moderation reflects Chinese rules
Like all China-hosted models, DeepSeek’s official deployment restricts politically sensitive topics per Chinese regulations. Self-hosted open weights behave differently, but the hosted API will refuse or deflect on certain subjects. Know this before building a product on the hosted endpoint.
3. Less polished tooling and ecosystem
DeepSeek’s developer ecosystem — SDKs, docs, integrations — is improving fast but still trails OpenAI and Anthropic. You’re trading polish for price.
DeepSeek vs the field
| Dimensione | DeepSeek V4 | Qwen3.7 Max | Kimi K2.6 | GPT-5.5 |
|---|---|---|---|---|
| Price/performance | Best in class | Buono | Very good | Expensive |
| Pesi aperti | Yes (MIT) | No (Max) | Sì | No |
| Coding (SWE-bench) | 80.6% | Eccellente | 80.2% | Eccellente |
| Finestra contestuale | 1 milione | 1 milione | 262K | 400K |
| Ecosystem maturity | Improving | Buono | Improving | Migliore |
Pros and cons
DeepSeek pros
- Unbeatable price-performance at frontier quality
- Fully open weights under MIT — self-hostable
- Frontier-class coding and long-context
- 1M-token context that genuinely works
- Drove the entire industry’s prices down
DeepSeek cons
- Hosted API routes data through China
- Content moderation reflects Chinese regulations
- Developer tooling less polished than US labs
- Brand/trust concerns for some enterprises
How to access DeepSeek
- Hosted API: platform.deepseek.com — cheapest, but data goes through China.
- Western providers: Together AI, Fireworks, OpenRouter host the open weights — Western data residency at slightly higher cost.
- Self-host: download V4 weights from Hugging Face (MIT) and run on your own GPUs or a neutral cloud. Best for privacy/compliance.
- Consumer app: the DeepSeek chat app and website for casual use.
What it takes to self-host DeepSeek V4
“Open weights” sounds like “runs on my gaming PC.” It does not. DeepSeek V4 ships in two open variants released in April 2026, and the hardware gap between them is enormous. The honest starting point: both are Mixture-of-Experts models, and that has a counterintuitive consequence. Only a fraction of the parameters fire on each token, but every expert weight must still be resident in VRAM. The active-parameter count tells you about speed, not about how much memory you need to buy.
| Variant | Parametri totali / attivi | Realistic local footprint |
|---|---|---|
| V4-Flash | 284B / 13B | ~170 GB at native precision; ~90–100 GB at community INT4 |
| V4-Pro | 1.6T / 49B | ~860 GB native — cluster-class only |
V4-Pro is not a home model. At roughly 860 GB for the weights alone, it needs a multi-GPU node (think 8x H200-class accelerators) or several nodes networked together. Unless you are a lab or a funded startup, treat Pro as an API-only model.
V4-Flash is the self-hostable one, and even it is demanding. In its native FP4+FP8 checkpoint you are looking at around 170 GB of VRAM — a pair of data-center cards such as H200s, or roughly four A100 80GB. Aggressive community INT4 quantization drops it near 90–100 GB, which a four-card RTX 4090 box can hold for low-concurrency, internal use. People do run it on two consumer cards, but expect tiny batches, short context, and prototype-grade throughput rather than a production API.
For serving, vLLM is the standard: it understands V4’s expert parallelism and KV-cache behavior, and its tensor parallelism prefers power-of-two GPU counts (2, 4, 8). The 1M-token context is real but costs additional memory for the KV cache, so size for your actual context length, not the maximum.
The deciding question is rarely “can I run it?” — it is “should I?” DeepSeek’s hosted API is so cheap that self-hosting only wins on cost at sustained, high-volume usage. Below that, you are buying one thing the API cannot sell you: data control. Run it yourself when privacy or air-gapping is the requirement; otherwise the API is almost always the cheaper, faster path.
Domande frequenti
Is DeepSeek safe to use?
For non-sensitive work, yes. For sensitive or regulated data, avoid the China-hosted API and instead self-host the open weights or use a Western provider. The model itself is not malware; the concern is purely data routing and jurisdiction.
Is DeepSeek really free?
The weights are free (MIT license) — you only pay for compute if you self-host. The hosted API is paid but extremely cheap (~$0.44/$0.87 per million tokens). There’s also a free consumer chat app.
Is DeepSeek better than ChatGPT?
On price-performance, decisively. On raw frontier capability and ecosystem polish, GPT-5.5 still leads. For coding and cost-sensitive workloads, DeepSeek is often the smarter choice; for the most demanding reasoning and the richest tooling, GPT-5.5 wins.
Can I run DeepSeek offline?
Yes — that’s the point of open weights. V4-Flash (284B) runs on a high-end multi-GPU workstation; V4-Pro (1.6T) needs serious hardware but can be self-hosted by organizations with the infrastructure.
Does DeepSeek censor responses?
The China-hosted API restricts politically sensitive topics per Chinese regulations. Self-hosted open weights behave more openly. This is the single most important thing to understand before building on the hosted endpoint.
Who owns DeepSeek?
DeepSeek is a Hangzhou-based AI lab spun out of High-Flyer, a Chinese quantitative hedge fund founded by Liang Wenfeng. That hedge-fund heritage gave it both a large pre-existing GPU cluster and an engineering culture obsessed with efficiency — a big part of why its models are so cheap to run.
Is DeepSeek banned in the US or Europe?
There is no blanket consumer ban as of 2026, but several governments and large organizations have restricted DeepSeek’s hosted app on official devices over data-residency concerns — data on the China-hosted API is subject to Chinese law. The open weights are unaffected: you can run them on your own hardware or a Western provider with none of those concerns, which is the recommended path for sensitive use.
What hardware do I need to run DeepSeek V4 locally?
It depends entirely on the variant. V4-Pro (1.6T parameters) needs a multi-GPU server cluster with roughly 860 GB of VRAM and is not realistic for individuals. V4-Flash (284B parameters) is the self-hostable option: around 170 GB of VRAM at native precision, or near 90–100 GB with aggressive INT4 quantization — meaning a multi-card data-center setup, or a four-card RTX 4090 box for light, internal workloads. Because it is a Mixture-of-Experts model, all expert weights must sit in memory even though only a small slice runs per token.
Is it cheaper to self-host DeepSeek or use the API?
For almost everyone, the API is cheaper. DeepSeek’s hosted pricing is low enough that buying and powering GPUs only pays off at sustained, very high token volumes — typically hundreds of millions of tokens per month. The real reason to self-host is not cost but control: keeping data on your own machines, meeting compliance rules, or running fully offline. If those are not requirements, default to the API.
What is the difference between DeepSeek V4-Pro and V4-Flash?
They are two sizes of the same generation. V4-Pro is the 1.6-trillion-parameter flagship (49B active per token) aimed at maximum capability and run on clusters. V4-Flash is a leaner 284-billion-parameter model (13B active) that is far cheaper to serve, faster, and the only one a small team can realistically self-host. Both share the Mixture-of-Experts design and the 1M-token context window, so for many tasks Flash delivers most of the value at a fraction of the hardware cost.
Conclusione
DeepSeek is the most important force in AI pricing today. DeepSeek V4 delivers frontier-class coding and long-context performance, ships as fully open weights under MIT, and costs a fraction of any Western frontier model. For builders who care about cost — which is most builders — it has become impossible to ignore.
The caveats are real but manageable: route around the China-hosted API for sensitive data by self-hosting or using a Western provider, and understand the content-moderation behavior before you ship. Do that, and DeepSeek V4 is arguably the best value in artificial intelligence in 2026 — and the clearest sign that the era of US labs charging premium prices for frontier capability is ending.
