Huawei spent the past nine months turning its AI silicon plans into a calendar. At Huawei Connect last September it published a four-chip Ascend roadmap; at the Huawei Cloud INSPIRE Creators Conference this June it put a near-term date on the most important part of it. The Ascend 950DT, the training-and-decode member of the 950 family, lands on Huawei Cloud in August 2026 with a full commercial launch in Q4. Company VP Chen Lin summed up the cadence as “one generation per year, doubling the computing power.”
That is the pitch. This piece is about how much of it is real. We will walk through the chip roadmap and its actual specs, the openPangu models trained on Ascend, the year-end open-source push around CANN and the Mind toolchain, and the constraints nobody at the keynote dwelled on: a 7nm ceiling at SMIC, a homegrown HBM supply that cannot keep up, and a per-chip gap to NVIDIA that the roadmap quietly admits.
Key takeaways
- One generation a year, each ~2x the last. Ascend 950PR (Q1 2026), 950DT (cloud Aug 2026, commercial Q4 2026), 960 (Q4 2027), 970 (Q4 2028), building toward a system-level 4 FP4 zettaflops by 2028.
- The 950 is a parity-with-Hopper part, not a Blackwell killer. Per chip it lands around 1 PFLOPS FP8 / 2 PFLOPS FP4 with 128–144 GB of Huawei’s own HBM — strong, but a fraction of a single NVIDIA Rubin GPU.
- Huawei’s real weapon is scale. The Atlas 950 SuperPoD wires 8,192 chips together and claims to beat NVIDIA’s NVL144 on aggregate compute, memory, and bandwidth by brute force.
- openPangu 2.0 went open at HDC 2026. A 505B-parameter Pro model (18B active) and a 92B Flash model (6B active), both 512K context, with seven components opening from June 30.
- The honest constraint is manufacturing. SMIC is stuck at 7nm and homegrown HBM is the bottleneck; in the most Huawei-favorable analyst scenario it still reaches only about 5% of NVIDIA’s aggregate AI compute in 2026, and the median estimate is far lower.
- Even Huawei’s own roadmap shows a 2026 regression. The 950PR/950DT have lower total processing performance than 2025’s Ascend 910C; by Huawei’s own plan the first chip to beat an H200 is the 960 in Q4 2027.
The roadmap: one generation a year
Huawei’s framing is a metronome. Four parts across three generations, one generation per year, each generation roughly doubling the last:
- Ascend 950PR — Q1 2026, prefill and recommendation
- Ascend 950DT — cloud in August 2026, commercial Q4 2026, decode and training
- Ascend 960 — Q4 2027
- Ascend 970 — Q4 2028
The “PR” and “DT” suffixes are the interesting part. Rather than ship one general-purpose accelerator, Huawei split inference in half. The 950PR is tuned for the prefill stage — the compute-heavy pass over your prompt — and for recommendation systems. The 950DT handles decode (token-by-token generation) and sustained training, which is why it gets the fatter memory. If you have read our NPU vs GPU explainer, this is a familiar idea pushed further: specialize the silicon to the phase of the workload.
The headline number — roughly 4 FP4 zettaflops by 2028 — is a system-level target for the Atlas 960 SuperCluster, not a single chip. Keep that distinction in mind every time you see a zettaflops figure attached to Huawei; the eye-watering numbers always describe a building full of accelerators, not the accelerator.
What the Ascend 950 actually is
Here are the per-chip specs Huawei has disclosed. These are vendor figures for parts that, as of mid-June 2026, are only partly shipping, so treat them as targets rather than benchmarked results.
| Spec | Ascend 950PR | Ascend 950DT |
|---|---|---|
| Availability | Q1 2026 | Cloud Aug 2026, commercial Q4 2026 |
| Role | Prefill / recommendation | Decode / training |
| FP8 compute | ~1 PFLOPS | ~1 PFLOPS |
| FP4 compute | ~2 PFLOPS | ~2 PFLOPS |
| Memory | 128 GB HiBL 1.0 | 144 GB HiZQ 2.0 |
| Memory bandwidth | ~1.6 TB/s | ~4.0 TB/s |
| Interconnect | 2 TB/s | 2 TB/s |
The genuinely notable thing here is the memory. HiBL and HiZQ are Huawei’s own high-bandwidth memory — homegrown HBM, developed because export controls cut off easy access to the latest stacks from SK Hynix, Micron, and Samsung. A Chinese vendor shipping competitive on-package HBM at all is a real engineering result, and the 950DT’s 144 GB at 4.0 TB/s is in the right ballpark for a modern training part. Huawei also says the 950DT’s 2 TB/s interconnect is about 2.5x that of its 910C predecessor — again, a vendor figure.
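One way to read the two memory configurations is the ratio of peak compute to memory bandwidth, the ridge point of a simple roofline model. The sketch below uses only the vendor figures from the table above; the roofline framing and the names `CHIPS`, `flops_per_byte`, and `RIDGE` are our own illustration, not something Huawei has published.

```python
# Vendor figures from the spec table (targets, not benchmarked results).
CHIPS = {
    "Ascend 950PR": {"fp8_pflops": 1.0, "mem_tb_per_s": 1.6},
    "Ascend 950DT": {"fp8_pflops": 1.0, "mem_tb_per_s": 4.0},
}


def flops_per_byte(fp8_pflops: float, mem_tb_per_s: float) -> float:
    """Peak FP8 FLOPs available per byte read from on-package memory."""
    return (fp8_pflops * 1e15) / (mem_tb_per_s * 1e12)


# Ridge point per chip: a kernel needs at least this many FLOPs per byte
# of memory traffic before peak compute, not bandwidth, becomes the limit.
RIDGE = {name: flops_per_byte(**spec) for name, spec in CHIPS.items()}

for name, ridge in RIDGE.items():
    print(f"{name}: ~{ridge:.0f} FLOPs per byte at peak")
```

On these numbers the 950PR needs about 625 FLOPs per byte to stay compute-bound and the 950DT about 250. Decode rereads model weights for every generated token and tends to be bandwidth-bound, so the lower ratio fits the 950DT's role, while the compute-heavy prefill pass can tolerate the 950PR's leaner memory.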
Now the reality check. NVIDIA’s Rubin VR200, also due in the second half of 2026, targets roughly 35 PFLOPS of FP4 for training and about 50 PFLOPS of FP4 for inference, with 288 GB of HBM4 at around 22 TB/s. (Those are NVIDIA’s own labels — training versus inference — not a dense-versus-sparse split.) On raw per-chip FP4, that is a gap of roughly 17x to 25x against a single Ascend 950’s ~2 PFLOPS, depending on which Rubin figure you use. Huawei’s own Atlas 350 card, built on the 950PR, claims 1.56 PFLOPS of FP4 and “2.8x the H20” — and even that is a comparison to the cut-down, export-compliant H20, not to a full Blackwell or Rubin, and it remains a vendor claim awaiting independent testing. The fair one-line summary, echoed by analysts who track the silicon, is that a single Ascend 950 reaches rough parity with NVIDIA’s Hopper generation, not with what NVIDIA is selling in 2026. For context on the NVIDIA side, see our Vera Rubin breakdown.
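The ratios quoted above are simple divisions, and it is worth running the same arithmetic on memory as well as compute. This is a minimal sketch using only the figures cited in this section; the variable names are ours, and every input is a vendor target rather than a measured result.

```python
# Vendor targets quoted in this section.
ASCEND_950_FP4_PFLOPS = 2.0
ASCEND_950DT_MEMORY_GB = 144
ASCEND_950DT_BANDWIDTH_TB_S = 4.0

RUBIN_FP4_PFLOPS = {"training": 35.0, "inference": 50.0}
RUBIN_MEMORY_GB = 288
RUBIN_BANDWIDTH_TB_S = 22.0


def ratio(rival: float, baseline: float) -> float:
    """How many times larger the rival figure is than the baseline."""
    return rival / baseline


FP4_GAP = {
    label: ratio(pflops, ASCEND_950_FP4_PFLOPS)
    for label, pflops in RUBIN_FP4_PFLOPS.items()
}
MEMORY_GAP = ratio(RUBIN_MEMORY_GB, ASCEND_950DT_MEMORY_GB)
BANDWIDTH_GAP = ratio(RUBIN_BANDWIDTH_TB_S, ASCEND_950DT_BANDWIDTH_TB_S)

for label, gap in FP4_GAP.items():
    print(f"FP4 ({label}): Rubin is {gap:.1f}x a single Ascend 950")
print(f"Memory capacity: {MEMORY_GAP:.1f}x the 950DT")
print(f"Memory bandwidth: {BANDWIDTH_GAP:.1f}x the 950DT")
```

The compute gap (17.5x to 25x) is much wider than the memory gap: on these figures Rubin carries 2x the capacity and 5.5x the bandwidth of a 950DT.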
Scale as the strategy
Huawei knows it cannot win the chip-versus-chip fight, so it is not trying to. The bet is system architecture. The Atlas 950 SuperPoD ties together 8,192 Ascend 950DT accelerators into one logical machine: roughly 8 EFLOPS FP8 and 16 EFLOPS FP4, 1,152 TB of memory, and about 16 PB/s of interconnect bandwidth across an optical fabric. Stack 64 of those into an Atlas 950 SuperCluster and you get more than 520,000 NPUs delivering about 524 EFLOPS FP8 and roughly 1 FP4 zettaflops. The 2027 Atlas 960 SuperCluster pushes to the million-card level and the 2/4 zettaflops (FP8/FP4) figures.
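The system-level figures are almost exactly the per-chip specs multiplied out, which is a useful sanity check on how they were built. The sketch below reproduces them from the 950DT table values. Two things are our assumptions: the 1,152 TB memory figure only matches if gigabytes are converted at 1,024 per terabyte, and "million-card" is read literally as 1,000,000 cards for the Atlas 960.

```python
# Per-chip vendor figures for the Ascend 950DT (targets, not benchmarks).
FP8_PFLOPS = 1.0
FP4_PFLOPS = 2.0
MEMORY_GB = 144
INTERCONNECT_TB_S = 2.0

CHIPS_PER_SUPERPOD = 8192
SUPERPODS_PER_SUPERCLUSTER = 64


def superpod_totals(chips: int) -> dict:
    """Multiply per-chip peaks by chip count; ignores all scaling losses."""
    return {
        "fp8_eflops": chips * FP8_PFLOPS / 1000,
        "fp4_eflops": chips * FP4_PFLOPS / 1000,
        "memory_tb_binary": chips * MEMORY_GB / 1024,   # matches 1,152 TB
        "memory_tb_decimal": chips * MEMORY_GB / 1000,  # about 1,180 TB
        "interconnect_pb_s": chips * INTERCONNECT_TB_S / 1000,
    }


POD = superpod_totals(CHIPS_PER_SUPERPOD)
CLUSTER_NPUS = CHIPS_PER_SUPERPOD * SUPERPODS_PER_SUPERCLUSTER
CLUSTER_FP8_EFLOPS = CLUSTER_NPUS * FP8_PFLOPS / 1000
CLUSTER_FP4_ZFLOPS = CLUSTER_NPUS * FP4_PFLOPS / 1_000_000

# Assumption: "million-card" means exactly 1,000,000 cards.
ATLAS_960_CARDS = 1_000_000
ATLAS_960_FP4_ZFLOPS = 4.0
IMPLIED_960_FP4_PFLOPS = ATLAS_960_FP4_ZFLOPS * 1_000_000 / ATLAS_960_CARDS

print(POD)
print(CLUSTER_NPUS, CLUSTER_FP8_EFLOPS, CLUSTER_FP4_ZFLOPS)
print(f"Implied FP4 per Atlas 960 card: {IMPLIED_960_FP4_PFLOPS} PFLOPS")
```

Because these are straight multiplications of peak numbers, they say nothing about how much of that compute survives communication overhead across 8,192 chips. The implied 4 PFLOPS of FP4 per Atlas 960 card is consistent with the "doubling" cadence relative to the 950's 2 PFLOPS.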
Against NVIDIA’s NVL144, Huawei claims the 950 SuperPoD packs far more accelerators (8,192 versus 144, roughly 57 times as many) and about 6.7x the aggregate compute, with far more memory (around 15x) and interconnect bandwidth. That can be simultaneously true and misleading: you are comparing an 8,192-chip pod to a 144-GPU rack. The honest reading is that if you have unlimited floor space, cheap power, and enough chips, you can out-muscle a smaller, more efficient NVIDIA system. Those are three big ifs, and the third one, enough chips, is exactly where the story gets hard.
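Normalizing Huawei's own system-level claims by accelerator count shows what the comparison implies per chip. This is a rough sketch: it takes the 6.7x and 15x ratios at face value, and it does not know which precision or which memory pools Huawei counted, so treat the outputs as an illustration of the method rather than a measurement.

```python
SUPERPOD_CHIPS = 8192
NVL144_GPUS = 144

# System-level advantages Huawei claims for the SuperPoD over NVL144.
CLAIMED_COMPUTE_RATIO = 6.7
CLAIMED_MEMORY_RATIO = 15.0

CHIP_COUNT_RATIO = SUPERPOD_CHIPS / NVL144_GPUS


def per_accelerator_advantage(count_ratio: float, system_ratio: float) -> float:
    """How many times more of a resource each NVL144 GPU carries than
    each Ascend chip, given the pod-versus-rack claim."""
    return count_ratio / system_ratio


COMPUTE_PER_GPU = per_accelerator_advantage(CHIP_COUNT_RATIO, CLAIMED_COMPUTE_RATIO)
MEMORY_PER_GPU = per_accelerator_advantage(CHIP_COUNT_RATIO, CLAIMED_MEMORY_RATIO)

print(f"Accelerator count ratio: {CHIP_COUNT_RATIO:.1f}x")
print(f"Implied compute per NVL144 GPU: {COMPUTE_PER_GPU:.1f}x an Ascend 950")
print(f"Implied memory per NVL144 GPU: {MEMORY_PER_GPU:.1f}x an Ascend 950")
```

Read this way, Huawei's headline 6.7x compute win comes from fielding about 57 times as many accelerators, each of which delivers roughly one-eighth of what its NVIDIA counterpart does.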
openPangu: the model side
A chip platform is only as useful as the software people run on it, and Huawei has been busy on that front too. At its developer conference (HDC) in June 2026, Huawei released openPangu 2.0: a Pro model with 505B total parameters and 18B active, and a Flash model at 92B total / 6B active, both supporting 512K-token context. Huawei says the Pro model roughly doubles single-card throughput versus other leading open-source models on Ascend hardware — again, a vendor figure on its own silicon, not an independently benchmarked result.
This builds on 2025’s Pangu Pro MoE 72B, which introduced a Mixture of Grouped Experts (MoGE) design specifically shaped to balance load across Ascend chips. The pattern is deliberate: co-design the model architecture with the hardware so the accelerator’s weaknesses matter less. It is a different philosophy from the dense-then-sparse approach behind models like DeepSeek, but it shares the same goal — squeezing frontier-ish behavior out of constrained compute.
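To make the load-balancing idea concrete, here is a toy comparison of grouped routing against ordinary global top-k routing. It is not Huawei's implementation: the routing rule is not spelled out here, so the sketch assumes each token activates the same number of experts inside every group, with one group mapped to each device. All function names, sizes, and the artificial bias toward "hot" experts are hypothetical.

```python
import random


def grouped_top_k(scores, num_groups, k_per_group):
    """Pick the k highest-scoring experts inside each equal-sized group."""
    if len(scores) % num_groups != 0:
        raise ValueError("experts must divide evenly into groups")
    group_size = len(scores) // num_groups
    chosen = []
    for g in range(num_groups):
        members = range(g * group_size, (g + 1) * group_size)
        best = sorted(members, key=lambda e: scores[e], reverse=True)
        chosen.extend(best[:k_per_group])
    return chosen


def global_top_k(scores, k):
    """Conventional routing: the k best experts regardless of group."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return ranked[:k]


def load_per_group(selections, num_experts, num_groups):
    """Count expert activations landing on each group (device)."""
    group_size = num_experts // num_groups
    load = [0] * num_groups
    for selected in selections:
        for expert in selected:
            load[expert // group_size] += 1
    return load


NUM_EXPERTS, NUM_GROUPS, K_PER_GROUP, NUM_TOKENS = 16, 4, 2, 1000
rng = random.Random(0)

# Router scores with a bias toward experts 0-3, mimicking "hot" experts.
token_scores = [
    [rng.gauss(0.0, 1.0) + (1.0 if e < 4 else 0.0) for e in range(NUM_EXPERTS)]
    for _ in range(NUM_TOKENS)
]

GROUPED_LOAD = load_per_group(
    [grouped_top_k(s, NUM_GROUPS, K_PER_GROUP) for s in token_scores],
    NUM_EXPERTS, NUM_GROUPS,
)
FLAT_LOAD = load_per_group(
    [global_top_k(s, NUM_GROUPS * K_PER_GROUP) for s in token_scores],
    NUM_EXPERTS, NUM_GROUPS,
)

print("grouped routing load per device:", GROUPED_LOAD)
print("global top-k load per device:  ", FLAT_LOAD)
```

With grouped routing every device receives exactly the same number of activations by construction, whereas global top-k piles work onto whichever device hosts the popular experts. The cost is that a token may be forced to use a weaker expert in one group instead of a stronger one elsewhere.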
What’s working
- Homegrown HBM in volume — a real supply-chain milestone
- A credible, dated roadmap rather than vaporware
- Open-sourcing CANN, Mind, and Pangu to pull developers off CUDA
- System-scale designs that sidestep the per-chip gap
What’s holding it back
- SMIC capped at 7nm; large dies yield poorly
- HBM supply is the true ceiling on chips shipped
- Per-chip performance trails NVIDIA by roughly 5x on TPP
- The 2026 parts regress versus 2025’s own 910C on TPP
The open-source play
The software push is the part most likely to matter long-term. At Huawei Connect the company committed to opening its full stack by December 31, 2025: the CANN heterogeneous-compute toolkit (its answer to CUDA), the Mind series toolchains and development environment, and the openPangu foundation models. Eric Xu framed it as a long-term project, with Huawei pledging to spend roughly 15 billion yuan (about US$2.1 billion) a year over five years on ecosystem and open computing.
The logic is sound. NVIDIA’s real moat is not silicon, it is CUDA and the decade of libraries built on top of it. If Huawei wants Ascend to be more than a captive platform for Chinese hyperscalers, it has to make porting painless and give developers source access. Whether that lands is an empirical question you can answer over the coming months by watching the GitHub signals — active PRs, steady releases, community-maintained kernels. CANN’s compiler interfaces and virtual instruction set are slated to open (with the rest of CANN fully open-sourced); the proof will be third-party adoption outside Huawei’s own customers.
The constraints Huawei didn’t dwell on
Here is the uncomfortable core. Every impressive number above runs into the same wall: Huawei cannot make enough of these chips at a competitive process node.
SMIC is stuck at a 7nm-class process because export controls keep EUV lithography out of China, and yields on large AI dies at that node are poor. Worse, HBM is the tighter bottleneck, more limiting than die production itself. By SemiAnalysis’s estimate, Chinese memory maker CXMT can produce only around 2 million HBM stacks next year, enough for roughly 250,000–300,000 Ascend-class chips, even though SMIC could fabricate dies for more than a million. Without stacks, finished accelerators cannot ship, no matter how many compute dies SMIC produces.
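The supply math in that estimate can be unpacked with a few divisions. The sketch below uses only the figures quoted above; it treats "more than a million" as a floor of exactly 1,000,000 chips' worth of dies, which is our simplifying assumption, and the variable names are ours.

```python
HBM_STACKS_PER_YEAR = 2_000_000
CHIPS_SUPPORTED_LOW = 250_000
CHIPS_SUPPORTED_HIGH = 300_000
SMIC_DIE_FLOOR = 1_000_000  # assumption: lower bound on "more than a million"


def implied_stacks_per_chip(stacks: int, chips: int) -> float:
    """HBM stacks each accelerator must carry for the estimate to hold."""
    return stacks / chips


def dies_without_memory(dies: int, chips_with_memory: int) -> int:
    """Compute dies that could be built but not shipped for lack of HBM."""
    return max(dies - chips_with_memory, 0)


STACKS_PER_CHIP_MAX = implied_stacks_per_chip(HBM_STACKS_PER_YEAR, CHIPS_SUPPORTED_LOW)
STACKS_PER_CHIP_MIN = implied_stacks_per_chip(HBM_STACKS_PER_YEAR, CHIPS_SUPPORTED_HIGH)
STRANDED_DIES_MIN = dies_without_memory(SMIC_DIE_FLOOR, CHIPS_SUPPORTED_HIGH)
STACKS_TO_USE_ALL_DIES = SMIC_DIE_FLOOR * STACKS_PER_CHIP_MIN

print(f"Implied stacks per chip: {STACKS_PER_CHIP_MIN:.1f} to {STACKS_PER_CHIP_MAX:.1f}")
print(f"Dies left without memory: at least {STRANDED_DIES_MIN:,}")
print(f"Stacks needed to use every die: at least {STACKS_TO_USE_ALL_DIES:,.0f}")
```

On these inputs the estimate implies roughly seven to eight stacks per accelerator, at least 700,000 chips' worth of dies with no memory to pair them with, and a need for more than three times CXMT's estimated output before die production becomes the limit again.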
The performance math follows from that. Analysts at the Council on Foreign Relations estimate the best U.S. AI chips are currently about five times more powerful than Huawei’s best on a total-processing-performance basis, widening to roughly seventeen times by the second half of 2027. On aggregate output, the CFR’s most Huawei-favorable scenario still has Huawei producing only about 5% of NVIDIA’s total AI compute in 2026, falling toward 2% in 2027 — and its median estimate is far lower, around 1%. Most telling: the 2026 Ascend 950PR and 950DT actually have lower TPP than 2025’s Ascend 910C — a sign of how hard domestic production is — and on Huawei’s own roadmap the first part to beat an H200 on performance or memory bandwidth is the Ascend 960 in Q4 2027. If you are choosing hardware to run models locally today, our best GPUs for local LLMs guide is a more practical starting point than anything in this roadmap.
None of this means the effort is theater. NVIDIA’s Jensen Huang has repeatedly called Huawei “formidable” — in May 2026 he said NVIDIA has “largely conceded” China’s advanced AI chip market to it. The competition is real; what the manufacturing math shows is that the timeline is the thing to watch, and timelines on constrained nodes slip.
FAQ
Is the Huawei Ascend 950 better than NVIDIA’s Blackwell or Rubin?
No, not per chip. A single Ascend 950 lands around Hopper-class performance — roughly 1 PFLOPS FP8 and 2 PFLOPS FP4 — while NVIDIA’s Rubin VR200 targets about 35 PFLOPS of FP4 for training and 50 PFLOPS for inference. Huawei’s argument is at the system level: wire thousands of chips together and beat a smaller NVIDIA rack on aggregate.
When does the Ascend 950DT actually ship?
It reaches Huawei Cloud in August 2026 as a cloud-accessible service, with a full commercial launch (cards and SuperPoD servers) slated for Q4 2026. The 950PR began shipping earlier, in Q1 2026.
What is openPangu and how is it different from Pangu Pro MoE 72B?
openPangu 2.0, released at HDC 2026, is the latest open-source family: a 505B-parameter Pro model (18B active) and a 92B Flash model (6B active), both with 512K context. The 2025 Pangu Pro MoE 72B was the earlier model that introduced the Mixture of Grouped Experts architecture tuned for Ascend.
Can Huawei make enough Ascend chips to matter?
That is the real limit. By SemiAnalysis’s estimate, HBM supply caps output at roughly 250,000–300,000 Ascend-class chips a year, and SMIC’s 7nm yields are weak. Even the most Huawei-favorable CFR scenario has it fielding only about 5% of NVIDIA’s aggregate AI compute in 2026, with the median estimate closer to 1%.
What are HiBL and HiZQ memory?
They are Huawei’s homegrown high-bandwidth memory, developed because export controls restrict access to the latest third-party HBM. The 950PR uses 128 GB of HiBL 1.0 (~1.6 TB/s); the 950DT uses 144 GB of HiZQ 2.0 (~4.0 TB/s).
Why is Huawei open-sourcing CANN and the Pangu models?
To break NVIDIA’s software lock-in. CUDA is NVIDIA’s real moat, so Huawei is opening CANN (its CUDA equivalent), the Mind toolchain, and the Pangu models to lower the cost of porting and build a developer ecosystem around Ascend.
What does “4 zettaflops by 2028” actually refer to?
It is a system-level target for the Atlas 960 SuperCluster, a million-card cluster, at FP4 precision, not a single chip. Individual Ascend accelerators are measured in petaflops, six orders of magnitude lower.
Bottom line
Huawei’s 2026 announcements are serious and constrained in equal measure. The roadmap is real, the homegrown HBM is a genuine milestone, the openPangu models and the CANN open-sourcing are smart moves to chip away at NVIDIA’s software moat, and the SuperPoD scale-out is a clever way to route around weak silicon. Take all of that at face value.
Then read the fine print. Per chip, the Ascend 950 is a Hopper-era part arriving in a Rubin-era year, and even Huawei’s own roadmap shows the 2026 chips regressing on total performance versus 2025’s 910C. The binding constraint is not ambition or design talent; it is a 7nm ceiling and an HBM supply that can feed only a few hundred thousand chips a year. For Chinese buyers cut off from NVIDIA, Ascend is the best option on the board and getting better; NVIDIA’s own CEO calls Huawei “formidable” and admits NVIDIA has largely conceded that market. For everyone watching the global race, the honest verdict is that Huawei has arrived as a real competitor, but the chips, the yields, and the calendar all still favor NVIDIA, and will into 2027 unless the manufacturing story changes.
