On June 13, 2026, Zhipu AI (which now brands its products as Z.ai) pushed GLM 5.2 to every tier of its GLM Coding Plan. The headline number is a 1,000,000-token context window, five times what GLM 5.1 offered, paired with MIT-licensed open weights that Zhipu promised would land within the week alongside the standalone API and chatbot. For a model aimed squarely at long-horizon agentic coding, the size of that context jump is the whole story.
What was missing from the launch announcement was just as notable: not a single benchmark score. No SWE-bench, no Terminal-Bench, no Code Arena number. That is unusual for a frontier-scale release, and for the first few days everything written about GLM 5.2’s “performance” was either vendor marketing or someone’s weekend vibe-check. That changed when the open weights went public on June 16: Zhipu published a full benchmark suite, and independent evaluators followed quickly. This piece covers what GLM 5.2 actually is, the specs Zhipu confirmed, the numbers that now exist (and how much to trust them), how to access or self-host a model of this size, how it stacks up against GLM 5.1 and other open coding models, and who should bother.
Key takeaways
- Released June 13, 2026 on the GLM Coding Plan; the API, chatbot, and MIT open weights followed on June 16.
- ~753B-parameter Mixture-of-Experts (per Zhipu’s own model card) with roughly 40B active parameters per token, exposed in Claude Code as the model ID glm-5.2[1m] (base ID glm-5.2).
- 1,000,000-token context (up from GLM 5.1’s ~200K) with output capped at 131,072 tokens and two reasoning modes, High and Max.
- Anthropic-compatible endpoint means Claude Code, Cline, OpenCode, OpenClaw and others point at it by changing one base URL.
- Benchmarks now exist. They were absent at the June 13 soft launch but shipped with the weights: vendor-reported SWE-bench Pro 62.1 and Terminal-Bench 2.1 of 81.0, plus an independent Artificial Analysis Intelligence Index score of 51 that makes it the top open-weights model. Treat vendor numbers as vendor numbers; the independent ones corroborate the broad picture.
- Self-hosting is a data-center job: roughly 8x H200 at FP8, or fewer GPUs with aggressive INT4 quantization, before you account for the 1M-context KV cache.
What GLM 5.2 actually is
GLM 5.2 is the third release in Zhipu’s GLM-5 line, following GLM 5 and GLM 5.1, and it is built for one job: writing and maintaining software across long, multi-step sessions. It is a sparse Mixture-of-Experts (MoE) model with roughly 753 billion total parameters but only about 40 billion active on any given token. (Zhipu’s Hugging Face model card lists 753B; some third-party trackers round it to ~744B, the same as GLM 5.1.) That sparsity is what lets a model this large run at a usable speed and price, because you pay compute for the ~40B active parameters, not the full 753B, per forward pass.
Two things define the GLM 5.2 generation versus its predecessor. First, context: the model accepts up to 1,000,000 input tokens. The standalone API exposes a default model ID of glm-5.2 (with a shorter context), while the full 1-million-token window is addressed as glm-5.2[1m] — the variant you wire into Claude Code. A million tokens is enough to hold a mid-sized repository, its tests, and a long working transcript in a single window. Second, output: it can emit up to 131,072 tokens in one response, which matters when an agent is generating an entire module or a sprawling refactor diff rather than a snippet.
Zhipu replaced the older effort presets with two thinking-effort levels, High and Max, and recommends Max for complex, multi-step coding work. There is no Low or Auto setting. If you want background on Zhipu’s earlier models and how the company got here, our primer on Zhipu’s GLM lineup walks through the family tree.
The specs, and the benchmarks that arrived late
Here is the part worth reading slowly, because the situation moved fast. Zhipu shipped GLM 5.2 to the Coding Plan on June 13 with no published evaluations of any kind. Outlets covering that soft launch, including MarkTechPost, all noted the same thing: the announcement talked about availability, context length, and the open-source roadmap, and said nothing about how the model scored.
That changed on June 16, when the open weights went public on Hugging Face and Zhipu published a benchmark table alongside them. So the “benchmark vacuum” was real, but it was a launch-timing quirk, not a permanent one. Two things follow.
First, the vendor-reported numbers. On Zhipu’s own card, GLM 5.2 posts SWE-bench Pro of 62.1 (versus 58.4 for GLM 5.1 and 58.6 for GPT-5.5, but behind Claude Opus 4.8 at 69.2) and Terminal-Bench 2.1 of 81.0 (versus GLM 5.1’s ~63.5, and just behind Opus 4.8 at 85.0 and GPT-5.5 at 84.0). On the FrontierSWE long-horizon suite, Zhipu reports GLM 5.2 trailing Opus 4.8 by roughly one point. These are vendor-run figures and should be read as such — favorable harness choices are normal in first-party tables.
Second, and more useful, independent evaluators have now weighed in and broadly corroborate the picture. Artificial Analysis scores GLM 5.2 at 51 on its Intelligence Index v4.1, making it the leading open-weights model, ahead of MiniMax-M3 (44), DeepSeek V4 Pro (44) and Kimi K2.6 (43). On the community-voted Code Arena, GLM 5.2 (Max) ranks #2 in the Frontend/WebDev leaderboard, behind only Claude Fable 5 and well ahead of other open models. One genuine caveat the independent data surfaces: GLM 5.2 burns far more output tokens per task than its peers (Artificial Analysis measured ~43k per Intelligence Index task, up from ~26k for GLM 5.1), which eats into its cost advantage on long jobs.
So the honest framing today is not “no numbers, trust nothing.” It is: GLM 5.2 is a verified strong open-weights model on independent intelligence and frontend-coding leaderboards, while its first-party agentic-coding scores (SWE-bench Pro, Terminal-Bench) should be sanity-checked against a neutral evaluator like LiveBench or your own repository before you treat any “beats GPT-5.5” headline as settled. Several of those headlines are technically supported on specific benchmarks — GLM 5.2 does edge GPT-5.5 on SWE-bench Pro in Zhipu’s table — but it loses to Claude Opus 4.8 across most of the same suite, so framing matters.
| Attribute | GLM 5.2 (confirmed) |
|---|---|
| Coding Plan launch | June 13, 2026 |
| API & open weights | June 16, 2026 |
| Total parameters | ~753B (MoE; some trackers list ~744B) |
| Active per token | ~40B |
| Context window | 1,000,000 tokens (glm-5.2[1m]) |
| Max output | 131,072 tokens |
| Reasoning modes | High, Max |
| License | MIT (open weights) |
| Independent benchmark | Artificial Analysis Intelligence Index 51 (top open-weights model) |
How to access GLM 5.2 in the cloud
The fastest path is the GLM Coding Plan, a subscription that routes coding agents through Zhipu’s hosted endpoints. Promotional launch tiers run roughly $10/month for Lite (about 400 prompts/week), ~$30/month for Pro (~2,000 prompts/week), and ~$80/month for Max (~8,000 prompts/week), with seat-based pricing for Team. List (non-promo) prices are higher — some resellers quote closer to $18 / $72 / $160 — and quotas shift, so confirm the current numbers on Z.ai before subscribing.
If you would rather pay per token, the standalone API lists at roughly $1.40 per million input tokens and $4.40 per million output on Zhipu’s own endpoint, with prompt caching that drops cached input to about $0.26 per million and can cut the effective cost substantially on repeated context. Third-party gateways such as OpenRouter advertise comparable rates (Simon Willison tested it there at the same $1.40 / $4.40), so shop the resellers if cost is the deciding factor.
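To see how much caching moves the bill, the per-token rates quoted above can be turned into a quick estimate. This is a minimal sketch, not Zhipu's billing logic: the function name, the `cached_fraction` parameter, and the assumption that fresh input, cached input, and output are each billed linearly at the quoted rates are illustrative choices.

```python
# Approximate list rates quoted in this article, in USD per million tokens.
INPUT_PER_M = 1.40
CACHED_INPUT_PER_M = 0.26
OUTPUT_PER_M = 4.40


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_fraction: float = 0.0) -> float:
    """Estimate the cost of one request under a simple linear pricing model.

    cached_fraction is the share of input tokens served from the prompt
    cache (a hypothetical knob for this sketch; real cache hit rates
    depend on how the provider matches repeated context).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * INPUT_PER_M
             + cached_tokens * CACHED_INPUT_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # A 200K-token prompt with a 43K-token response, with and without caching.
    print(round(estimate_cost_usd(200_000, 43_000), 4))
    print(round(estimate_cost_usd(200_000, 43_000, cached_fraction=0.8), 4))
```

Because output is priced at roughly three times input, a model that emits long responses can see the output term dominate, so it is worth plugging in token counts measured on your own workload and confirming live rates first.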
The hook that makes GLM 5.2 interesting for existing workflows is the Anthropic-compatible endpoint. Tools that already speak the Anthropic Messages API can be redirected to Zhipu by setting an environment variable, no code changes required:
| Setting | Value |
|---|---|
| ANTHROPIC_BASE_URL | https://api.z.ai/api/anthropic |
| Model (Claude Code, 1M) | glm-5.2[1m] |
| Coding endpoint (Cline, etc.) | https://api.z.ai/api/coding/paas/v4 |
| Long-call timeout | Raise API_TIMEOUT_MS (e.g. 3,000,000) for Plan-mode runs |
That single swap is why GLM 5.2 shipped with day-one support for Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw and Kilo Code. If you live in a terminal-native agent, our walkthrough of OpenCode and how it handles model backends covers the wiring in more detail.
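The settings in the table can be assembled programmatically before launching an agent. The sketch below is an illustration under stated assumptions: `ANTHROPIC_BASE_URL` and `API_TIMEOUT_MS` come from the table, while passing the key through `ANTHROPIC_AUTH_TOKEN` and the model through `ANTHROPIC_MODEL` is an assumption about how your Claude Code install reads its configuration, and the helper name is hypothetical. Check Z.ai's setup guide for the exact variables it recommends.

```python
import os

ZAI_ANTHROPIC_BASE_URL = "https://api.z.ai/api/anthropic"
ZAI_MODEL_1M = "glm-5.2[1m]"


def build_claude_code_env(api_key, base_env=None, timeout_ms=3_000_000):
    """Return a copy of the environment with Z.ai overrides applied.

    The caller's environment is never mutated; pass the result to
    whatever launches the agent process.
    """
    if not api_key:
        raise ValueError("a Z.ai API key is required")
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = ZAI_ANTHROPIC_BASE_URL
    # Assumption: the client reads its key and model from these variables.
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    env["ANTHROPIC_MODEL"] = ZAI_MODEL_1M
    # Long Plan-mode runs need a generous timeout (value from the table above).
    env["API_TIMEOUT_MS"] = str(timeout_ms)
    return env
```

One way to use it is `subprocess.run(["claude"], env=build_claude_code_env(key))`, assuming the `claude` CLI is on your PATH. Building a fresh dictionary rather than editing `os.environ` keeps the override scoped to that one child process, so other tools in the same shell keep talking to their usual endpoint.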
The hardware reality of running ~753B yourself
The MIT license is the marquee feature, and it is genuine: now that the weights are public on Hugging Face, you can download, fine-tune, and self-host GLM 5.2 with no usage or regional restrictions. The catch is that “open” does not mean “runs on your laptop.” A ~753B model is a data-center workload.
At FP8 precision (roughly one byte per parameter), the weights alone need on the order of 750GB of VRAM, which in practice means about 8x H200 (141GB each) or 8x B200. Drop to INT4 and the footprint falls to roughly 370GB, at the cost of some quality; that fits on about 4x H200, or can be spread across more lower-memory cards such as 8x H100. Those figures also exclude the context: a 1-million-token KV cache adds an estimated 80GB or more on top, so the 1M-context configuration realistically wants the H200/B200 class of node. Reported deploy guides put a single 8x H200 box in the rough neighborhood of $10k/month on spot pricing, rising toward $25k or more at on-demand rates on GPU clouds.
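The GPU counts above follow from simple arithmetic, sketched here as a back-of-the-envelope sizing helper. It is a rough estimate, not a deployment guide: the function names are hypothetical, the 80GB KV-cache figure is the estimate quoted in this article, and rounding the GPU count up to a power of two is an assumption that reflects how tensor-parallel serving setups are commonly sized. Real deployments also need headroom for activations and runtime overhead.

```python
import math


def weights_gb(total_params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return total_params_billion * bytes_per_param


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def gpus_needed(total_params_billion: float, bytes_per_param: float,
                gpu_memory_gb: float, kv_cache_gb: float = 0.0) -> int:
    """Minimum GPU count by memory alone, rounded up to a power of two.

    Assumption: the power-of-two rounding mimics typical tensor-parallel
    configurations; it is not a hard requirement of every serving stack.
    """
    total = weights_gb(total_params_billion, bytes_per_param) + kv_cache_gb
    return next_power_of_two(math.ceil(total / gpu_memory_gb))


if __name__ == "__main__":
    print(gpus_needed(753, 1.0, 141))                  # FP8 on H200
    print(gpus_needed(753, 0.5, 141))                  # INT4 on H200
    print(gpus_needed(753, 0.5, 80))                   # INT4 on H100 (80GB)
    print(gpus_needed(753, 1.0, 141, kv_cache_gb=80))  # FP8 + 1M-token KV cache
```

Under these assumptions the helper lands on the same node sizes the article cites: eight H200s at FP8, four H200s or eight H100s at INT4.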
For the overwhelming majority of teams, that math says use the API. Self-hosting GLM 5.2 makes sense only when data residency, air-gapping, or very high sustained volume justify the operational burden — and note that the convenient hosted API runs on Chinese infrastructure, which is its own consideration for some buyers. If your real goal is a model you can run on hardware you actually own, a ~753B MoE is the wrong tool, and our guide to the best local LLMs for coding points at options sized for a single workstation or a modest GPU server.
Strengths
- 1M-token context is genuinely large and well-suited to whole-repo agentic work.
- Permissive MIT license with full open weights, not a research-only or non-commercial tag.
- Independently the top open-weights model on the Artificial Analysis Intelligence Index, and #2 on Code Arena’s frontend leaderboard.
- Drop-in Anthropic-compatible endpoint means near-zero migration cost from Claude clients, and Coding Plan pricing undercuts closed frontier APIs for heavy users.
Caveats
- First-party agentic-coding scores (SWE-bench Pro, Terminal-Bench) are vendor-run and trail Claude Opus 4.8; confirm with neutral evaluators or your own tasks.
- Uses notably more output tokens per task than peers, denting its cost advantage on long jobs.
- Self-hosting requires multi-GPU data-center hardware, not consumer or prosumer kit; the hosted API runs on Chinese infrastructure.
- Only High and Max effort levels; no cheap, fast mode for trivial tasks. Pricing and quotas are still settling.
GLM 5.2 vs GLM 5.1 and the open-weight field
Against its own predecessor, GLM 5.2 is roughly the same size — Zhipu describes it as the same parameter class as GLM 5.1 (~753B vs ~754B) — with the same MoE design and ~40B active parameters. The leap is almost entirely the context window and output ceiling, plus a measurable bump in benchmark scores.
| Model | Total params | Context | Max output | License | SWE-bench Pro (vendor) |
|---|---|---|---|---|---|
| GLM 5.2 | ~753B MoE | 1,000,000 | 131,072 | MIT | 62.1 |
| GLM 5.1 | ~754B MoE | ~200,000 | ~131K | MIT | 58.4 |
In the broader open-weights coding race, GLM 5.2 now enters as the front-runner on several independent boards rather than an unproven newcomer. Moonshot’s Kimi K2 generation and the latest DeepSeek and Qwen coders all publish SWE-bench and agentic-coding results, and Qwen’s flagship also offers a 1M-token context — but on the Artificial Analysis Intelligence Index, GLM 5.2 (51) sits ahead of DeepSeek V4 Pro (44) and Kimi K2.6 (43). That said, leaderboard position is not the same as fit for your codebase, and on first-party agentic suites GLM 5.2 still trails the closed frontier (Claude Opus 4.8). For a sense of how the other Chinese labs trade blows, see our breakdown of DeepSeek V4 versus Qwen 3, and for the model most often cross-shopped against it, our look at Kimi K2.7 for coding. We also put the two head-to-head in GLM 5.2 vs Kimi K2.7 for coding.
FAQ
Is GLM 5.2 actually open source?
The weights are released under the MIT license, which is one of the most permissive licenses available and allows commercial use, modification, and redistribution. The weights went public on Hugging Face (as zai-org/GLM-5.2 and an FP8 build) on June 16, 2026. Note that “open weights under MIT” is not the same as a fully open-source project with public training data; you get the model, not the recipe.
How much does GLM 5.2 cost to use?
Through the API, expect roughly $1.40 per million input tokens and $4.40 per million output on Zhipu’s endpoint, with caching dropping cached input to about $0.26 per million. The subscription GLM Coding Plan is often cheaper for steady use, with promotional tiers starting around $10/month for Lite and scaling to ~$80/month for Max (list prices run higher). Third-party providers such as OpenRouter list comparable per-token rates.
Can I run GLM 5.2 on my own GPU?
Only if “my own GPU” means a multi-GPU server. The ~753B weights need roughly 8x H200 at FP8, or about 4x H200 (or more lower-memory cards) with INT4 quantization, and the 1M-token context adds a large KV-cache requirement on top. A single consumer GPU cannot run this model; for that you want a smaller, purpose-built local model.
Does GLM 5.2 work with Claude Code?
Yes. Zhipu exposes an Anthropic-compatible endpoint, so you point Claude Code at https://api.z.ai/api/anthropic, set the model to glm-5.2[1m], and supply a Z.ai API key. Raising the request timeout is recommended for long planning runs. The same approach works for Cline, OpenCode, OpenClaw, Goose, Roo Code, Crush, and Kilo Code.
How does GLM 5.2’s context window compare to GLM 5.1?
It is five times larger: 1,000,000 tokens versus roughly 200,000 in GLM 5.1. Maximum output also stays high at 131,072 tokens. Together, the two make GLM 5.2 better suited to holding an entire codebase plus a long agent transcript in one session.
Did Zhipu publish benchmarks for GLM 5.2?
Not at the June 13 Coding Plan launch — that release focused on availability and the open-weights roadmap. But Zhipu published a full benchmark table when the weights went public on June 16, and independent labs followed: Artificial Analysis rates it the top open-weights model on its Intelligence Index (51), and Code Arena ranks it #2 on frontend coding. Vendor-run agentic scores (SWE-bench Pro 62.1, Terminal-Bench 2.1 of 81.0) should still be sanity-checked against neutral evaluations.
Is GLM 5.2 better than Kimi K2 or DeepSeek for coding?
On independent aggregate intelligence it currently leads them: Artificial Analysis scores GLM 5.2 at 51 versus 44 for DeepSeek V4 Pro and 43 for Kimi K2.6, and it tops both on Code Arena’s frontend board. On any specific agentic-coding task the gap can close or reverse, and all three publish detailed SWE-bench results, so for a high-stakes decision run a head-to-head on your own repository rather than trusting a single leaderboard.
Bottom line
GLM 5.2 is a real and notable release: a ~753B-parameter, MIT-licensed coding model with a 1-million-token context and a drop-in Anthropic-compatible API that lets you swap it into Claude Code or Cline in seconds. For heavy agentic-coding users who want long context and permissive licensing, the value proposition is strong, and the Coding Plan pricing is aggressive.
The benchmark gap that defined the first 72 hours has closed: independent evaluators now rank GLM 5.2 as the leading open-weights model on aggregate intelligence and near the top on frontend coding, which is a genuine credential. Keep two caveats in view, though. The flashiest “beats GPT-5.5” claims rest on vendor-run agentic benchmarks where GLM 5.2 still trails Claude Opus 4.8, and the model spends a lot of output tokens, so verify the economics on your own workload. The hardware reality points the same way: for almost everyone, this is a cloud API to test, not weights to self-host. A serious trial is clearly warranted; whether it earns a full migration depends on how it does on your code, not on the leaderboard.
