Nvidia announced Project DIGITS at CES 2025 and shipped it in late 2025 under the DGX Spark name (we use DIGITS and DGX Spark interchangeably here): a small desktop computer with a custom GB10 Grace Blackwell chip, 128 GB of unified memory, and Nvidia’s pitch that you can run any open-weight LLM up to 200B parameters locally. We’ve had one in the office for four weeks. Here’s what actually happens when you try.
Key takeaways
- It works. Llama 3 70B at Q5_K_M runs at 11 tokens/sec.
- Llama 3 405B at Q4 runs at 3.2 tokens/sec — usable but slow.
- Price: $3,000. Includes the computer, no extras needed.
- Faster than M4 Max 128 GB for inference (~30%), comparable on memory ceiling.
- Buy if you need to run 70B+ models locally and don’t want to build a multi-GPU workstation.
What DIGITS actually is
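A 6.5×6.5×4 inch desktop unit with the GB10 Grace Blackwell chip, 128 GB of unified LPDDR5X memory, a ConnectX-7 200 GbE NIC, and DisplayPort and HDMI outputs.
It ships with CUDA, cuDNN, TensorRT-LLM, vLLM, NIM containers, PyTorch, and Jupyter pre-installed. Plug in a monitor and keyboard, log into the web UI, and you can start running models in five minutes.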
Benchmarks
Tested with stock DGX OS, no overclocking, fan curve at default:
| Workload | DIGITS | M4 Max 128 GB | RTX 5090 (32 GB) |
|---|---|---|---|
| Llama 3 8B Q4 | 122 t/s | 78 t/s | 168 t/s |
| Llama 3 70B Q4 | 14.8 t/s | 9.4 t/s | 22.1 t/s |
| Llama 3 70B Q5_K_M | 11.0 t/s | 8.3 t/s | — |
| Mistral Large 2 123B Q4 | 7.2 t/s | 4.7 t/s | OOM |
| DeepSeek V2 236B Q3 | 8.4 t/s (MoE) | 6.1 t/s | OOM |
| Llama 3 405B Q4 | 3.2 t/s | 2.1 t/s | n/a |
| SDXL 1024×1024 | 11.8 it/s | 6.3 it/s | 25.4 it/s |
The pattern: DIGITS beats the Apple M4 Max by at least ~30% on LLM inference (33% to 57% across the rows above) and trails the RTX 5090 by ~30% on models that fit in 32 GB. For models that need 32 to 128 GB, DIGITS has no consumer competitor at this price.
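The relative gaps can be recomputed directly from the table. The sketch below copies a subset of the rows and prints how much faster one machine is than another; the helper name `percent_faster` is ours, and "faster" here means the ratio of tokens/sec minus one.

```python
# Tokens/sec copied from the benchmark table: (DIGITS, M4 Max, RTX 5090 or None).
RESULTS = {
    "Llama 3 8B Q4": (122.0, 78.0, 168.0),
    "Llama 3 70B Q4": (14.8, 9.4, 22.1),
    "Llama 3 70B Q5_K_M": (11.0, 8.3, None),
    "Mistral Large 2 123B Q4": (7.2, 4.7, None),
    "Llama 3 405B Q4": (3.2, 2.1, None),
}


def percent_faster(a: float, b: float) -> float:
    """Return how much faster throughput `a` is than throughput `b`, in percent."""
    return (a / b - 1.0) * 100.0


for name, (digits, m4, rtx) in RESULTS.items():
    line = f"{name}: DIGITS is {percent_faster(digits, m4):.0f}% faster than M4 Max"
    if rtx is not None:
        line += f"; RTX 5090 is {percent_faster(rtx, digits):.0f}% faster than DIGITS"
    print(line)
```

Note that "X% faster" and "X% slower" use different baselines, so the same pair of numbers gives two different percentages depending on which machine you divide by.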
Who is this for
DIGITS sits in a very specific niche: you want to run 70B-405B parameter models locally, and you don’t want to build a multi-GPU workstation.
A standard alternative is a custom 2× RTX 4090 build for the same ~$3K. That gives you:
- 48 GB of VRAM (vs 128 GB unified)
- Faster per-token on models that fit (~2× faster)
- Standard PC form factor — upgradeable
- 700 W power draw vs 140 W
DIGITS wins when you need to run bigger models than 48 GB allows — which is the whole 100B+ class. Below that, the 2× 4090 build wins.
The other competitor is Apple’s Mac Studio M4 Max 128 GB ($3,899). DIGITS is $900 cheaper and 30% faster per-token, but:
- DGX OS is Ubuntu; Apple is macOS (preference dependent)
- Mac Studio is upgradeable in a way DIGITS isn’t (no upgrades)
- Mac Studio is silent; DIGITS has a small fan that’s audible but quiet
- Mac Studio has better display support out of box
What’s annoying about DIGITS
Honest gripes after four weeks:
- No GUI for non-AI work. It’s a pure AI appliance. If you want a daily-driver computer, get a Mac or a PC.
- ConnectX-7 is overkill for most use cases. Cool that it’s there, but the 200 GbE NIC is wasted on a home network.
- Software is Nvidia-curated. DGX OS is great for AI but constrained; you don’t have full Ubuntu flexibility.
- Display output is limited to DisplayPort and HDMI, and there is no Thunderbolt for eGPU experiments.
- Resale market is unproven. No telling what it’ll be worth in 2 years.
Power and noise
140 W under sustained AI load. The 5×5 cm fan spins up but stays around 28 dBA at the front of the unit, only slightly louder than a MacBook Pro M4 Max under load. The chassis gets warm but not hot. You can leave it running 24/7 in a home office without thermal worries.
Compare to:
- 2× RTX 4090 build under same load: ~700 W, ~42 dBA. Notable heat dump into the room.
- M4 Max 128 GB MacBook Pro: ~85 W, ~24 dBA. Slightly quieter and cooler.
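To put those wattages in running-cost terms, here is a minimal sketch that converts sustained draw into annual energy and cost. The $0.15/kWh electricity price and the 24/7 duty cycle are assumptions for illustration, not measurements; substitute your own tariff and usage.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_kwh(watts: float, duty_cycle: float = 1.0) -> float:
    """Energy used in a year at a sustained draw, scaled by the fraction of time under load."""
    return watts * HOURS_PER_YEAR * duty_cycle / 1000.0


def annual_cost(watts: float, price_per_kwh: float = 0.15, duty_cycle: float = 1.0) -> float:
    """Yearly electricity cost; price_per_kwh is an assumed tariff, not a quoted one."""
    return annual_energy_kwh(watts, duty_cycle) * price_per_kwh


# Sustained-load wattages from the comparison above.
for name, watts in [("DIGITS", 140), ("2x RTX 4090 build", 700), ("M4 Max MacBook Pro", 85)]:
    print(f"{name}: {annual_energy_kwh(watts):.0f} kWh/year, ${annual_cost(watts):.0f}/year")
```

Lowering `duty_cycle` to something like 0.25 models a machine that is only under load a quarter of the time, which narrows the absolute gap but not the 5× ratio between the two builds.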
Pros and cons
Nvidia DIGITS pros
- 128 GB unified memory — runs models that need it
- 30% faster than M4 Max for inference
- Includes full Nvidia AI stack pre-installed
- Sips power (140 W under load)
- Cheaper than M4 Max 128 GB Mac Studio
Nvidia DIGITS cons
- Not a general-purpose computer
- Slower than RTX 5090 for models that fit in 32 GB
- Not upgradeable
- Limited 1.0 platform — bugs do happen
- Resale value unknown
Verdict — and the decision tree
DIGITS is the right buy for one specific user: someone whose primary AI workload is running 70B-405B parameter LLMs locally, and who values having an appliance that just works over building a custom rig.
If that’s not you, here’s where the alternatives win:
- You’re inference-only on 70B at quality quants: RTX 5090 + 32 GB is faster and cheaper.
- You’re already in the Mac ecosystem: Mac Studio M4 Max 128 GB ($3.9K) is more flexible.
- You want maximum flexibility for AI development: Custom 2× RTX 4090 build ($3K) is faster per-token within 48 GB, and you can upgrade later.
- You want maximum throughput for SDXL/FLUX: RTX 5090 wins decisively.
DIGITS exists for the increasingly common buyer who needs to run massive open-weight models locally without thinking about it. For that buyer, it’s the best $3,000 you can spend in 2026.
What it actually costs to own one
The sticker price is only the start of the decision, and it has been a moving target. NVIDIA’s Founders Edition launched at $3,999 in late 2025, then jumped to $4,699 in February 2026 as the global DRAM shortage made its 128 GB of LPDDR5X dramatically more expensive to build. That volatility is the first thing to understand: because so much of the cost is soldered-on memory, DGX Spark pricing tracks the DRAM market more than NVIDIA’s margins, and it can move again.
You are not limited to the gold Founders box. Partner units from ASUS (Ascent GX10), Acer (Veriton GN100), Dell (Pro Max GB10) and MSI (EdgeXpert) use the same GB10 superchip and the same 128 GB of unified memory. The trade-off is almost always storage: partner machines typically ship with a 1 TB SSD instead of the Founders Edition’s 4 TB, which is how the cheapest of them land hundreds of dollars below NVIDIA’s price. If you do not need 4 TB on day one, a partner unit is the cheaper route to identical compute.
Then add the running costs people forget:
- Storage you will outgrow. Model weights, datasets and container images are large. On a 1 TB unit, budget for fast external NVMe almost immediately.
- Power and a quiet corner. It sips power compared to a tower workstation, but it runs continuously if you treat it as an always-on inference box, and that electricity is real.
- The second unit. Two DGX Sparks can be linked over NVIDIA’s ConnectX networking to handle larger models, so a serious plan may really be a roughly $9,000–$10,000 plan, not a $4,700 one.
The honest comparison is against the cloud, not against nothing. A rented high-end GPU instance bills by the hour and never depreciates in your closet. DGX Spark only wins on total cost when it stays busy: continuous local inference, daily fine-tuning experiments, or workloads where keeping data on-premises is the point. If your usage is occasional or bursty, renting will be cheaper for a long time. Buy the Spark for sustained, private, hands-on work where a flat capital cost beats a metered bill, and for the value of a CUDA machine that is simply always there.
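The buy-versus-rent argument reduces to a break-even calculation. The sketch below is a rough model under stated assumptions: the $2.00/hour cloud rate and the $0.02/hour local electricity cost are hypothetical placeholders, and it ignores depreciation, resale value, and the fact that a rented GPU may be faster per hour.

```python
def break_even_hours(purchase_price: float,
                     cloud_rate_per_hour: float,
                     local_cost_per_hour: float = 0.0) -> float:
    """Hours of use after which owning the box is cheaper than renting."""
    saving_per_hour = cloud_rate_per_hour - local_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("renting is never more expensive per hour; no break-even point")
    return purchase_price / saving_per_hour


# $4,699 is the Founders Edition price quoted above; the hourly rates are assumptions.
hours = break_even_hours(4699, cloud_rate_per_hour=2.00, local_cost_per_hour=0.02)
print(f"Break-even after {hours:.0f} hours (~{hours / 24:.0f} days of continuous use)")
print(f"At 2 hours/day that is ~{hours / 2 / 365:.1f} years")
```

The second line of output is the one that matters for occasional users: the same break-even point that arrives in a few months of continuous use can take years at a couple of hours a day.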
Frequently asked questions
Can DIGITS train models or just run inference?
Both. PyTorch, TRT-LLM, vLLM all work for inference and fine-tuning. Training a 13B model with LoRA takes ~3 hours per epoch on 5K samples — comparable to a 4090 build. Full pretraining of frontier models is not feasible at this scale, but that’s true of any consumer hardware.
Is the GB10 chip the same as Nvidia data-center Grace Blackwell?
No — it’s a smaller, consumer-tier variant. Performance is roughly 1/4 of an H100 for compute, but with 1.5× the unified memory. The data-center stack (H100/H200/B200/GH200) targets different price points entirely.
Can I use DIGITS as a regular Linux desktop?
Technically yes — DGX OS is Ubuntu under the hood — but it’s optimized for AI workloads, not desktop usability. Browsers run, IDEs work, you can use it as a normal PC, but it’s overkill for that and underwhelming next to a $1K dedicated desktop.
How does it compare to Mac Studio M3 Ultra 512 GB?
The M3 Ultra is the next class up: 512 GB of unified memory at ~$10K. It runs Llama 3 405B at quality quants comfortably and addresses model sizes DIGITS can’t. DIGITS at $3K vs the M3 Ultra at $10K is a different bracket; DIGITS is the budget play for running 100B-200B models locally.
What’s the upgrade path?
There isn’t one within the box. Nvidia has hinted at a successor in 2027 (Rubin-based, presumably more memory). For now, DIGITS is a sealed appliance.
Do ShortPixel, Pollinations, or Cloudflare matter for AI workloads on DIGITS?
No — DIGITS is for local AI compute, not web hosting. Those services optimize a web frontend; DIGITS handles the model layer. The two are complementary, not competing.
What is the memory bandwidth, and why does it cap performance?
DGX Spark’s 128 GB of unified LPDDR5X runs at roughly 273 GB/s. That figure, not the headline petaFLOP of FP4 compute, is what limits token generation speed on large language models, because inference is bound by how fast weights can be streamed from memory. It is generous enough to load very large models that simply will not fit on a 24–32 GB gaming GPU, but it is far below the bandwidth of a discrete card like an RTX 5090. Expect to fit big models comfortably while generating tokens at a steady, workmanlike pace rather than a blistering one.
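The bandwidth argument can be turned into a back-of-the-envelope ceiling. This sketch assumes single-stream decoding where every active weight is read from memory once per generated token, and it ignores KV-cache traffic and compute time, so real throughput can differ. The 30B, 8-bit model in the example is hypothetical.

```python
def weight_bytes(active_params_billions: float, bits_per_weight: float) -> float:
    """Bytes of weights touched per token (for MoE models, pass active parameters only)."""
    return active_params_billions * 1e9 * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_billions: float,
                       bits_per_weight: float) -> float:
    """Rough memory-bound ceiling on single-stream decode speed."""
    return bandwidth_gb_s * 1e9 / weight_bytes(active_params_billions, bits_per_weight)


# 273 GB/s is the figure quoted above; the model is a hypothetical dense 30B at 8 bits.
print(f"{max_tokens_per_sec(273, 30, 8):.1f} tokens/sec ceiling")
```

Halving the bits per weight or the active parameter count doubles the ceiling, which is why aggressive quantization and mixture-of-experts models matter so much on bandwidth-limited hardware.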
Should I buy the Founders Edition or a partner unit like the ASUS Ascent GX10?
The compute is identical, so this comes down to storage and price. The Founders Edition includes a 4 TB SSD; most partner units ship with 1 TB and cost less. If you will store many large models and datasets locally, the 4 TB Founders box can be worth the premium and saves you adding storage later. If you are price-sensitive or happy to attach fast external NVMe, a partner unit gets you the same GB10 superchip and the same 128 GB of memory for less money.
Can I link two DGX Sparks together, and what does that unlock?
Yes. Two units connect over NVIDIA’s built-in ConnectX networking and pool their resources, which lets the pair work with models up to around 405 billion parameters that a single 128 GB machine cannot hold. It is a genuine capability rather than marketing, but plan the budget accordingly: a two-Spark setup roughly doubles the cost, so treat it as a deliberate upgrade path rather than something to do casually.
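A quick way to see why the second unit matters is to estimate the weight footprint against one and two units' worth of memory. This is a minimal sketch: it assumes a flat bits-per-weight figure (real quantization formats carry some extra metadata) and a hypothetical 10% allowance for KV cache and runtime overhead.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal GB."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         memory_gb: float, overhead_fraction: float = 0.10) -> bool:
    """True if weights plus an assumed overhead allowance fit in memory_gb."""
    needed = weights_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


for units in (1, 2):
    capacity = 128 * units  # 128 GB of unified memory per unit
    verdict = "fits" if fits(405, 4, capacity) else "does not fit"
    print(f"405B at 4 bits ({weights_gb(405, 4):.1f} GB) {verdict} in {capacity} GB")
```

The usable capacity is lower than the nominal figure because the operating system and runtime also live in unified memory, so treat a result close to the limit as a "no".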
Conclusion
Nvidia DIGITS is a real product that delivers on its promise. For $3,000 you get a desktop appliance that runs the largest open-weight LLMs at usable speeds — something that previously required either an Apple Mac Studio or a multi-GPU custom build.
It’s not for everyone. If your workloads fit in 32 GB, a 5090 desktop is faster and more flexible. If you want a general-purpose computer, get a Mac or a PC. But if your specific need is “run massive LLMs locally without complexity,” DIGITS is now the answer — and the best-priced answer at that.
The age of “personal AI supercomputers” is real, and Nvidia DIGITS is the device that proved it.
