If you run models locally, the Ollama library is where most of them come from — but it changes constantly and the names are cryptic. This is a practical Ollama models list for 2026: the models people actually run, what each needs in memory, and what each is good at, plus how to list the models you already have and pull new ones. Ollama downloads a 4-bit quantised version by default, which is why a “70B” model can fit on a good workstation and an “8B” runs on a laptop. Sizes below are approximate defaults — always check the AI models database or run ollama list for what is current on your machine.
Quick reference
- Run on any laptop (8 GB RAM): Llama 3.2 3B, Phi-3 Mini, Gemma 3 4B — small, fast, offline.
- Best all-round (16 GB): Llama 3.1 8B, Qwen 2.5 7B, Mistral 7B — the sweet spot for most people.
- High quality (32 GB+ / GPU): Gemma 2 27B, Qwen 2.5 32B, Mixtral 8x7B.
- Near-frontier (workstation / 48 GB+): Llama 3.3 70B, DeepSeek-R1 70B.
- Reasoning: DeepSeek-R1 distills. Coding: Qwen 2.5 Coder, Code Llama. Vision: LLaVA. Embeddings: nomic-embed-text.
- The rule: pick by the memory you have — check any model with our free VRAM calculator.
Contents
- The most popular Ollama models at a glance
- How to list the Ollama models you have installed
- How to find and pull new models from the library
- Small models — run on almost any laptop
- Mid-size models — the 16 GB sweet spot
- Large models — workstation and GPU territory
- Specialised models: coding, vision and embeddings
- Which Ollama model should you actually use?
- Check a model fits before you download
- Frequently asked questions
- The bottom line
The most popular Ollama models at a glance
Every model below is available with a simple ollama pull <name>. “Download” is the approximate default 4-bit (Q4) size; “Min memory” is a practical floor of system RAM (CPU) or VRAM (GPU) to run it comfortably. Parameter counts are the nominal figures each model is named after; sizes are approximate and move with each release.
| Model | Params | Download (Q4) | Min memory | Best for |
|---|---|---|---|---|
| Llama 3.2 | 1B / 3B | ~1.3 / 2 GB | 4–8 GB | Edge, phones, ultralight chat |
| Llama 3.1 | 8B | ~4.7 GB | 8–16 GB | Best all-round small model |
| Llama 3.3 | 70B | ~43 GB | 48 GB+ | Near-frontier open model |
| Gemma 3 | 1B / 4B | ~0.8 / 3.3 GB | 4–8 GB | Efficient small (Google) |
| Gemma 2 | 9B / 27B | ~5.4 / 16 GB | 12–32 GB | Strong quality per size |
| Qwen 2.5 | 0.5B–72B | ~0.4–47 GB | 4 GB+ | Multilingual, wide size range |
| Qwen 2.5 Coder | 1.5B–32B | ~1–20 GB | 8 GB+ | Local coding assistant |
| Mistral | 7B | ~4.1 GB | 8 GB | Fast, reliable classic |
| Mistral Nemo | 12B | ~7 GB | 16 GB | Long 128k context |
| Mixtral | 8x7B | ~26 GB | 32 GB+ | Mixture-of-experts quality |
| Phi-4 | 14B | ~9 GB | 16 GB | Reasoning in a small model |
| Phi-3 Mini | 3.8B | ~2.3 GB | 8 GB | Tiny but capable |
| DeepSeek-R1 (distill) | 1.5B–70B | ~1.1–43 GB | 8 GB+ | Step-by-step reasoning |
| LLaVA | 7B–34B | ~4.7–20 GB | 8 GB+ | Vision (image understanding) |
| nomic-embed-text | — | ~0.3 GB | 2 GB | Embeddings for RAG/search |
Want the cloud models these compare against on price and speed? The AI models database lists open and closed models side by side, and the AI API cost calculator shows when running locally beats paying per token.
How to list the Ollama models you have installed
To see every model already on your machine, with its size and when it was last modified, run:
ollama list
That prints each model’s name and tag, its ID, its size on disk and when it was last modified. To see what is loaded in memory right now, use ollama ps; to remove one you no longer need and reclaim disk space, use ollama rm <name>. These three commands (list, ps and rm) cover the day-to-day management of a local model collection.
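If you want to work with that listing in a script, a minimal sketch might look like the following. It assumes the output is a header row followed by one row per model, with columns separated by a tab or by two or more spaces; that layout is an assumption based on typical output, and the IDs in the sample are placeholders, not real model IDs.

```python
import re

# Sample text in the assumed layout of `ollama list`; the IDs are made-up placeholders.
SAMPLE_OUTPUT = """\
NAME                       ID              SIZE      MODIFIED
llama3.1:8b                0123456789ab    4.7 GB    2 weeks ago
nomic-embed-text:latest    ba9876543210    274 MB    3 days ago
"""

# Columns are assumed to be separated by a tab or by a run of 2+ spaces,
# so values such as "4.7 GB" or "2 weeks ago" stay in one cell.
COLUMN_GAP = re.compile(r"\s*\t\s*|\s{2,}")


def parse_ollama_list(text):
    """Turn the table printed by `ollama list` into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.lower() for cell in COLUMN_GAP.split(lines[0].strip())]
    return [dict(zip(header, COLUMN_GAP.split(line.strip()))) for line in lines[1:]]


models = parse_ollama_list(SAMPLE_OUTPUT)
for model in models:
    print(model["name"], model["size"])
```

To run it against your own machine, you could pass in `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout` instead of the sample text.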
How to find and pull new models from the library
Ollama’s full catalogue lives in its online library, and pulling any model is one command:
ollama pull llama3.1
Or download it and start it in one step:
ollama run llama3.1
Model names use tags for size and variant, for example llama3.1:8b, gemma2:27b or qwen2.5:14b. If you leave the tag off, Ollama pulls the tag called latest, which usually points at the most popular size at 4-bit. For a first install, our step-by-step Ollama install guide covers Mac, Windows and Linux.
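The name-and-tag convention is easy to handle in a script. The helper below is a hypothetical illustration, not part of Ollama: it splits a reference such as llama3.1:8b into its name and tag, and falls back to latest when no tag is given.

```python
def split_model_ref(ref):
    """Split 'name:tag' into (name, tag); a missing or empty tag becomes 'latest'.

    Only a colon in the last path segment counts as the tag separator, so a
    reference with a namespace such as 'someuser/mymodel' is left intact.
    """
    last_segment_start = ref.rfind("/") + 1
    colon = ref.find(":", last_segment_start)
    if colon == -1:
        return ref, "latest"
    return ref[:colon], ref[colon + 1:] or "latest"


for ref in ["llama3.1:8b", "gemma2:27b", "qwen2.5:14b", "llama3.1"]:
    name, tag = split_model_ref(ref)
    print(f"{ref!r}: name={name!r}, tag={tag!r}")
```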
Small models — run on almost any laptop
Models from 1B to about 4B parameters run happily on a modern laptop with 8 GB of RAM, no GPU required. Llama 3.2 3B, Gemma 3 4B and Phi-3 Mini are the standouts: quick, genuinely useful for summarising, drafting and simple questions, and small enough to keep loaded. They will not match a frontier cloud model, but for private, offline everyday tasks they are excellent — and they are the right starting point if you are new to local AI.
Mid-size models — the 16 GB sweet spot
The 7B–14B class is where most people should live. Llama 3.1 8B, Qwen 2.5 7B and Mistral 7B deliver a big jump in coherence over the small models while still fitting comfortably in 16 GB of RAM or a mainstream GPU. Phi-4 and Mistral Nemo push quality and context length further. If you want one model for general use, pick from this row — it is the best balance of capability and hardware demand.
Large models — workstation and GPU territory
From 27B upwards you are into serious hardware. Gemma 2 27B, Qwen 2.5 32B and Mixtral 8x7B want 32 GB or more. The 70B-class models, Llama 3.3 70B and the DeepSeek-R1 70B distill, need 48 GB+ of fast memory, which in practice means a high-VRAM GPU or a high-memory Apple Silicon Mac. The reward is quality that approaches the big cloud models, running entirely on your own machine. See our best GPUs for AI guide for what actually runs these.
Specialised models: coding, vision and embeddings
Beyond general chat, Ollama hosts task-specific models. Qwen 2.5 Coder and Code Llama are built for programming and pair well with local IDE tools. LLaVA adds vision, so a model can describe or reason about images. And embedding models like nomic-embed-text and mxbai-embed-large do not chat at all — they turn text into vectors for search and retrieval-augmented generation, the backbone of a local RAG setup.
Which Ollama model should you actually use?
The honest answer is: the largest one your memory can hold in the class you need. For general use, start with an 8B model and move up only if quality falls short. For reasoning, try a DeepSeek-R1 distill; for coding, Qwen 2.5 Coder; for images, LLaVA. We rank the best picks by use case in the best local LLMs to run on Ollama, and compare Ollama itself with the alternatives in Ollama vs LM Studio vs vLLM vs llama.cpp.
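The "largest one that fits, within the class you need" rule can be sketched in a few lines. The catalogue below copies a handful of approximate Q4 download sizes from the table in this article. The tags, the use-case labels and the 20% headroom factor are illustrative assumptions, so treat the result as a starting point rather than a recommendation.

```python
# (tag, use case, approximate Q4 download in GB), sizes taken from the table above.
# Tags and use-case labels are illustrative; check the library for current names.
CATALOGUE = [
    ("llama3.2:3b", "general", 2.0),
    ("llama3.1:8b", "general", 4.7),
    ("mistral-nemo:12b", "general", 7.0),
    ("phi4:14b", "general", 9.0),
    ("gemma2:27b", "general", 16.0),
    ("llama3.3:70b", "general", 43.0),
    ("deepseek-r1:1.5b", "reasoning", 1.1),
    ("deepseek-r1:70b", "reasoning", 43.0),
    ("llava:7b", "vision", 4.7),
    ("llava:34b", "vision", 20.0),
]

# Assumption: leave roughly 20% on top of the download size for context and runtime.
HEADROOM = 1.2


def pick_model(use_case, memory_gb):
    """Return the largest catalogue entry for use_case that fits in memory_gb, or None."""
    fitting = [
        (size_gb, tag)
        for tag, case, size_gb in CATALOGUE
        if case == use_case and size_gb * HEADROOM <= memory_gb
    ]
    return max(fitting)[1] if fitting else None


print(pick_model("general", 16))
print(pick_model("vision", 4))
```

Picking the largest fit is only the upper bound: the advice above still applies, so starting with an 8B model and moving up when quality falls short is often the more practical path.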
Check a model fits before you download
The single most common mistake is pulling a model too big for your machine: it will either refuse to load or crawl as it swaps to disk. Before downloading, size it up. In the table above, 4-bit downloads work out to roughly 0.6 GB per billion parameters (about 4.7 GB for an 8B model, about 43 GB for a 70B), so as a rough rule a 4-bit model needs a little under 1 GB of memory per billion parameters once you add headroom for context. Our free VRAM calculator gives a more precise figure for any model and quantisation, and Ollama’s system requirements explain the RAM-versus-VRAM trade-off in full.
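That rule of thumb is simple enough to write down as arithmetic. The sketch below assumes about 0.6 GB of weights per billion parameters at 4-bit (the ratio implied by the table in this article) plus 20% headroom for context and runtime. Both constants are assumptions for illustration, and real usage grows with longer contexts.

```python
# Assumed constants: roughly what the table's Q4 download sizes imply, plus a guess at overhead.
GB_PER_BILLION_Q4 = 0.6   # approximate weight size per billion parameters at 4-bit
HEADROOM_FACTOR = 1.2     # assumed 20% extra for context and runtime


def estimate_memory_gb(params_billion):
    """Rough memory needed to run a 4-bit model with the given parameter count."""
    return params_billion * GB_PER_BILLION_Q4 * HEADROOM_FACTOR


def fits(params_billion, available_gb):
    """True if the rough estimate is within the memory you have available."""
    return estimate_memory_gb(params_billion) <= available_gb


for params in (3, 8, 27, 70):
    needed = estimate_memory_gb(params)
    print(f"{params}B: about {needed:.1f} GB, fits in 16 GB: {fits(params, 16)}")
```

With these constants the estimate comes to about 0.72 GB per billion parameters, which matches the "a little under 1 GB" budget and puts a 70B model just above the 48 GB mark.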
Frequently asked questions
How do I list the models installed in Ollama? Run ollama list to see every installed model with its size, ollama ps for what is loaded now, and ollama rm <name> to delete one.
What is the best Ollama model? There is no single best — it depends on your memory. Llama 3.1 8B is the best all-round pick for 16 GB machines; see our ranked list for each use case.
How many models does Ollama have? Hundreds, across chat, coding, vision and embedding families, with multiple sizes each. The table above covers the ones most people actually run.
How much RAM do I need to run Ollama models? 8 GB runs small (1B–4B) models, 16 GB runs the popular 7B–8B class, and 32 GB+ or a GPU is needed for 27B and larger. Check any model with our VRAM calculator.
Can I run these models offline? Yes — once pulled, every Ollama model runs entirely on your machine with no internet connection, which is the main reason to use local models at all.
The bottom line
The Ollama models list is long, but choosing is simple: decide what you need — general chat, reasoning, coding, vision or embeddings — then pick the biggest model in that family your memory can hold. Start small with an 8B model, use ollama list to keep track of what you have, and lean on the VRAM calculator before every download so you never pull something your machine cannot run. From there, running capable AI locally and privately is a couple of commands away.
Model names, sizes and availability change frequently; figures are approximate defaults current as of mid-2026 — verify with ollama list and the official library before relying on them.
