Installing Ollama is genuinely a two-minute job on every major operating system. This guide gives you the exact steps for Mac, Windows, and Linux, shows you how to run your first model, and covers the handful of errors people actually run into.
New to the tool entirely? Start with what Ollama is and how it works, then come back here to install it.
Key takeaways
- Mac: download the app from ollama.com, or run brew install ollama.
- Windows: download and run the official installer. It is native, with no WSL required.
- Linux: one command does it: curl -fsSL https://ollama.com/install.sh | sh
- First model: ollama run gemma4 downloads and runs a strong all-rounder.
- Check it works: the API answers at http://localhost:11434.
Before you install: can your machine run it?
Ollama itself is tiny, but the models are not. A quick rule of thumb: you want roughly as much free RAM (or VRAM) as the quantized model size — about 4–5 GB for a 7B model, 8 GB for a 13B model, and far more for the big ones. If you’re not sure what your hardware can handle, read our Ollama system requirements guide first so you pick a model that actually fits.
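The rule of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an approximation: the function name estimate_model_gb is made up for illustration, and the 1.2 multiplier is an assumed allowance for context and runtime overhead, not a figure published by Ollama.

```shell
# Rough memory estimate for a quantized model, in GB.
# Usage: estimate_model_gb <params_in_billions> <bits_per_weight>
# The 1.2 multiplier is an assumed overhead allowance, not an official number.
estimate_model_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}

estimate_model_gb 7 4    # 7B model at 4-bit quantization
estimate_model_gb 13 4   # 13B model at 4-bit quantization
```

Under these assumptions the two calls print 4.2 and 7.8, which lines up with the 4–5 GB and 8 GB figures quoted above. Treat the result as a starting point and leave some headroom for the operating system and other apps.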
Install on macOS
The easiest path is the native app:
- Go to ollama.com/download and download the macOS app.
- Open the .dmg and drag Ollama to Applications.
- Launch it. Ollama runs in the background and the ollama command becomes available in your terminal.
Prefer the command line? Use Homebrew:
brew install ollama
On Apple Silicon (M1–M5), Ollama automatically uses the GPU through Apple’s MLX backend (since v0.19), so you get fast inference with no extra configuration.
Install on Windows
Ollama runs natively on Windows — you no longer need WSL:
- Download the Windows installer from ollama.com/download.
- Run the .exe and follow the prompts.
- Open PowerShell or Command Prompt and type ollama --version to confirm it’s installed.
If you have an NVIDIA GPU, Ollama detects it automatically and uses CUDA. No driver gymnastics required, as long as your GPU drivers are current.
Install on Linux
One command does everything:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama and sets it up as a systemd service that starts on boot. To confirm it’s running:
systemctl status ollama
On Ubuntu and most distros, the installer detects NVIDIA and AMD GPUs and configures the right backend. For AMD cards specifically, make sure ROCm is installed — see our deep dive on ROCm vs CUDA for the state of AMD support in 2026.
Run your first model
With Ollama installed, pull and run a model in one command:
ollama run gemma4
The first run downloads the model (a few gigabytes), then drops you into a chat prompt. Type a question, get an answer — entirely on your machine. Some useful commands:
- ollama list: show models you’ve downloaded.
- ollama pull qwen3: download a model without running it.
- ollama rm gemma4: delete a model to reclaim disk space.
- ollama ps: see what’s currently loaded in memory.
Not sure which model to start with? Our guide to the best local LLMs on Ollama matches models to use cases and hardware.
Verify the API is running
Ollama exposes a REST API on port 11434. To confirm it’s live, run:
curl http://localhost:11434/api/tags
A JSON response listing your models means everything works. This endpoint is what your own apps will talk to — and because Ollama offers an OpenAI-compatible API, a lot of existing code works by just changing the base URL.
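If you only want the model names out of that response, a small helper is enough. This is a minimal sketch: list_model_names is a hypothetical name, and it does a plain-text match on the "name" fields rather than real JSON parsing, so a tool such as jq is the sturdier choice for anything beyond a quick check.

```shell
# Print the model names found in an /api/tags JSON response read from stdin.
# This matches the "name" fields as text; it is not a full JSON parser.
list_model_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"$//; s/.*"//'
}

# Example use on a machine where Ollama is running:
#   curl -s http://localhost:11434/api/tags | list_model_names
```

An empty result with a valid JSON response usually just means no models have been pulled yet.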
Common install problems and fixes
- “ollama: command not found” (Mac/Linux): the app installed but isn’t on your PATH. On Mac, make sure the app has been launched once; on Linux, open a new shell after install.
- Model downloads are slow or stall: Ollama pulls large files; a stalled pull usually resolves by running ollama pull <model> again, because it resumes rather than restarting.
- GPU not being used: check ollama ps. If it shows 100% CPU, your GPU drivers may be out of date, or the model is too large to fit in VRAM and spilled to CPU. Try a smaller or more heavily quantized model.
- “out of memory” errors: the model is bigger than your available RAM/VRAM. Pull a smaller quant (look for q4 variants) or a smaller model size. Our system requirements guide shows what fits where.
- Port 11434 already in use: another Ollama instance is most likely running. Run ollama ps to see what is loaded, then quit the app or stop the service before starting a new one.
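Two of the checks above are easy to script. The helpers below are a sketch with made-up names (on_path and cpu_only); the second one simply looks for the "100% CPU" text in ollama ps output, as described in the list.

```shell
# Return success if the given command can be found on PATH.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

# Read `ollama ps` output on stdin and succeed if any loaded model
# is reported as running entirely on the CPU.
cpu_only() {
  grep -q '100% CPU'
}

# Example use on a machine with Ollama installed:
#   on_path ollama || echo "ollama is not on PATH: open a new shell or launch the app"
#   ollama ps | cpu_only && echo "a model is running on CPU only"
```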
FAQ
How do I install Ollama on Windows?
Download the native installer from ollama.com/download and run the .exe. Ollama runs natively on Windows with no WSL required, and automatically uses an NVIDIA GPU via CUDA if you have one. Confirm the install with ollama --version in PowerShell.
How do I install Ollama on Linux?
Run curl -fsSL https://ollama.com/install.sh | sh. This installs Ollama and registers it as a systemd service. Verify it with systemctl status ollama. The installer auto-detects NVIDIA and AMD GPUs.
Can I install Ollama with Homebrew?
Yes — brew install ollama works on macOS. The native app from ollama.com is equally good and includes a menu-bar presence; the Homebrew route is handy if you manage everything through the command line.
Where does Ollama store models?
By default, on Mac and Linux in ~/.ollama/models, and on Windows under your user profile. Models can be several gigabytes each, so use ollama list to track what you’ve downloaded and ollama rm <model> to clean up.
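To see how much disk space your models take up on Mac or Linux, something like the following works. It is a sketch built on the default path given above plus the OLLAMA_MODELS environment variable, which overrides that location when set; the function names are illustrative, and a Linux service install may keep models somewhere other than your home directory.

```shell
# Print the directory where models are expected to live.
# OLLAMA_MODELS takes precedence over the default when it is set.
models_dir() {
  printf '%s\n' "${OLLAMA_MODELS:-$HOME/.ollama/models}"
}

# Show the total disk space used by the model store, if it exists.
models_disk_usage() {
  dir=$(models_dir)
  if [ -d "$dir" ]; then
    du -sh "$dir"
  else
    echo "no model directory at $dir"
  fi
}
```

Run models_disk_usage after a few pulls to decide whether it is time for ollama rm.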
Is Ollama safe to install?
Yes. Ollama is open-source (MIT-licensed) and widely used. The standard caution applies to the Linux one-line installer — it’s the project’s official script, but if you prefer, you can download and inspect install.sh before running it.
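If you would rather read the script first, one way to do it is to save it to a file instead of piping it straight into sh. The helper name fetch_installer and the default file name are just illustrative choices.

```shell
# Download the installer to a local file instead of piping it to sh.
# Usage: fetch_installer [destination]   (defaults to ./install.sh)
fetch_installer() {
  dest="${1:-install.sh}"
  curl -fsSL https://ollama.com/install.sh -o "$dest"
}

# Suggested flow:
#   fetch_installer install.sh
#   less install.sh    # read it before running anything
#   sh install.sh      # run it once you are satisfied
```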
Bottom line
On any operating system, installing Ollama is a download-and-run affair that takes about two minutes, and your first local model is one command away. Pick a model that fits your hardware, confirm the API answers on port 11434, and you’ve got a private, free LLM running on your own machine. From here, explore which models to run and how much hardware each one needs.
