Monday, 22 June 2026 | Updating Daily AI insight, written for builders

How to Install Ollama in 2026: Mac, Windows, and Linux (Step by Step)

Installing Ollama is genuinely a two-minute job on every major operating system. This guide gives you the exact steps for Mac, Windows, and Linux, shows you how to run your first model, and covers the handful of errors people actually run into.

New to the tool entirely? Start with what Ollama is and how it works, then come back here to install it.

Conclusiones clave

  • Mac: download the app from ollama.com, or brew install ollama.
  • Windows: download and run the official installer — native, no WSL required.
  • Linux: one command — curl -fsSL https://ollama.com/install.sh | sh.
  • First model: ollama run gemma4 downloads and runs a strong all-rounder.
  • Check it works: the API answers at http://localhost:11434.

Before you install: can your machine run it?

Ollama itself is tiny, but the models are not. A quick rule of thumb: you want roughly as much free RAM (or VRAM) as the quantized model size — about 4–5 GB for a 7B model, 8 GB for a 13B model, and far more for the big ones. If you’re not sure what your hardware can handle, read our Guía de requisitos del sistema para Ollama first so you pick a model that actually fits.

Install on macOS

The easiest path is the native app:

  1. Go to ollama.com/download and download the macOS app.
  2. Open the .dmg and drag Ollama to Applications.
  3. Launch it — Ollama runs in the background and the ollama command becomes available in your terminal.

Prefer the command line? Use Homebrew:

brew install ollama

On Apple Silicon (M1–M5), Ollama automatically uses the GPU through Apple’s MLX backend (since v0.19), so you get fast inference with no extra configuration.

Install on Windows

Ollama runs natively on Windows — you no longer need WSL:

  1. Download the Windows installer from ollama.com/download.
  2. Run the .exe and follow the prompts.
  3. Open PowerShell or Command Prompt and type ollama --version to confirm it’s installed.

If you have an NVIDIA GPU, Ollama detects it automatically and uses CUDA. No driver gymnastics required, as long as your GPU drivers are current.

Install on Linux

One command does everything:

curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service that starts on boot. To confirm it’s running:

systemctl status ollama

On Ubuntu and most distros, the installer detects NVIDIA and AMD GPUs and configures the right backend. For AMD cards specifically, make sure ROCm is installed — see our deep dive on ROCm frente a CUDA for the state of AMD support in 2026.

Run your first model

With Ollama installed, pull and run a model in one command:

ollama run gemma4

The first run downloads the model (a few gigabytes), then drops you into a chat prompt. Type a question, get an answer — entirely on your machine. Some useful commands:

  • ollama list — show models you’ve downloaded.
  • ollama pull qwen3 — download a model without running it.
  • ollama rm gemma4 — delete a model to reclaim disk space.
  • ollama ps — see what’s currently loaded in memory.

Not sure which model to start with? Our guide to the best local LLMs on Ollama matches models to use cases and hardware.

Verify the API is running

Ollama exposes a REST API on port 11434. To confirm it’s live, run:

curl http://localhost:11434/api/tags

A JSON response listing your models means everything works. This endpoint is what your own apps will talk to — and because Ollama offers an OpenAI-compatible API, a lot of existing code works by just changing the base URL.

Common install problems and fixes

  • “ollama: command not found” (Mac/Linux): the app installed but isn’t on your PATH. On Mac, make sure the app has been launched once; on Linux, open a new shell after install.
  • Model downloads are slow or stall: Ollama pulls large files; a stalled pull usually resolves with ollama pull <model> again — it resumes rather than restarting.
  • GPU not being used: check ollama ps — if it shows 100% CPU, your GPU drivers may be out of date or the model is too large to fit in VRAM and spilled to CPU. Try a smaller or more heavily quantized model.
  • “out of memory” errors: the model is bigger than your available RAM/VRAM. Pull a smaller quant (look for q4 variants) or a smaller model size. Our system requirements guide shows what fits where.
  • Port 11434 already in use: another Ollama instance is running. Stop it (ollama ps then quit the app/service) before starting a new one.

Configure Ollama after install: storage, memory, and a UI

Installing Ollama is the easy part. A few settings decide whether it stays out of your way or quietly eats your disk and RAM. All of them are environment variables, and where you set them depends on your OS: on macOS use launchctl setenv, on Linux edit the service with systemctl edit ollama.service and add an Environment= line, and on Windows add a user environment variable in Settings. Restart Ollama after any change for it to take effect.

The four that matter most:

  • OLLAMA_MODELS — moves where models are stored. Models are large, and the defaults sit on your system drive (~/.ollama/models on macOS, /usr/share/ollama/.ollama/models on Linux, C:Users<you>.ollamamodels on Windows). Point this at a bigger or faster drive before you pull a stack of models. On Linux, make sure the ollama user can read and write the new directory.
  • OLLAMA_KEEP_ALIVE — how long a model stays in memory after a request. The default is 5 minutes. Set it to -1 to keep a model resident so the first prompt is never slow, or 0 to unload immediately and free VRAM the moment you are done.
  • OLLAMA_CONTEXT_LENGTH — the context window. Ollama defaults to a conservative 4096 tokens, so long documents get silently truncated. Raise it (for example to 8192 or higher) if your model supports it, and pair it with OLLAMA_FLASH_ATTENTION=1 to keep the extra memory cost down.
  • OLLAMA_HOST — what the server binds to. By default Ollama only listens on 127.0.0.1:11434, so nothing else on your network can reach it. Set it to 0.0.0.0:11434 to use Ollama from another machine, a phone, or a Docker container.

One serious caveat on that last one: the Ollama API has no built-in authentication. Binding to 0.0.0.0 means anyone who can reach port 11434 can run models on your hardware. Only do it on a trusted LAN, and put it behind a reverse proxy or firewall rule if the machine is exposed.

Finally, the command line is fine for testing but tedious for daily use. The common fix is Open WebUI, a self-hosted, ChatGPT-style interface that talks to your local Ollama. The quickest route is Docker: the container serves on port 8080 internally, so map it to a host port with -p 3000:8080 and open http://localhost:3000, then point it at your Ollama instance. You get chat history, model switching, and document uploads, all running entirely on your own machine.

Preguntas frecuentes

How do I install Ollama on Windows?

Download the native installer from ollama.com/download and run the .exe. Ollama runs natively on Windows with no WSL required, and automatically uses an NVIDIA GPU via CUDA if you have one. Confirm the install with ollama --version in PowerShell.

How do I install Ollama on Linux?

Run curl -fsSL https://ollama.com/install.sh | sh. This installs Ollama and registers it as a systemd service. Verify it with systemctl status ollama. The installer auto-detects NVIDIA and AMD GPUs.

Can I install Ollama with Homebrew?

Yes — brew install ollama works on macOS. The native app from ollama.com is equally good and includes a menu-bar presence; the Homebrew route is handy if you manage everything through the command line.

Where does Ollama store models?

By default, on Mac and Linux in ~/.ollama/models, and on Windows under your user profile. Models can be several gigabytes each, so use ollama list to track what you’ve downloaded and ollama rm <model> to clean up.

Is Ollama safe to install?

Yes. Ollama is open-source (MIT-licensed) and widely used. The standard caution applies to the Linux one-line installer — it’s the project’s official script, but if you prefer, you can download and inspect install.sh before running it.

Why does Ollama keep cutting off long prompts or documents?

Ollama defaults to a 4096-token context window regardless of what the model can actually handle, so longer inputs are truncated without warning. Raise it by starting the server with OLLAMA_CONTEXT_LENGTH set higher (for example 8192 or more), up to the model’s supported limit. A bigger context uses more memory, so enable OLLAMA_FLASH_ATTENTION=1 to offset the cost, and watch your RAM or VRAM headroom.

How do I access Ollama from another computer on my network?

By default Ollama only listens on localhost, so it is invisible to other devices. Set OLLAMA_HOST=0.0.0.0:11434 and restart it, then connect using the host machine’s LAN IP address on port 11434. Be aware there is no authentication on the API, so only expose it on a network you trust, and use a firewall rule or reverse proxy if the machine is reachable from outside your home or office.

How do I stop Ollama from reloading the model on every request?

That delay is the model being unloaded from memory between uses. Ollama keeps a model loaded for 5 minutes by default; set OLLAMA_KEEP_ALIVE=-1 to keep it resident indefinitely so prompts respond instantly. The trade-off is that the model holds onto your RAM or VRAM the whole time, so use 0 instead if you would rather free that memory the moment a request finishes.

Conclusión

On any operating system, installing Ollama is a download-and-run affair that takes about two minutes, and your first local model is one command away. Pick a model that fits your hardware, confirm the API answers on port 11434, and you’ve got a private, free LLM running on your own machine. From here, explore which models to run y how much hardware each one needs.

Scroll to Top