{"id":790,"date":"2026-06-06T01:59:14","date_gmt":"2026-06-06T01:59:14","guid":{"rendered":"https:\/\/convly.ai\/ollama-system-requirements-2026\/"},"modified":"2026-06-06T01:59:14","modified_gmt":"2026-06-06T01:59:14","slug":"ollama-system-requirements-2026","status":"publish","type":"post","link":"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/","title":{"rendered":"Ollama System Requirements in 2026: How Much RAM and VRAM You Really Need"},"content":{"rendered":"<p>The single most common reason a model won&#8217;t run in Ollama isn&#8217;t a bug \u2014 it&#8217;s that the model is bigger than your memory. Ollama itself is tiny; the models are what demand hardware. This guide gives you the real RAM and VRAM numbers for each model size in 2026, plus a simple formula so you know what fits <em>before<\/em> you spend ten minutes downloading something that won&#8217;t load.<\/p>\n<p>If you haven&#8217;t installed Ollama yet, start with our <a href=\"https:\/\/convly.ai\/fr\/how-to-install-ollama-2026\/\">step-by-step install guide<\/a>.<\/p>\n<div class=\"convly-tldr\">\n<h3>Key takeaways<\/h3>\n<ul>\n<li><strong>The rule of thumb:<\/strong> a quantized (Q4) model needs roughly <strong>0.6 GB of memory per billion parameters<\/strong>, plus headroom for context.<\/li>\n<li><strong>2\u20133B models:<\/strong> run on CPU, ~2\u20134 GB RAM. Fine on a basic laptop.<\/li>\n<li><strong>7\u20138B models:<\/strong> ~6\u20138 GB RAM\/VRAM. The sweet spot for most laptops.<\/li>\n<li><strong>27\u201334B models:<\/strong> ~20\u201324 GB VRAM. 
Needs a high-end GPU or Apple Silicon with lots of unified memory.<\/li>\n<li><strong>70B+ models:<\/strong> 40 GB+ \u2014 a workstation GPU, multi-GPU rig, or 64 GB+ unified memory.<\/li>\n<\/ul>\n<\/div>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_84 counter-flat ez-toc-counter ez-toc-container-direction\">\n<label for=\"ez-toc-cssicon-toggle-item-6a23c787c1f14\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #000000;color:#000000\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #000000;color:#000000\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a23c787c1f14\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#Why_memory_is_the_whole_story\" >Why memory is the whole story<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#The_simple_formula\" >The simple formula<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-3\" 
href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#Requirements_by_model_size\" >Requirements by model size<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#GPU_vs_CPU_vs_Apple_Silicon\" >GPU vs CPU vs Apple Silicon<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#How_to_make_a_big_model_fit\" >How to make a big model fit<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#FAQ\" >FAQ<\/a><\/li><li class='ez-toc-page-1'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/convly.ai\/fr\/ollama-system-requirements-2026\/#Bottom_line\" >Bottom line<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Why_memory_is_the_whole_story\"><\/span>Why memory is the whole story<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To generate text, a model&#8217;s weights have to sit in fast memory \u2014 your GPU&#8217;s VRAM, or system RAM if you&#8217;re running on CPU. If the model doesn&#8217;t fit, one of two things happens: Ollama spills part of it to slower memory (and performance collapses), or it refuses to load with an out-of-memory error. Everything else \u2014 CPU speed, disk, OS \u2014 matters far less than having enough of the right memory.<\/p>\n<p>Two factors set the requirement:<\/p>\n<ol>\n<li><strong>Parameter count<\/strong> \u2014 a 7B model has 7 billion weights; a 70B model has ten times as many.<\/li>\n<li><strong>Quantization<\/strong> \u2014 Ollama uses compressed GGUF weights. 
A 4-bit (Q4) quant cuts memory roughly in half versus 8-bit, with minimal quality loss, which is why it&#8217;s the default sweet spot.<\/li>\n<\/ol>\n<h2><span class=\"ez-toc-section\" id=\"The_simple_formula\"><\/span>The simple formula<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For a 4-bit quantized model \u2014 what Ollama pulls by default \u2014 estimate:<\/p>\n<blockquote>\n<p><strong>Memory needed \u2248 (parameters in billions) \u00d7 0.6 GB + context overhead<\/strong><\/p>\n<\/blockquote>\n<p>So, before context overhead, a 7B model needs roughly 4\u20135 GB, a 13B model about 8 GB, a 27B model around 16 GB, and a 70B model 40 GB or more. Add a bit on top for the KV cache, which grows with how long your conversations get. The table below folds in that overhead, which is why its figures run higher. Always leave a few gigabytes of headroom for your operating system.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Requirements_by_model_size\"><\/span>Requirements by model size<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<table class=\"convly-vs\">\n<thead>\n<tr>\n<th>Model size<\/th>\n<th>Memory (Q4)<\/th>\n<th>Runs on<\/th>\n<th>Example models<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>2\u20133B<\/td>\n<td>~2\u20134 GB<\/td>\n<td>CPU \/ any laptop<\/td>\n<td>Gemma2 2B, Phi-4 mini<\/td>\n<\/tr>\n<tr>\n<td>7\u20138B<\/td>\n<td>~6\u20138 GB<\/td>\n<td>Entry GPU \/ 16 GB laptop<\/td>\n<td>DeepSeek-R1 7B, Llama 3.1 8B<\/td>\n<\/tr>\n<tr>\n<td>13\u201314B<\/td>\n<td>~10\u201312 GB<\/td>\n<td>Mid-range GPU<\/td>\n<td>Phi-4, mid Qwen<\/td>\n<\/tr>\n<tr>\n<td>27\u201334B<\/td>\n<td>~18\u201324 GB<\/td>\n<td>High-end GPU \/ Apple Silicon<\/td>\n<td>Gemma 4 26B, Qwen 3.6 27B<\/td>\n<\/tr>\n<tr>\n<td>70B<\/td>\n<td>~40\u201348 GB<\/td>\n<td>Workstation \/ multi-GPU<\/td>\n<td>Llama 70B class<\/td>\n<\/tr>\n<tr>\n<td>200B+ (MoE)<\/td>\n<td>100 GB+<\/td>\n<td>Server \/ huge unified memory<\/td>\n<td>Qwen3 235B-A22B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For a deeper breakdown across specific models, see our guide to <a 
href=\"https:\/\/convly.ai\/fr\/vram-requirements-every-major-llm-2026\/\">VRAM requirements for every major LLM<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"GPU_vs_CPU_vs_Apple_Silicon\"><\/span>GPU vs CPU vs Apple Silicon<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><strong>NVIDIA GPU<\/strong> \u2014 the gold standard. VRAM is the hard limit: the model must fit in your card&#8217;s memory to run fast. A 24 GB card (RTX 3090\/4090) comfortably runs up to ~27\u201334B models.<\/p>\n<p><strong>CPU only<\/strong> \u2014 works for small models (2\u20138B) but is much slower, since system RAM bandwidth can&#8217;t match a GPU. Perfectly fine for light tasks on a laptop with no discrete GPU.<\/p>\n<p><strong>Apple Silicon<\/strong> \u2014 a special case, and a strong one. Because Macs use <em>unified memory<\/em> shared between CPU and GPU, a Mac with 64 GB can load models that would need an expensive multi-GPU PC. Since Ollama v0.19 (March 2026) added the MLX backend, Apple Silicon also got much faster \u2014 making a high-memory Mac one of the best single-box local-LLM machines you can buy. For how that stacks up against AMD&#8217;s unified-memory APU, see <a href=\"https:\/\/convly.ai\/fr\/amd-strix-halo-vs-apple-m4-pro\/\">Strix Halo vs Apple M4 Pro<\/a>.<\/p>\n<p><strong>AMD GPU<\/strong> \u2014 supported via ROCm. It works well for inference in 2026; check our <a href=\"https:\/\/convly.ai\/fr\/amd-rocm-vs-nvidia-cuda-2026\/\">ROCm vs CUDA breakdown<\/a> for the current state.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"How_to_make_a_big_model_fit\"><\/span>How to make a big model fit<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If the model you want is just over your memory, you have options before giving up:<\/p>\n<ul>\n<li><strong>Use a smaller quant<\/strong> \u2014 pull a <code>q4<\/code> or even <code>q3<\/code> variant instead of <code>q8<\/code>. 
You trade a little quality for a big memory saving.<\/li>\n<li><strong>Pick a smaller model size<\/strong> \u2014 a well-chosen 8B often beats a barely-running, swapped-out 27B.<\/li>\n<li><strong>Shorten the context window<\/strong> \u2014 a smaller context uses less KV-cache memory.<\/li>\n<li><strong>Close other apps<\/strong> \u2014 on a CPU\/unified-memory machine, free RAM is your budget.<\/li>\n<\/ul>\n<p>To pick a model matched to your hardware, see the <a href=\"https:\/\/convly.ai\/fr\/best-local-llms-to-run-on-ollama-2026\/\">best local LLMs to run on Ollama<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"FAQ\"><\/span>FAQ<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3>How much RAM do I need to run Ollama?<\/h3>\n<p>It depends entirely on the model. Ollama itself needs almost nothing; the model sets the requirement. As a rule, a 4-bit model needs about 0.6 GB per billion parameters before context overhead \u2014 so ~4\u20135 GB for a 7B model, ~8 GB for 13B, and 40 GB+ for a 70B. Always leave a few gigabytes free for your OS.<\/p>\n<h3>Can I run Ollama without a GPU?<\/h3>\n<p>Yes. Small models (2\u20138B) run fine on CPU, just more slowly than on a GPU. A model like Gemma2 2B needs only about 1.7 GB of RAM and works on basic laptops. For models above ~13B, a GPU or Apple Silicon with unified memory makes a real difference.<\/p>\n<h3>How much VRAM do I need for a 7B model?<\/h3>\n<p>About 6\u20138 GB for a 4-bit quantized 7B model, including some context overhead. That fits comfortably on most entry-level discrete GPUs and on laptops with 16 GB of unified or system memory.<\/p>\n<h3>Why is Ollama running so slowly?<\/h3>\n<p>Almost always because the model doesn&#8217;t fully fit in your GPU&#8217;s VRAM, so part of it has spilled to system RAM and runs on the CPU. 
Check with <code>ollama ps<\/code>: if the PROCESSOR column shows a CPU share rather than 100% GPU, switch to a smaller model or a more aggressive quant so the whole model fits in fast memory.<\/p>\n<h3>Is a Mac good for running Ollama?<\/h3>\n<p>Yes, often excellent. Apple Silicon&#8217;s unified memory lets a 64 GB Mac run models that would otherwise need a costly multi-GPU PC, and the MLX backend (since v0.19) made it fast too. A high-memory Mac is one of the best single-machine options for local LLMs in 2026.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Bottom_line\"><\/span>Bottom line<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before you download anything, do the quick math: parameters \u00d7 0.6 GB for a 4-bit model, plus headroom. Match that to your VRAM (NVIDIA\/AMD) or unified memory (Apple), and you&#8217;ll avoid most out-of-memory errors. When in doubt, start one size smaller than you think you need \u2014 a model that fits and runs fast beats a bigger one that crawls.<\/p>","protected":false},"excerpt":{"rendered":"<p>The number-one reason a model fails to run isn&#8217;t a bug \u2014 it&#8217;s memory. 
Here&#8217;s exactly how much RAM and VRAM each Ollama model size needs, and a formula to know before you download.<\/p>","protected":false},"author":1,"featured_media":796,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[3],"tags":[642,643,640,644,641,639],"class_list":["post-790","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-llms","tag-local-llm-vram","tag-ollama-gpu","tag-ollama-hardware-requirements","tag-ollama-ram","tag-ollama-requirements","tag-ollama-system-requirements"],"_links":{"self":[{"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/posts\/790","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/comments?post=790"}],"version-history":[{"count":0,"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/posts\/790\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/media\/796"}],"wp:attachment":[{"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/media?parent=790"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/categories?post=790"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/convly.ai\/fr\/wp-json\/wp\/v2\/tags?post=790"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}