هل يمكنني تشغيل هذه النموذج اللغوي الكبير (LLM) محليًّا؟ آلة حاسبة مجانية لسعة الذاكرة المخصصة للوحدة الرسومية (VRAM) والوحدة الرسومية (GPU) (2026)

تتساءل إن كانت وحدة معالجة الرسومات الخاصة بك قادرة على تشغيل نموذج لغوي كبير معين محليًّا؟ توفر لك هذه الآلة الحاسبة المجانية تقديرًا للكمية المطلوبة من الـ VRAM لكل نموذج عند مستويات التكميم المختلفة — وتُبيّن لك بدقة أي النماذج يمكنها العمل على وحدتك.

النموذج عدد المعاملات (بالمليارات) وحدة معالجة الرسومات الخاصة بك السياق

التكمين	الجودة	السعة التقديرية للـ VRAM	هل يناسب وحدة معالجة الرسومات الخاصة بك؟

التقدير مبني على أوزان النموذج عند كل مستوى تكميم، بالإضافة إلى عبء تشغيلي تقريبي قدره ~1.5 جيجابايت. أما السياقات الطويلة فهي تضيف ذاكرة التخزين المؤقت للقيم المفتاحية (KV-cache) فوق ذلك (وتزداد هذه الذاكرة مع طول السياق الذي تختاره). وفي نماذج «الخلط بين الخبراء» (Mixture-of-Experts)، يجب تحميل جميع المعايير إلى ذاكرة VRAM حتى وإن استُخدمت فقط مجموعة فرعية منها في حساب كل رمز.

اختر نموذجًا من قائمتنا قاعدة بيانات نماذج الذكاء الاصطناعي (أو أدخل عدد المعاملات المخصص)، واختر وحدة معالجة الرسوميات (GPU) الخاصة بك، وعيّن طول السياق المطلوب، ثم اعرف فورًا ما إذا كان النموذج سيعمل أم لا — وبأي جودة. والتقديرات تعتمد على أوزان النموذج؛ أما السياقات الطويلة فتضيف ذاكرة التخزين المؤقت للقيم المفتاحية (KV-cache) فوق ذلك.

Quick answer: how much VRAM do you need to run an LLM locally?

To run a large language model locally at 4-bit quantisation, budget roughly 0.5–0.6 GB of GPU VRAM per billion parameters, plus 1–3 GB for the KV cache at typical context lengths. In practice an 8B model fits in about 5–6 GB (comfortable on a 12GB RTX 3060; also runs on an 8GB RTX 4060), a 13B model needs ~8–10 GB, a 32B model wants a 24GB card like the RTX 4090 (~20 GB of weights), and a 70B model needs about 40–48 GB — a single 48GB card or two 24GB GPUs. At 8-bit those figures roughly double (~1–1.2 GB per billion) and at full fp16 they roughly quadruple (~2 GB per billion).

Quick rules of thumb for the model weights, before adding context:

4-bit (Q4): ~0.5–0.6 GB per billion parameters
8-bit (Q8): ~1–1.2 GB per billion parameters
fp16 / bf16: ~2 GB per billion parameters
KV cache (context): add ~1–4 GB at 8K–32K tokens, more for very long context or larger models

الأسئلة الشائعة

How much VRAM do I need to run a 70B model?

A 70B model needs roughly 40–48 GB of VRAM at 4-bit quantisation — about 0.5–0.6 GB per billion parameters for the weights, plus the KV cache. That fits on a single 48GB card or two 24GB GPUs such as a pair of RTX 4090s. At 8-bit it roughly doubles to ~80 GB, and at fp16 you need around 140 GB.

Can my RTX 4060 or a 12GB GPU run an 8B or 7B model?

Yes. A 7B–8B model at 4-bit uses only about 4–5 GB of VRAM, so it runs comfortably on a 12GB card such as the RTX 3060, with plenty of headroom left for context. The RTX 4060 is an 8GB card, and it still handles a 7–8B model fine — you just have less room for long context windows.

How much VRAM does a 13B model need?

A 13B model needs about 8–10 GB of VRAM at 4-bit quantisation. It fits on a 12GB GPU for short-to-moderate context, but a 16GB card is safer once you add a longer context window and framework overhead.

Does a 24GB GPU like the RTX 4090 run a 32B or 70B model?

A 24GB GPU runs a 32B model well at 4-bit: the weights take about 20 GB, leaving a few GB for the KV cache. It cannot fit a 70B model at 4-bit — that needs ~40–48 GB — so you would need a second GPU, CPU offloading, or a lower-quality 2–3-bit quant.

How much GPU memory does an LLM use per billion parameters?

As a rule of thumb, budget ~0.5–0.6 GB per billion parameters at 4-bit, ~1–1.2 GB at 8-bit, and ~2 GB at fp16/bf16. So a 30B model is roughly 16–18 GB at 4-bit, ~32 GB at 8-bit, and ~60 GB at fp16, before you add the KV cache.

How much extra VRAM does context length (the KV cache) use?

The KV cache grows linearly with context length and adds about 3 GB on an 8B model when going from 8K to 32K tokens (roughly 1 GB at 8K rising to about 4 GB at 32K), with larger models and longer contexts using more. Quantising the KV cache to 8-bit roughly halves that penalty, so always leave 1–4 GB of headroom on top of the model weights.

What model size can I run on 8GB or 16GB of VRAM?

On 8 GB of VRAM you can comfortably run 7–8B models at 4-bit; on 16 GB you can run up to ~13B comfortably and squeeze in a 20–24B model with a modest context. For a 32B model you want 24 GB, and for a 70B model around 48 GB.

More free tools from Convly

📊قاعدة بيانات نماذج الذكاء الاصطناعي30+ LLMs — specs, pricing & context, side by side.🏆LLM Leaderboard 2026Rank every model by intelligence, price & speed.💵حاسبة تكلفة واجهات برمجة تطبيقات الذكاء الاصطناعيCompare what each model costs you per month.⚖Self-Hosting vs APIBuy a GPU or pay per token? See the break-even.🧠Compare Models & GPUsEvery model paired with the GPU that runs it.🖼Image-to-PromptTurn any image into an editable AI prompt.