Choosing and managing models

LlamaBoss runs open AI models that you download once and keep on your own PC. This guide covers picking a model that fits your hardware, what the sizes mean, and where everything lives.

The model downloader

Open the model menu — the pill at the top of the window — and choose Download models. You'll see a curated list, ordered from lightest to most capable. Every model is free, openly licensed, and downloads straight from its public source: no account or sign-up is required.

Pick one, press Download, and watch the progress bar. You can cancel mid-download and retry later; finished models show a checkmark. If it's your first time, LlamaBoss highlights a recommended starter that runs on nearly any modern PC.

What fits your hardware

Bigger models generally give better answers but need more memory, ideally video memory (VRAM) on your graphics card. As a rule of thumb, you want a little more VRAM than the model's file size: a 5.0 GB model, for example, is comfortable on a 6–8 GB card. The curated list spans the whole range:

Model                     Size      Runs well on
Gemma 3 1B                0.8 GB    Almost anything: integrated graphics, older laptops.
Llama 3.2 3B              2.0 GB    Any modern PC, even CPU-only. The recommended starting point.
Gemma 4 E2B vision        3.1 GB    Entry-level GPUs (4 GB VRAM) and recent laptops.
Gemma 4 E4B vision        5.0 GB    Mid-range GPUs (6–8 GB VRAM). The best pick for most people.
gpt-oss 20B               12.1 GB   Gaming GPUs with 13+ GB VRAM. Strong at reasoning, math, and code.
Gemma 4 26B A4B vision    16.0 GB   16 GB+ VRAM cards. Big-model knowledge at small-model speed.
Qwen 3.6 27B vision       17.5 GB   20 GB+ VRAM (RTX 4090 / 5090 class). Frontier-quality answers.
Gemma 4 31B vision        19.6 GB   20 GB+ VRAM. The highest-quality option in the list.

Not sure? Start small. A model that responds quickly is more useful than one that barely fits — and you can download a bigger one any time.
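
The rule of thumb above can be sketched in a few lines of Python. This is an illustration only, not how LlamaBoss picks its recommendation: the sizes are copied from the curated list, while the function names and the 10% headroom factor are assumptions made for this example. Where the result disagrees with the "Runs well on" column, trust the table.

```python
# Sizes (in GB) copied from the curated list, lightest first.
CURATED_MODELS = [
    ("Gemma 3 1B", 0.8),
    ("Llama 3.2 3B", 2.0),
    ("Gemma 4 E2B vision", 3.1),
    ("Gemma 4 E4B vision", 5.0),
    ("gpt-oss 20B", 12.1),
    ("Gemma 4 26B A4B vision", 16.0),
    ("Qwen 3.6 27B vision", 17.5),
    ("Gemma 4 31B vision", 19.6),
]


def models_that_fit(vram_gb, headroom=1.1):
    """Return names of models whose file size, plus headroom, fits in vram_gb.

    headroom=1.1 means "10% more VRAM than the file size". That factor is an
    assumption standing in for "a little more", not a LlamaBoss setting.
    """
    return [name for name, size_gb in CURATED_MODELS if size_gb * headroom <= vram_gb]


def largest_fit(vram_gb, headroom=1.1):
    """Return the largest curated model that fits, or None if none do."""
    fits = models_that_fit(vram_gb, headroom)
    return fits[-1] if fits else None


if __name__ == "__main__":
    for vram in (4, 8, 24):
        print(f"{vram} GB VRAM -> {largest_fit(vram)}")
```

With these assumptions, an 8 GB card lands on Gemma 4 E4B vision, which matches the table's mid-range recommendation.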

Why sizes vary

Models in the list are quantized — compressed so they take a fraction of their original memory with very little quality loss. LlamaBoss's curated downloads use a well-balanced compression level that works on every kind of GPU, so you never have to think about it. The file size you see in the list is roughly the memory the model wants while running.
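
To see why quantization shrinks a model so much, a rough estimate is the number of parameters times the bits stored per weight, divided by eight to get bytes. The sketch below is illustrative arithmetic only: the bit widths are example values, not the compression level LlamaBoss uses, and real files carry some extra overhead.

```python
def estimate_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    # Example bit widths only; they are not LlamaBoss's actual settings.
    for bits in (16, 8, 4):
        size = estimate_size_gb(3, bits)
        print(f"3B parameters at {bits} bits/weight: ~{size:.1f} GB")
```

By this estimate, a 3-billion-parameter model drops from roughly 6 GB at 16 bits per weight to about 1.5 GB at 4 bits.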

Vision models

Models marked as vision-capable can look at images: drop a screenshot or photo into the chat and ask about it. Vision models need a small companion file to process images — LlamaBoss downloads it automatically right after the main model and keeps the pair together, so vision simply works out of the box.

Where models are stored

By default, models live in C:\Users\<you>\AppData\Local\LlamaBoss\models, each in its own folder together with its vision companion if it has one. They stay there until you delete them — downloads are one-time.

If you point LlamaBoss at your own models folder in Settings, downloads are saved directly in that folder instead of in per-model subfolders, so they fit into however you already organize your collection.
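
If you want to see what is taking up space, a short script can walk the models folder. The sketch below is not part of LlamaBoss; the function names are made up for this example, and it assumes the default location described above. It searches subfolders as well, so it covers both the per-model layout and a custom folder with files saved directly in it. It only looks for .gguf files, so a companion file is listed only if it uses that extension.

```python
import os
from pathlib import Path


def default_models_dir() -> Path:
    """Build the default models path described in this guide."""
    # %LOCALAPPDATA% normally resolves to C:\Users\<you>\AppData\Local on Windows.
    local_app_data = os.environ.get("LOCALAPPDATA")
    base = Path(local_app_data) if local_app_data else Path.home() / "AppData" / "Local"
    return base / "LlamaBoss" / "models"


def list_gguf_files(models_dir):
    """Return sorted (relative path, size in bytes) pairs for every .gguf file."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    found = [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*.gguf")
        if path.is_file()
    ]
    return sorted(found)


if __name__ == "__main__":
    for name, size_bytes in list_gguf_files(default_models_dir()):
        print(f"{size_bytes / 1e9:6.1f} GB  {name}")
```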

Bringing your own model

Already have .gguf model files? Place them in your models folder and they'll appear in the model menu alongside the downloaded ones. Any model that works with llama.cpp works with LlamaBoss.
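
Before copying a file in, you can do a quick sanity check that it really is a GGUF file. GGUF files begin with the four bytes "GGUF" followed by a 32-bit version number. The sketch below reads just that header; the function name is hypothetical, it assumes the usual little-endian byte order, and passing this check does not guarantee that llama.cpp or LlamaBoss can load the model.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The version is an unsigned 32-bit integer; little-endian is assumed here.
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    import sys

    for file_path in sys.argv[1:]:
        version = read_gguf_version(file_path)
        status = f"GGUF v{version}" if version is not None else "not a GGUF file"
        print(f"{file_path}: {status}")
```

Run it with one or more file paths as arguments to check each file in turn.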