Choosing and managing models
LlamaBoss runs open AI models that you download once and keep on your own PC. This guide covers picking a model that fits your hardware, what the sizes mean, and where everything lives.
- The model downloader
- What fits your hardware
- Why sizes vary
- Vision models
- Where models are stored
- Bringing your own model
The model downloader
Open the model menu — the pill at the top of the window — and choose Download models. You'll see a curated list, ordered from lightest to most capable. Every model is free, openly licensed, and downloads straight from its public source: no account or sign-up is required.
Pick one, press Download, and watch the progress bar. You can cancel mid-download and retry later; finished models show a checkmark. If it's your first time, LlamaBoss highlights a recommended starter that runs on nearly any modern PC.
What fits your hardware
Bigger models generally give better answers but need more memory, ideally video memory (VRAM) on your graphics card. As a rule of thumb, you want a little more VRAM than the model's file size. The curated list spans the whole range:
| Model | Size | Runs well on |
|---|---|---|
| Gemma 3 1B | 0.8 GB | Almost anything — integrated graphics, older laptops. |
| Llama 3.2 3B | 2.0 GB | Any modern PC, even CPU-only. The recommended starting point. |
| Gemma 4 E2B vision | 3.1 GB | Entry-level GPUs (4 GB VRAM) and recent laptops. |
| Gemma 4 E4B vision | 5.0 GB | Mid-range GPUs (6–8 GB VRAM). The best pick for most people. |
| gpt-oss 20B | 12.1 GB | Gaming GPUs with 13+ GB VRAM. Strong at reasoning, math, and code. |
| Gemma 4 26B A4B vision | 16.0 GB | 16 GB+ VRAM cards. Big-model knowledge at small-model speed. |
| Qwen 3.6 27B vision | 17.5 GB | 20 GB+ VRAM (RTX 4090 / 5090 class). Frontier-quality answers. |
| Gemma 4 31B vision | 19.6 GB | 20 GB+ VRAM. The highest-quality option in the list. |
Not sure? Start small. A model that responds quickly is more useful than one that barely fits — and you can download a bigger one any time.
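If you'd rather let a script apply the rule of thumb, here is a minimal sketch. The sizes are copied from the table above; the function name and the 10% headroom factor are assumptions made for illustration, not something LlamaBoss itself uses. Because of that headroom, the result can be more cautious than the "Runs well on" column when a card sits right at a model's size.

```python
# Sizes in GB, copied from the curated list above.
CURATED_MODELS = [
    ("Gemma 3 1B", 0.8),
    ("Llama 3.2 3B", 2.0),
    ("Gemma 4 E2B vision", 3.1),
    ("Gemma 4 E4B vision", 5.0),
    ("gpt-oss 20B", 12.1),
    ("Gemma 4 26B A4B vision", 16.0),
    ("Qwen 3.6 27B vision", 17.5),
    ("Gemma 4 31B vision", 19.6),
]


def largest_model_that_fits(vram_gb, headroom=1.1):
    """Return the largest listed model that fits, or None if none does.

    A model "fits" when its file size times `headroom` is at most `vram_gb`.
    The default headroom of 1.1 is an assumed reading of "a little more
    VRAM than the model's file size".
    """
    fitting = [
        (size_gb, name)
        for name, size_gb in CURATED_MODELS
        if size_gb * headroom <= vram_gb
    ]
    if not fitting:
        return None
    # Tuples compare by size first, so max() picks the biggest fitting model.
    return max(fitting)[1]


print(largest_model_that_fits(8))  # Gemma 4 E4B vision
```

Treat the answer as an upper bound: the smaller models in the list will also run, and usually respond faster.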
Why sizes vary
Models in the list are quantized — compressed so they take a fraction of their original memory with very little quality loss. LlamaBoss's curated downloads use a well-balanced compression level that works on every kind of GPU, so you never have to think about it. The file size you see in the list is roughly the memory the model wants while running.
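To see why compression changes the size so much, here is a rough back-of-the-envelope sketch. Each weight in a model takes some number of bits, so size is approximately parameters times bits per weight, divided by 8. The numbers below are illustrative assumptions only: this guide does not state which compression level the curated downloads use, and real files also carry some metadata and a few layers kept at higher precision.

```python
def estimated_size_gb(parameters, bits_per_weight):
    """Approximate model size in GB: each weight takes bits_per_weight / 8 bytes."""
    return parameters * bits_per_weight / 8 / 1e9


# Hypothetical example: a model with 3 billion parameters.
uncompressed = estimated_size_gb(3e9, 16)   # 16 bits per weight: 6.0 GB
quantized = estimated_size_gb(3e9, 4.5)     # assumed 4.5 bits per weight: about 1.7 GB

print(f"uncompressed: {uncompressed:.1f} GB, quantized: {quantized:.1f} GB")
```

Under these assumptions the quantized model needs a little over a quarter of the original memory.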
Vision models
Models marked as vision-capable can look at images: drop a screenshot or photo into the chat and ask about it. Vision models need a small companion file to process images — LlamaBoss downloads it automatically right after the main model and keeps the pair together, so vision simply works out of the box.
Where models are stored
By default, models live in C:\Users\<you>\AppData\Local\LlamaBoss\models, each in its own folder together with its vision companion if it has one. They stay there until you delete them — downloads are one-time.
If you point LlamaBoss at your own models folder in Settings, downloads are saved directly in that folder instead, without per-model subfolders, so they fit into however you already organize your collection.
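To check what is taking up disk space, a short script can list the model files in either layout. This is a sketch under assumptions: it treats every `.gguf` file under the folder as a model (downloaded files are assumed to use the same extension as the ones you bring yourself), and the `list_models` helper is a name made up for this example. The search is recursive, so it covers both the default one-folder-per-model layout and files saved directly in a custom folder.

```python
import os
from pathlib import Path


def list_models(models_dir):
    """Return (relative path, size in bytes) for each .gguf file under models_dir."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return [
        (str(path.relative_to(root)), path.stat().st_size)
        for path in sorted(root.rglob("*.gguf"))
    ]


if __name__ == "__main__":
    # On Windows, %LOCALAPPDATA% expands to C:\Users\<you>\AppData\Local.
    local_app_data = os.environ.get("LOCALAPPDATA")
    if local_app_data:
        default_dir = Path(local_app_data) / "LlamaBoss" / "models"
        for name, size_bytes in list_models(default_dir):
            print(f"{size_bytes / 1e9:6.2f} GB  {name}")
```

If you use a custom models folder, pass that path to `list_models` instead of the default.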
Bringing your own model
Already have .gguf model files? Place them in your models folder and they'll appear in the model menu alongside the downloaded ones. Any model that works with llama.cpp works with LlamaBoss.
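If a file you added doesn't show up, one quick thing to rule out is a wrong or damaged file. GGUF files begin with the four bytes `GGUF` followed by a 32-bit version number, so a few lines of Python can confirm the header. This is only a sanity-check sketch: the helper name is made up for this example, it assumes the common little-endian layout, and a valid header does not guarantee that the model will load.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The version is stored as an unsigned 32-bit integer after the magic;
    # little-endian byte order is assumed here.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A result of `None` means the file is not a GGUF model at all, for example an interrupted download or a model saved in a different format.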