Stop guessing which models fit in your VRAM!
llmfit is a CLI tool that auto-detects your hardware and ranks 206 models by what actually runs on your system.
The usual routine: you download a 70B model and hope it fits. Or you estimate memory requirements across quantization levels and still end up with models that crash or run too slowly.
llmfit changes that. It detects your CPU, RAM, GPU, and VRAM, then scores every model in its database against your hardware.
Instead of assuming one quantization level, it picks the best quality that fits. It starts with Q8_0 and walks down to Q2_K if needed. If nothing fits at full context, it tries half context. You get the highest-quality model that actually works.
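
To make the walk-down concrete, here is a rough Python sketch of the idea. It is not llmfit's actual code: the function names, the bits-per-weight figures, and the KV-cache cost per 1,000 tokens are all assumptions chosen for illustration.

```python
from typing import Optional, Tuple

# Approximate bits per weight for common GGUF quantization levels.
# These are rough, assumed figures for illustration, not llmfit's values.
QUANT_LADDER = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 3.0),
]


def estimate_gb(params_b: float, bits_per_weight: float,
                context: int, kv_mb_per_1k_tokens: float) -> float:
    """Estimate memory in GB: quantized weights plus a linear KV-cache term."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    kv_gb = (context / 1000) * kv_mb_per_1k_tokens / 1000
    return weights_gb + kv_gb


def pick_quant(params_b: float, vram_gb: float, full_context: int,
               kv_mb_per_1k_tokens: float = 130.0) -> Optional[Tuple[str, int]]:
    """Try full context first, then half context; within each, go from
    highest quality to lowest and return the first level that fits."""
    for context in (full_context, full_context // 2):
        for name, bpw in QUANT_LADDER:
            if estimate_gb(params_b, bpw, context, kv_mb_per_1k_tokens) <= vram_gb:
                return name, context
    return None  # nothing fits, even at half context
```

Under these assumed numbers, `pick_quant(13, 8, 8192)` lands on Q3_K_M at full context, while `pick_quant(70, 24, 8192)` returns `None`, which is the "don't bother downloading" signal.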
Each model gets scored on Quality, Speed, Context, and Capability. The weights shift based on what you're doing: chat models prioritize speed, while reasoning models prioritize quality.
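
A minimal sketch of how use-case-dependent weighting could work. The weight values, the 0-100 score scale, and the model names below are hypothetical; llmfit's real weights and formula may differ.

```python
from typing import Dict, List

# Hypothetical per-use-case weights over the four dimensions (each row sums to 1).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "chat":      {"quality": 0.25, "speed": 0.40, "context": 0.15, "capability": 0.20},
    "reasoning": {"quality": 0.50, "speed": 0.15, "context": 0.15, "capability": 0.20},
}


def composite_score(scores: Dict[str, float], use_case: str) -> float:
    """Weighted sum of the four per-dimension scores (assumed 0-100 scale)."""
    weights = WEIGHTS[use_case]
    return sum(weights[dim] * scores[dim] for dim in weights)


def rank(models: Dict[str, Dict[str, float]], use_case: str) -> List[str]:
    """Return model names sorted from best to worst for the given use case."""
    return sorted(models,
                  key=lambda name: composite_score(models[name], use_case),
                  reverse=True)


# Made-up example models to show the ranking flipping with the use case.
models = {
    "small-fast": {"quality": 60, "speed": 95, "context": 70, "capability": 60},
    "big-slow":   {"quality": 90, "speed": 40, "context": 70, "capability": 80},
}
```

With these made-up numbers, `rank(models, "chat")` puts the small, fast model first, and `rank(models, "reasoning")` puts the larger one first.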
Run it as an interactive TUI to browse models, use CLI mode for a quick table, or get JSON output for scripts. There's a REST API for cluster schedulers.
You can also run it in reverse. Give it a model you want to run and a target performance, and it tells you what hardware you need.
The real value: you see ranked options before downloading anything. No more burning bandwidth on 50GB models that won't run.
It's 100% open source.
Link to llmfit in comments!