whichllm: CLI picks the best local LLM for your GPU using real benchmarks
Original source: Show HN: Find the best local LLM for your hardware, ranked by benchmarks (Hacker News)
whichllm is a command-line tool that auto-detects a machine’s GPU, CPU, and RAM and ranks HuggingFace models that will actually run on it. Unlike size-based fit checkers, it sorts candidates by merged scores from LiveBench, Artificial Analysis, Aider, Chatbot Arena, and the Open LLM Leaderboard, so a newer 27B model can outrank a 32B that also fits. VRAM estimates account for weights, GQA KV cache, activations, and overhead, while throughput modeling factors in quantization, backend, MoE active-vs-total parameters, and unified-memory partial offload.
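As a rough illustration of how a VRAM estimate of this kind can be assembled, the sketch below sums the four terms named above. It is not whichllm's actual code: the `ModelSpec` fields, the function names, and the default values for bits per weight, activations, and overhead are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    """Hypothetical model description; field names are illustrative."""
    params_b: float   # total parameters, in billions
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # KV heads (with GQA, fewer than attention heads)
    head_dim: int     # dimension of each attention head


def kv_cache_bytes(spec: ModelSpec, context_len: int, bytes_per_value: int = 2) -> int:
    # Keys and values (factor 2) are stored per layer, per KV head, per token.
    return (2 * spec.n_layers * spec.n_kv_heads * spec.head_dim
            * context_len * bytes_per_value)


def estimate_vram_gib(
    spec: ModelSpec,
    bits_per_weight: float = 4.5,   # assumed average for a 4-bit style quant
    context_len: int = 8192,
    activation_gib: float = 0.5,    # assumed flat allowance
    overhead_gib: float = 1.0,      # assumed runtime/driver allowance
) -> float:
    weight_bytes = spec.params_b * 1e9 * bits_per_weight / 8
    total_bytes = weight_bytes + kv_cache_bytes(spec, context_len)
    return total_bytes / 2**30 + activation_gib + overhead_gib


def fits(spec: ModelSpec, vram_gib: float, **kwargs) -> bool:
    # A model "fits" when the estimate does not exceed available VRAM.
    return estimate_vram_gib(spec, **kwargs) <= vram_gib


if __name__ == "__main__":
    spec = ModelSpec(params_b=8.0, n_layers=32, n_kv_heads=8, head_dim=128)
    print(f"{estimate_vram_gib(spec):.2f} GiB needed; fits in 8 GiB: {fits(spec, 8.0)}")
```

Because the KV cache scales with the number of KV heads rather than attention heads, a GQA model with the same parameter count needs noticeably less memory at long context than a full multi-head model, which is why the cache is worth modeling separately from the weights.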
The ranking pipeline grades every benchmark by evidence quality (direct match, variant, base model, interpolated, or uploader-reported) and discounts the score accordingly. Cross-family inheritance is rejected when parameter counts diverge by more than 2x, which blocks small forks from inheriting their base model’s scores. Stale leaderboards are demoted along each model’s lineage so 2024 results cannot dominate current-generation ones, and the snapshot date is printed with every recommendation.
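The grading and inheritance rules described here can be sketched as follows. This is a minimal illustration rather than the tool's implementation: the discount weights in `EVIDENCE_WEIGHT` are invented for the example, the function names are hypothetical, and the 2x check is applied only to scores inherited from a base model.

```python
from typing import Optional

# Assumed discount factors per evidence grade; the real values are not
# given in the source and are chosen here purely for illustration.
EVIDENCE_WEIGHT = {
    "direct": 1.0,
    "variant": 0.9,
    "base_model": 0.75,
    "interpolated": 0.5,
    "uploader_reported": 0.3,
}


def can_inherit(child_params_b: float, base_params_b: float,
                max_ratio: float = 2.0) -> bool:
    """Allow inheritance only when parameter counts differ by at most max_ratio."""
    larger = max(child_params_b, base_params_b)
    smaller = min(child_params_b, base_params_b)
    return larger / smaller <= max_ratio


def discounted_score(raw_score: float, evidence: str,
                     child_params_b: Optional[float] = None,
                     base_params_b: Optional[float] = None) -> Optional[float]:
    """Return the discounted score, or None when inheritance is rejected."""
    if evidence == "base_model":
        if child_params_b is None or base_params_b is None:
            return None
        if not can_inherit(child_params_b, base_params_b):
            return None
    return raw_score * EVIDENCE_WEIGHT[evidence]


if __name__ == "__main__":
    # A 1.5B fork of a 32B base diverges by far more than 2x: rejected.
    print(discounted_score(80.0, "base_model", 1.5, 32.0))
    # A 27B derivative of a 32B base is within 2x: inherited at a discount.
    print(discounted_score(80.0, "base_model", 27.0, 32.0))
```

Returning `None` instead of a heavily discounted number keeps a rejected score out of the merge entirely, so a small fork is ranked only on evidence that actually applies to it.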
Beyond ranking, whichllm can simulate arbitrary GPUs for purchase planning, reverse-lookup hardware needed for a specific model, download and chat with the top pick via uv-managed environments, emit ready-to-run Python snippets, or stream JSON for shell pipelines feeding Ollama. It ships via uvx, Homebrew, and pip, and supports GGUF, AWQ, GPTQ, and FP16/BF16 backends.
This is an AI-generated summary. Read the original for the full story.