Qwen3.6-27B runs flagship coding benchmarks on a laptop in 16.8GB
Alibaba’s Qwen team released Qwen3.6-27B, a dense 27B-parameter open-weight model they claim beats their previous flagship Qwen3.5-397B-A17B across major coding benchmarks. The size difference is the story: the old MoE flagship weighs 807GB on Hugging Face, while the new dense model is 55.6GB full-precision and just 16.8GB in Unsloth’s Q4_K_M quantization.
Simon Willison ran the quantized build locally via llama-server from llama.cpp, using a community-supplied config with reasoning mode enabled and a 65K context window. The canonical ‘pelican riding a bicycle’ SVG test produced what he calls an outstanding result for a local model, generating 4,444 tokens at ~25.6 tokens/sec. A second SVG prompt ran at similar throughput.
The significance is the collapse in hardware requirements for frontier-grade coding assistance. If the benchmark claims hold, developers can now run flagship-tier agentic coding against a local quantized model on consumer hardware rather than paying for hosted inference or provisioning multi-GPU rigs for the MoE predecessor.
Read the full article
Continue reading at Simon Willison →This is an AI-generated summary. Read the original for the full story.