Local LLM inference on M5 MacBook Pro costs 3x more than OpenRouter
A cost breakdown of running Gemma 4 31B locally on a maxed-out M5 Max MacBook Pro shows that hardware depreciation, not electricity, drives the economics. At $0.18/kWh, power costs roughly two cents an hour, but amortizing the $4,299 machine over a realistic lifespan of 3 to 10 years adds $0.05-$0.16 per hour. Combined with a measured throughput of 10-40 tokens per second, that works out to roughly $0.40-$4.79 per million tokens, with $1.50 as a reasonable central estimate.
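The arithmetic behind these figures can be sketched as follows. This is a minimal reconstruction, not the author's own calculation: it assumes the purchase price is spread over every hour of the machine's life (24 hours a day, 365 days a year), which is the assumption that reproduces the quoted $0.05-$0.16 per hour, and it uses the rounded two-cent power figure. The function and constant names are illustrative.

```python
# Assumption: depreciation is spread over every hour of the lifespan.
HOURS_PER_YEAR = 24 * 365

PRICE_USD = 4299            # machine price from the summary
POWER_USD_PER_HOUR = 0.02   # "roughly two cents an hour" (rounded)


def hourly_depreciation(price_usd: float, lifespan_years: float) -> float:
    """Straight-line depreciation in dollars per hour."""
    return price_usd / (lifespan_years * HOURS_PER_YEAR)


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Convert an hourly running cost into dollars per million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Best case: 10-year lifespan at 40 tokens/s. Worst case: 3 years at 10 tokens/s.
best = cost_per_million_tokens(hourly_depreciation(PRICE_USD, 10) + POWER_USD_PER_HOUR, 40)
worst = cost_per_million_tokens(hourly_depreciation(PRICE_USD, 3) + POWER_USD_PER_HOUR, 10)

print(f"best case:  ${best:.2f} per million tokens")
print(f"worst case: ${worst:.2f} per million tokens")
```

With the rounded power figure this yields about $0.48 and $5.10 per million tokens, close to but not identical to the quoted $0.40-$4.79 range; the original presumably used a more precise power cost. Either way, depreciation is several times larger than electricity in every scenario.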
OpenRouter offers the same model at $0.38-$0.50 per million tokens, and its faster providers deliver 60-70 tokens per second, so the hosted option costs roughly a third as much as the $1.50 local estimate and runs several times faster. The author concludes that for any knowledge worker whose salary exceeds token spend by three orders of magnitude, paying a cloud provider is the obvious choice. The notable finding is not the cost gap but that a consumer laptop can now run a model approaching Claude Sonnet quality at all.
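The "three times" comparison can be checked directly from the numbers quoted above. The sketch below uses only the figures in this summary; the variable names are illustrative, and the central local estimate of $1.50 is the summary's own choice rather than a derived value.

```python
LOCAL_CENTRAL_USD = 1.50          # central local estimate, per million tokens
CLOUD_USD_RANGE = (0.38, 0.50)    # OpenRouter price range, per million tokens
LOCAL_TPS_RANGE = (10, 40)        # measured local tokens per second
CLOUD_TPS_RANGE = (60, 70)        # faster OpenRouter providers, tokens per second

# Cost ratio: how many times more the local estimate costs than the cloud price.
cost_ratio_low = LOCAL_CENTRAL_USD / CLOUD_USD_RANGE[1]    # against the dearest cloud price
cost_ratio_high = LOCAL_CENTRAL_USD / CLOUD_USD_RANGE[0]   # against the cheapest cloud price

# Speed ratio: slowest cloud vs fastest local, and fastest cloud vs slowest local.
speedup_low = CLOUD_TPS_RANGE[0] / LOCAL_TPS_RANGE[1]
speedup_high = CLOUD_TPS_RANGE[1] / LOCAL_TPS_RANGE[0]

print(f"local costs {cost_ratio_low:.1f}x to {cost_ratio_high:.1f}x the cloud price")
print(f"cloud is {speedup_low:.1f}x to {speedup_high:.1f}x faster")
```

On these inputs the local estimate is 3.0 to about 3.9 times the cloud price, and the cloud providers are between 1.5 and 7 times faster, depending on which ends of the ranges are compared.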
This is an AI-generated summary. Read the original for the full story.