LLM 0.32a0 refactors Python library around message sequences and typed streaming parts
Simon Willison has shipped an alpha of his LLM library that abandons its original prompt-in/text-out abstraction in favor of two richer primitives: prompts as ordered sequences of user/assistant messages, and responses as streams of typed parts. New llm.user() and llm.assistant() builders let callers pass a messages=[] array directly into model.prompt(), finally making it straightforward to seed a conversation from prior history — something the old conversation() API only handled when built up turn-by-turn, forcing the CLI to lean on a SQLite-backed workaround.
The streaming change is the more consequential one. Modern frontier models interleave reasoning tokens, text, tool calls, server-executed tool outputs, and increasingly images or audio in a single response. The new response.stream_events() and async astream_events() iterators yield discrete event objects tagged with a type (text, tool_call_name, tool_call_args, etc.), letting consumers route each part appropriately. The CLI uses this to render reasoning text in a separate color and push it to stderr so piped output stays clean, and response.execute_tool_calls() or response.reply() close the tool-use loop.
The refactor is backwards-compatible — the old prompt= argument is transparently wrapped into a single-item messages array — but it repositions LLM as a multi-modal, tool-aware abstraction layer rather than a text-completion wrapper. That matters for the plugin ecosystem: providers like the updated llm-anthropic can now expose reasoning streams as first-class events instead of cramming everything into a flat text channel.
Read the full article
Continue reading at Simon Willison →This is an AI-generated summary. Read the original for the full story.