Cohere Terrarium sandbox flaw lets AI-generated code escape the container as root
Original source: "Cohere AI Terrarium Sandbox Flaw Enables Root Code Execution, Container Escape" (The Hacker News)
A vulnerability in Cohere’s Terrarium, the sandboxed Python execution environment used to run code produced by AI agents, allowed attackers to break out of the container and execute commands with root privileges on the underlying host. Terrarium is meant to be the isolation boundary that keeps untrusted, model-generated code from touching anything sensitive — so a working escape collapses the entire trust model that agentic systems built on top of it depend on.
The flaw is significant beyond Cohere’s own deployment because Terrarium-style sandboxes have become a default building block for tool-using LLM agents. Any platform that wires a code-execution tool into an agent loop is, in practice, accepting arbitrary input from an attacker who can prompt the model. If the sandbox layer is bypassable, prompt injection becomes remote code execution on shared infrastructure, with lateral movement, credential theft, and pivoting into the host environment all in scope.
The takeaway for anyone running agent code-execution at scale: container-as-sandbox is a weak boundary by itself. Defense needs layered isolation — gVisor, Firecracker, or full VM-per-execution — combined with strict egress controls, ephemeral filesystems, and the assumption that the model will eventually be coaxed into running adversarial payloads.
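A defender-side pre-launch check for the layered-isolation points above, added here as an example (it is not Cohere's or Terrarium's code, and the spec field names are assumed): it audits a sandbox spec for the weaknesses the summary names and renders a passing spec as real `docker run` flags. The weak spec is a constructed illustration of plain container-as-sandbox defaults, not Terrarium's actual configuration.

```python
from dataclasses import dataclass

# Constructed policy model (assumed field names, not Terrarium's real configuration).
@dataclass(frozen=True)
class SandboxSpec:
    runtime: str = "runc"          # "runsc" = gVisor, a user-space kernel
    user: str = "root"             # uid[:gid] the generated code runs as
    network: str = "bridge"        # "none" blocks all egress
    read_only_rootfs: bool = False
    cap_drop_all: bool = False
    no_new_privileges: bool = False
    privileged: bool = False
    tmpfs: tuple = ()              # ephemeral scratch mounts, e.g. ("/tmp",)
    host_mounts: tuple = ()        # bind mounts of host paths

ISOLATING_RUNTIMES = {"runsc", "kata-runtime"}  # user-space kernel / microVM

# Each check is (predicate, finding); a predicate that is true yields the finding.
CHECKS = [
    (lambda s: s.privileged, "privileged: the container is no boundary at all"),
    (lambda s: s.runtime not in ISOLATING_RUNTIMES,
     "runtime shares the host kernel; use runsc (gVisor) or a microVM"),
    (lambda s: s.user.split(":")[0] in ("root", "0", ""),
     "runs as root: a container escape lands as root on the host"),
    (lambda s: s.network != "none", "network attached: block or proxy all egress"),
    (lambda s: not s.read_only_rootfs, "writable rootfs: payloads can persist"),
    (lambda s: s.read_only_rootfs and not s.tmpfs, "no tmpfs: give code an ephemeral /tmp"),
    (lambda s: not s.cap_drop_all, "capabilities retained: drop ALL"),
    (lambda s: not s.no_new_privileges, "setuid binaries can regain privileges"),
    (lambda s: bool(s.host_mounts), "host paths bind-mounted into the sandbox"),
]

def audit(spec: SandboxSpec) -> list[str]:
    """Findings to fix before this spec runs model-generated code;
    an empty list means every check passed."""
    return [finding for pred, finding in CHECKS if pred(spec)]

def docker_args(spec: SandboxSpec, image: str) -> list[str]:
    """Render a spec as real `docker run` flags for a one-shot, auto-removed run."""
    args = ["docker", "run", "--rm", f"--runtime={spec.runtime}", f"--user={spec.user}",
            f"--network={spec.network}", "--pids-limit=64"]
    args += ["--read-only"] if spec.read_only_rootfs else []
    args += ["--cap-drop", "ALL"] if spec.cap_drop_all else []
    args += ["--security-opt", "no-new-privileges"] if spec.no_new_privileges else []
    for path in spec.tmpfs:  # scratch space that disappears with the container
        args += ["--tmpfs", f"{path}:rw,noexec,nosuid,size=64m"]
    return args + [image]

WEAK = SandboxSpec()  # constructed: runc, root, bridge network, writable rootfs
HARDENED = SandboxSpec(runtime="runsc", user="65534:65534", network="none",
                       read_only_rootfs=True, cap_drop_all=True,
                       no_new_privileges=True, tmpfs=("/tmp",))
```

Running `audit(WEAK)` lists every gap a plain container leaves open; `audit(HARDENED)` returns nothing, and `docker_args(HARDENED, image)` is the command that spec corresponds to.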
This is an AI-generated summary; the original article at The Hacker News has the full story.