Mid-2024: a drunk LLM found a ksmbd kernel bug

A researcher fed a slightly broken prompt to an LLM and it found a remote out-of-bounds write in the Linux kernel’s ksmbd module.

That’s not marketing copy. In mid-2024 Sean Heelan published work showing that degrading prompt quality - what he called “getting the model drunk” - caused a large language model to surface real, exploitable bugs in production kernel code. The bug class was an out-of-bounds write reachable over the network. The target was ksmbd, the in-kernel SMB3 server merged into Linux in 2021. The model wasn’t fine-tuned for vulnerability research. It was a general-purpose assistant being asked, badly, to audit C code.

This matters because it changes the economics of finding kernel bugs. It also changes the economics of finding them before anyone else does. Both sides of that trade now have a cheaper tool.

What “drunk” actually means here

The technique is unintuitive. Normally you want a precise prompt: clear instructions, narrow scope, structured output. Heelan’s observation was that highly constrained prompts caused the model to converge on the same handful of “obvious” candidate bugs across runs. Most were false positives. The signal-to-noise ratio collapsed.

Loosening the prompt - adding ambiguity, removing structure, asking broader questions, running the same query many times with high temperature - made the model exhibit more variance. It would chase weirder paths. Some of those paths landed on real bugs that a structured audit missed.

This is the practical takeaway: when you’re using an LLM to find vulnerabilities, determinism is the enemy. You want the model to be a slightly unreliable junior auditor running a thousand times, not a precise tool running once. You then triage the output with something more trustworthy than another LLM pass - usually a human, a sanitizer like KASAN, or a fuzzer seeded with the candidate input.
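The run-it-many-times loop is simple enough to sketch. This is a minimal illustration, not Heelan’s pipeline: the prompt wordings, the sample_audits name, and the ask(prompt, temperature) callable are all placeholders for whatever model client you actually use.

```python
import random
from typing import Callable, List

# Hypothetical prompt variants: deliberately loose and differently worded.
PROMPTS = [
    "Anything in this file look off to you?",
    "Where could a length or offset supplied by the client go wrong here?",
    "If you had to bet on one bug in this code, where is it?",
]


def sample_audits(ask: Callable[[str, float], str], source: str,
                  runs: int = 100, temperature: float = 1.0,
                  seed: int = 0) -> List[str]:
    """Query the model `runs` times, picking a loose prompt at random each time.

    `ask(prompt, temperature)` is a stand-in for a real model client;
    it is an assumption of this sketch, not a library call.
    """
    rng = random.Random(seed)  # seeded so the prompt schedule is repeatable
    answers = []
    for _ in range(runs):
        prompt = rng.choice(PROMPTS) + "\n\n" + source
        answers.append(ask(prompt, temperature))
    return answers
```

The variance comes from two places: the prompt wording changes between runs, and the temperature is passed straight through to the model. Everything the loop returns is a candidate, not a finding, until triage says otherwise.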

ksmbd is a useful case study

ksmbd is a good target to study because it sits at an awkward intersection. It runs in kernel space, so a bug there is a kernel bug, not a userspace one. It speaks SMB, a protocol with decades of malformed-packet history. It is reachable over the network on any host that enables it. It is also relatively new code in the kernel tree, which means the long tail of grep-and-stare auditing hasn’t finished yet.

The bug Heelan’s pipeline found was an out-of-bounds write triggered by a crafted SMB message. The path from “attacker sends bytes” to “attacker corrupts kernel memory” was short. The bug had survived human review, the merge process, and ongoing fuzzing campaigns. An LLM running at high temperature, asked vague questions about a specific source file, flagged the code path that contained it.

The model did not write an exploit. It did not understand SMB. It looked at a few hundred lines of C, noticed a length check that didn’t match the subsequent write, and said so in plain English. A human confirmed. A patch landed.
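To make “a length check that didn’t match the subsequent write” concrete, here is a toy model of the pattern. It is not the ksmbd code and not the real bug: the two-field header, the 16-byte buffer, and the handle_message name are invented for illustration, and Python’s bytearray stands in for raw kernel memory.

```python
import struct

BUF_SIZE = 16


def handle_message(msg: bytes, checked: bool) -> bytearray:
    """Copy a payload into a 16-byte buffer followed by 16 bytes of
    unrelated "neighbour" memory. Header: two little-endian u16 lengths."""
    memory = bytearray(BUF_SIZE) + bytearray(b"N" * 16)  # buffer + neighbour
    declared_len, data_len = struct.unpack_from("<HH", msg, 0)
    payload = msg[4:4 + data_len]

    if declared_len > BUF_SIZE:          # the existing check validates declared_len
        raise ValueError("message too long")
    if checked and data_len > BUF_SIZE:  # the fix: validate the length actually used
        raise ValueError("payload too long")

    for i, byte in enumerate(payload):   # the write is driven by data_len
        memory[i] = byte
    return memory


# A crafted message: declared_len passes the check, data_len overruns the buffer.
crafted = struct.pack("<HH", 8, 24) + b"A" * 24
```

With checked=False, the crafted message overwrites the first eight neighbour bytes. With checked=True, it is rejected. The mismatch is visible from reading the code alone, which is the kind of thing the model flagged.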

The asymmetry this creates

Defenders and attackers now have access to the same technique. The question is who uses it harder.

The Linux kernel has roughly 30 million lines of code. Maybe 1 to 2 percent is reachable from the network on a default-ish install - network filesystems, the TCP/IP stack, Bluetooth, USB-over-IP, ksmbd, nfsd, iSCSI. Call it 300,000 to 600,000 lines of high-value attack surface. A determined attacker can rent enough GPU time to run an LLM audit across all of it for less than the cost of a single zero-day on the gray market. The output will be mostly garbage. A small percentage will be real.
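The arithmetic is worth writing down, because the query count is what drives the cost. The line count and percentages below come from the paragraph above. The chunk size and runs per chunk are assumptions picked for illustration, and no price is attached because that depends entirely on the model you rent.

```python
KERNEL_LINES = 30_000_000   # "roughly 30 million lines", from the text
CHUNK_LINES = 300           # assumption: one query covers a few hundred lines
RUNS_PER_CHUNK = 100        # assumption: repeated high-temperature runs per chunk


def audit_budget(reachable_percent: int) -> dict:
    """Estimate how many model queries a full pass over the attack surface takes."""
    surface = KERNEL_LINES * reachable_percent // 100
    chunks = -(-surface // CHUNK_LINES)  # ceiling division
    return {
        "surface_lines": surface,
        "chunks": chunks,
        "queries": chunks * RUNS_PER_CHUNK,
    }


for percent in (1, 2):
    print(percent, audit_budget(percent))
```

Under these assumptions the 1 percent case is 300,000 lines, 1,000 chunks, and 100,000 queries. The 2 percent case doubles all three.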

Defenders have the same option, with two structural advantages: they can run the pipeline against pre-merge patches in CI, and they can wire the output into existing tooling (syzkaller corpora, KASAN-instrumented test runs, static analyzers). They also have a structural disadvantage: an attacker needs one bug; a defender needs to fix all of them.

The direction this points is clear. LLM-assisted vulnerability discovery is going to become a standard step in kernel auditing the same way fuzzing did between 2015 and 2020. It will not replace fuzzers, static analysis, or human review. It will sit alongside them and catch a different slice of bugs - the ones that require reading code in roughly the way a slightly tired human reads code.

What this does not do

It does not produce working exploits. The gap between “this looks like an OOB write” and “I can use this to get a shell on a remote box” is still significant. Modern kernel hardening - KASLR, SMEP, SMAP, KPTI, stack canaries, CFI - means you usually need to chain multiple primitives. LLMs are bad at this part. They can spot the bug. They cannot reliably build the weird machine on top of it.

It does not carry over cleanly to closed-source targets. The technique relies on the model reading C and reasoning about it. Decompiled output works in principle, but the noise floor rises fast. Heelan’s results are specific to the situation where the auditor has clean source.

It does not eliminate fuzzing. Fuzzers and LLMs find different bugs. Fuzzers find what crashes when you mutate inputs against running code. LLMs find what looks wrong when you read the code. The two categories overlap less than you’d think. Fuzzers had executed code paths near the ksmbd bug, but triggering it required a specific message structure that the fuzzer’s grammar didn’t generate. The LLM read the code and noticed the missing check directly.

What to actually do with this

If you maintain kernel code or anything else security-sensitive, the practical move is to add an LLM audit pass to your review process. Run it with high temperature. Run it many times. Discard the duplicate “obvious” findings. Triage the rest against a sanitizer build. Expect a false positive rate above 90 percent. Budget time for triage, not for the runs themselves - the runs are cheap.
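One way to read “discard the duplicate obvious findings” is sketched below: collapse findings across runs, drop anything that shows up in most runs, and triage what’s left starting with whatever has some repeat support. The (function, bug class) tuple, the rank_findings name, and the 50 percent threshold are assumptions of this sketch, not part of any published pipeline.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Finding = Tuple[str, str]  # (function name, short bug class); hypothetical labels


def rank_findings(runs: Iterable[Iterable[Finding]],
                  obvious_share: float = 0.5) -> List[Tuple[Finding, int]]:
    """Collapse findings across runs and drop the near-universal ones.

    A finding reported in more than `obvious_share` of runs is treated as one
    of the "obvious" candidates and dropped. The rest are returned with their
    run counts, most-repeated first.
    """
    per_run = [set(r) for r in runs]  # count each finding once per run
    counts = Counter(f for r in per_run for f in r)
    limit = obvious_share * len(per_run)
    kept = [(f, n) for f, n in counts.items() if n <= limit]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

The threshold is a knob, not a rule. If you’d rather not throw anything away, review each dropped finding once instead of discarding it. The point is that it shouldn’t eat triage time on every run.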

If you operate Linux systems with ksmbd enabled, the immediate move is to disable it unless you need it. The module is off by default in most distributions but on by default in a few embedded and NAS-focused builds. To check, run lsmod | grep ksmbd. If it’s loaded and you don’t run an SMB server, unload it. If you do run one, prefer Samba in userspace until the ksmbd attack surface has been audited more thoroughly. This is not about distrusting the maintainers - it’s about reading the bug pattern. New kernel-space network servers always have an early period of high bug density. ksmbd is in that period.
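If you want the same check in a fleet script rather than a shell one-liner, a minimal sketch is to read /proc/modules, which is the file lsmod formats. The function names here are my own. One caveat: a module compiled into the kernel rather than loaded won’t appear in this list, so an empty result is not proof of absence.

```python
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Return module names from /proc/modules content (first field of each line)."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def module_loaded(name: str, path: str = "/proc/modules") -> bool:
    """True if `name` is listed as a loaded module on this host."""
    with open(path, "r", encoding="utf-8") as handle:
        return name in loaded_modules(handle.read())


if __name__ == "__main__":
    try:
        print("ksmbd loaded:", module_loaded("ksmbd"))
    except OSError as err:  # not Linux, or /proc is not readable
        print("could not read /proc/modules:", err)
```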

If you build security tooling, the open problem is triage. The bottleneck is no longer finding candidate bugs. It is filtering them. Whoever ships a good triage layer - something that takes 10,000 LLM-flagged code paths and returns the 50 worth a human’s time - captures most of the value here. Sanitizer-driven differential execution is one promising direction. Symbolic execution against the specific code path the model flagged is another. Both are old techniques the LLM phase makes newly useful.
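The crudest version of a sanitizer-backed triage filter is a join: take the functions the model flagged, replay the candidate inputs on a KASAN build, and keep only the functions that show up in a KASAN report headline. The sketch below does only the join. The confirmed_candidates name is hypothetical, and the regex assumes the usual “BUG: KASAN: <kind> in <function>+0x...” headline format.

```python
import re
from typing import Dict, Iterable, List

# Matches the headline of a KASAN report, for example:
#   BUG: KASAN: slab-out-of-bounds in some_function+0x1a/0x40 [module]
KASAN_RE = re.compile(r"BUG: KASAN: (?P<kind>[\w-]+) in (?P<func>[\w.]+)\+0x")


def confirmed_candidates(flagged: Iterable[str],
                         kernel_log: str) -> Dict[str, List[str]]:
    """Return {function: [KASAN bug kinds]} for flagged functions KASAN also reported."""
    wanted = set(flagged)
    hits: Dict[str, List[str]] = {}
    for match in KASAN_RE.finditer(kernel_log):
        if match["func"] in wanted:
            hits.setdefault(match["func"], []).append(match["kind"])
    return hits
```

This undercounts. KASAN names the function where the bad access happened, which may be a callee of the function the model flagged, so a real triage layer would also match against the stack trace. It is still a cheap first cut.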

The AI safety read

There is a tendency in AI safety discussions to frame everything as either alignment or capability. Vulnerability discovery is a useful counterexample because it’s neither, exactly. The model isn’t misaligned - it’s doing what it was asked. The capability isn’t novel - humans have been finding these bugs for decades. What’s new is the cost curve.

When a capability gets 100x cheaper, the question stops being “can it be done” and starts being “who does it first, and at what scale.” That’s where the security implications live. Not in the model having a secret plan. In the model being a competent-enough auditor that running it across the entire Linux source tree, every night, on every commit, is now affordable for a single person with a credit card.

That’s the world we’re in as of 2026. The defensive playbook hasn’t caught up. The offensive playbook is being written in private. The kernel maintainers, to their credit, are taking the technique seriously - recent ksmbd patches reference LLM-assisted findings in their commit messages. Watch that pattern spread. When it shows up in the network stack, the TLS implementations, and the container runtimes, the threat model for everyone running Linux changes by degrees, not all at once.

The drunk-LLM trick is two years old and already in the toolkit. The interesting question is what the sober ones will find next.