RC RANDOM CHAOS

March 2019 changed who reads binaries

Free disassemblers and decompilers changed who can audit binaries. The defender, attacker, and AI safety implications are now playing out in practice.


A free decompiler changes who gets to look

Ghidra was released by the NSA in March 2019. Within a year, it was running in undergraduate malware analysis courses, on the laptops of independent researchers in countries that couldn’t afford IDA Pro’s $4,000-per-seat license, and inside threat intel teams at companies that previously outsourced reverse engineering because the tooling cost more than the analyst. The disassembler-decompiler stack went from a specialist’s tool to something a curious teenager could install on a Tuesday afternoon.

That shift matters more than the tool itself. When someone builds and releases a free disassembler or decompiler - and people do, regularly, with projects like radare2, rizin and its Cutter front end, angr, and the free edition of Binary Ninja - they aren’t just adding to a toolbox. They’re changing the population of people who can read a binary. The security implications follow from that, and so do the AI safety ones.

What a disassembler-decompiler stack actually does

A disassembler turns machine code into assembly. A decompiler tries to reconstruct something closer to C or another higher-level language. Together they let an analyst answer questions about software they don’t have source code for: what does this binary do, what does it talk to, what can be trusted, where can it be broken.
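The first step, bytes to mnemonics, can be illustrated with a toy linear-sweep disassembler in Python that covers only a handful of x86-64 encodings. The opcode table is a deliberately tiny assumption chosen for illustration. Real tools decode the full instruction set, and they usually follow control flow (recursive descent) instead of sweeping bytes in order, so they don't mistake embedded data for code. Resolving the call target is the seed of the cross-references a real tool builds.

```python
import struct

# Tiny, fixed table of x86-64 encodings (illustrative only; real decoders
# handle thousands of instruction forms, prefixes, and operand encodings).
FIXED = {
    b"\x55": "push rbp",
    b"\x5d": "pop rbp",
    b"\x48\x89\xe5": "mov rbp, rsp",
    b"\x31\xc0": "xor eax, eax",
    b"\x90": "nop",
    b"\xc3": "ret",
}

def disassemble(code: bytes, base: int = 0) -> list[tuple[int, str]]:
    """Linear sweep: decode one instruction after another from the start."""
    out = []
    i = 0
    while i < len(code):
        addr = base + i
        if code[i] == 0xE8 and i + 5 <= len(code):
            # call rel32: the target is relative to the *next* instruction.
            (rel,) = struct.unpack_from("<i", code, i + 1)
            out.append((addr, f"call {addr + 5 + rel:#x}"))
            i += 5
            continue
        for enc, text in FIXED.items():
            if code.startswith(enc, i):
                out.append((addr, text))
                i += len(enc)
                break
        else:
            # Unknown byte: emit it as data and move on.
            out.append((addr, f"db {code[i]:#04x}"))
            i += 1
    return out
```

Given `bytes.fromhex("554889e5e80000000031c05dc3")` and `base=0x1000`, it produces a standard function prologue, `call 0x1009`, `xor eax, eax`, `pop rbp`, and `ret`.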

This is not exotic work. A typical workflow: load the binary, let the tool identify functions and cross-references, rename the meaningful ones, follow the strings, find the network calls, find the crypto routines, work outward from there. An experienced analyst can map the behavior of a 200KB binary in a day or two. With a decompiler producing readable pseudocode, that timeline shrinks. With LLMs that can summarize decompiled functions, it shrinks again.
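"Follow the strings" usually starts with something like the Unix `strings` tool. The minimal Python sketch below assumes ASCII-only strings of four or more characters. Real tools also look for UTF-16 text and report which code references each string. The offsets it returns are what you would then look up as cross-references.

```python
import re

def extract_strings(blob: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(blob)]
```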

The practical effect is that the cost of understanding any compiled artifact - a router firmware, a closed-source driver, a piece of malware, a model serving binary - has dropped by a factor of ten in roughly fifteen years. Most of that drop came from open tooling.

The defender benefits are concrete and measurable

Malware analysis is the obvious case. When a sample lands in a SOC, the analyst who can pull it open and identify the C2 protocol, the encryption routine, and the persistence mechanism in two hours is producing actionable intel. The analyst who has to ship it to a vendor and wait three days is producing a timeline.
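One quick way to locate an encryption or hashing routine is to search for constants the algorithm requires, which is roughly what crypto-signature plugins for disassemblers do. The sketch below assumes little-endian storage, as on x86, and checks only three illustrative signatures. A hit points you at code to read; it does not prove the binary encrypts anything.

```python
import struct

# Well-known constants that tend to appear verbatim in binaries implementing
# these algorithms. MD5 and SHA-1 share the same first two state words.
SIGNATURES = {
    "AES S-box": bytes.fromhex("637c777bf26b6fc53001672bfed7ab76"),
    "SHA-256 initial hash": struct.pack("<II", 0x6A09E667, 0xBB67AE85),
    "MD5/SHA-1 initial state": struct.pack("<II", 0x67452301, 0xEFCDAB89),
}

def find_crypto_constants(blob: bytes) -> list[tuple[str, int]]:
    """Return (label, offset) for every signature occurrence, sorted by offset."""
    hits = []
    for label, sig in SIGNATURES.items():
        start = blob.find(sig)
        while start != -1:
            hits.append((label, start))
            start = blob.find(sig, start + 1)
    return sorted(hits, key=lambda h: h[1])
```

Matching two consecutive words instead of one keeps false positives down, at the cost of missing code that loads the constants one instruction at a time.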

There is a second case that gets less attention: third-party software assessment. Most organizations run hundreds of binaries they didn’t write and can’t fully audit - printer firmware, badge readers, HVAC controllers, medical devices. A free decompiler is the only practical way for a mid-sized hospital or a school district to verify that a device on their network does what the vendor claims. Not theoretically. Actually open it up and look.
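A first pass at that check can be as simple as listing the hosts a firmware image references and comparing them with what the vendor documents. This is a sketch. The `documented` set is an assumption, something you would assemble from vendor material. Hosts built at runtime or stored as raw IP bytes will not show up, so an empty result is not a clean bill of health.

```python
import re

# Hostname portion of http:// or https:// URLs embedded in the image.
URL_HOST = re.compile(rb"https?://([A-Za-z0-9.-]+)")

def undocumented_hosts(firmware: bytes, documented: set[str]) -> set[str]:
    """Hosts referenced in the image that the vendor's documentation doesn't list."""
    found = {m.group(1).decode("ascii").lower() for m in URL_HOST.finditer(firmware)}
    return found - {h.lower() for h in documented}
```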

Third, vulnerability research at the patch-diffing level. When Microsoft ships a Patch Tuesday update, researchers diff the old and new binaries to find what got fixed and infer what was broken. That work used to be the province of a few dozen specialists. Open tools have made it accessible to thousands. The window between patch release and exploit availability has shrunk, but so has the window between patch release and defenders understanding what they need to prioritize.
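At its simplest, patch diffing means pairing functions across two builds and flagging the ones whose bytes differ. The sketch below assumes both builds keep symbol names and that function bytes have already been extracted; the function names in the example are hypothetical. Dedicated diffing tools match functions by structure instead, because stripped symbols and shifted addresses make naive byte comparison noisy.

```python
def diff_functions(old: dict[str, bytes], new: dict[str, bytes]) -> dict[str, list[str]]:
    """Pair functions by name across two builds and report what changed."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Same name, different bytes: the first place to look for the fix.
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }

# Hypothetical example: the patch adds a length check and touches the parser.
old = {"parse_header": b"\x55\x48\x89\xe5\xc3", "log": b"\x90\xc3"}
new = {"parse_header": b"\x55\x48\x89\xe5\x90\xc3", "log": b"\x90\xc3", "check_len": b"\xc3"}
report = diff_functions(old, new)
```

In practice the "changed" list is where you start reading the two decompiled versions side by side.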

The attacker side is real and worth naming directly

The same tools that let a defender understand malware let an attacker understand a target. Open decompilers are used to find vulnerabilities in closed-source software, to bypass DRM, to remove license checks, to study commercial security products and learn how to evade them, and to lift proprietary algorithms out of binaries.

This is the dual-use problem in its plainest form. There is no version of these tools that helps defenders and refuses to help attackers. The capability is symmetric. The question is whether the defender population grows faster than the attacker population when the tools are free.

The historical answer, looking at the post-2019 Ghidra cohort, is that defenders gained more from the release than attackers did, because attackers already had access to commercial tools through piracy or shared lab licenses. The marginal new user of a free decompiler is overwhelmingly a student, a hobbyist, a small-team analyst, or a researcher - not a state-level adversary who was already tooled up. That math may not hold for every future release, but it held for the biggest one.

Where AI safety enters the picture

The AI safety angle is not hypothetical anymore. Three threads are tightening.

First, model weights are increasingly shipped as files that load into compiled inference runtimes, and sometimes bundled with the runtime into a single executable. Understanding the runtime - llama.cpp, vLLM, TensorRT, custom serving stacks - is now a standard part of evaluating what a deployed model actually does versus what its operator claims. For open-source runtimes like llama.cpp and vLLM that means reading the source; for closed components and custom builds it means reverse engineering the binary. If a vendor says their model has safety filtering, an analyst with a decompiler can verify whether the filter runs before or after generation, whether a flag can bypass it, and whether the binary contains the filter at all.

Second, agent frameworks ship as compiled binaries with embedded prompts, embedded tool definitions, and embedded credentials. Decompilers extract those. This is how researchers have found hardcoded API keys, hidden system prompts, and undocumented tool access in commercial agent products. The disclosure cycle for those findings is now measured in weeks, not years, because the tooling is free.
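Finding embedded credentials often starts with pattern matching over the raw file, before any decompiling. The sketch below covers two well-documented formats: AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits) and PEM private-key headers. Production secret scanners use far larger rule sets plus entropy checks, so treat this as illustrative only.

```python
import re

# Two credential shapes with well-documented formats (illustrative, not exhaustive).
PATTERNS = {
    "AWS access key ID": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "PEM private key": re.compile(rb"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}

def scan_for_secrets(blob: bytes) -> list[tuple[str, int]]:
    """Return (label, offset) for each match, sorted by offset."""
    hits = [(label, m.start()) for label, rx in PATTERNS.items() for m in rx.finditer(blob)]
    return sorted(hits, key=lambda h: h[1])
```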

Third, malicious models. A model checkpoint can carry executable code in its loader, in its tokenizer, in its custom kernels. The PyTorch pickle format has been a known code-execution vector for years. Identifying a poisoned checkpoint requires reading the binary structure of the file and the code paths it triggers when loaded. Open disassemblers are the entry point for that work, and the work is becoming routine as model sharing platforms grow.
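Python's standard library already ships a pickle disassembler, `pickletools`, which decodes opcodes without executing them. The sketch below uses `pickletools.genops` to list every `module.name` a pickle would import when loaded. Anything outside an allowlist you trust deserves a closer look. It is a heuristic: its string tracking for protocol 4+ `STACK_GLOBAL` assumes globals are emitted the way the standard pickler emits them, and a hand-crafted pickle could be built to confuse it. Legitimate PyTorch checkpoints also import globals (tensor-rebuild helpers, for instance), and files written by `torch.save` are typically zip archives with the pickle stored as a member inside.

```python
import pickletools

# Opcodes whose decoded argument is a string value pushed onto the stack.
STRING_OPS = {"STRING", "BINSTRING", "SHORT_BINSTRING",
              "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}

def imported_callables(data: bytes) -> list[str]:
    """List the 'module.name' globals a pickle would import, without unpickling it."""
    found = []
    strings = []    # string values pushed so far, oldest first
    memo = {}       # memo slot -> string value, or None if the slot isn't a string
    top_str = None  # string believed to be on top of the stack, if any
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in STRING_OPS:
            top_str = arg
            strings.append(arg)
        elif op in GET_OPS:             # a memoized value pushed again
            top_str = memo.get(arg)
            if isinstance(top_str, str):
                strings.append(top_str)
        elif op == "MEMOIZE":           # stores top of stack in the next free slot
            memo[len(memo)] = top_str
        elif op in PUT_OPS:             # stores top of stack in an explicit slot
            memo[arg] = top_str
        else:
            if op == "GLOBAL":          # protocols 0-3: arg is "module name"
                module, _, name = arg.partition(" ")
                found.append(f"{module}.{name}")
            elif op == "STACK_GLOBAL" and len(strings) >= 2:
                # Protocol 4+: module and name were the last two strings pushed.
                found.append(f"{strings[-2]}.{strings[-1]}")
            top_str = None
    return found

if __name__ == "__main__":
    import pickle

    class Payload:
        def __reduce__(self):
            # What a malicious checkpoint might smuggle in: call eval on load.
            return (eval, ("1+1",))

    # pickle.dumps only builds bytes; nothing is executed here.
    print(imported_callables(pickle.dumps(Payload())))  # expected: ['builtins.eval']
```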

What changes when an LLM sits on top of the decompiler

The newer development is reverse engineering tools that pipe decompiled output into a language model for summarization, function naming, and vulnerability hypothesis generation. Sidekick for Binary Ninja, several community LLM plugins for Ghidra, and a number of independent tools do this now.

This collapses the skill floor further. A junior analyst with an LLM-assisted decompiler can produce work in a week that used to take a senior analyst a month. The output quality is uneven - LLMs hallucinate function purposes, miss subtle control flow, and confidently mislabel obfuscated code - but the throughput gain is real for routine work.
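The pattern these tools share can be sketched as follows: hand the model decompiled pseudocode, ask for a name, and treat the answer as a hypothesis. In the sketch, `ask_model` is a hypothetical stand-in for whatever LLM client you use. The `llm_` prefix is an assumed convention that keeps machine-suggested names distinguishable from names an analyst has verified. A reply that is not a clean identifier is rejected, and the original name is kept (for example, Ghidra's default `FUN_<address>` style).

```python
import re
from typing import Callable

# A plausible C identifier, capped at 64 characters.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")

def suggest_name(pseudocode: str, current: str, ask_model: Callable[[str], str]) -> str:
    """Ask a (hypothetical) model for a function name; keep `current` if the reply is unusable."""
    prompt = (
        "Suggest one C identifier describing what this decompiled function does. "
        "Reply with the identifier only.\n\n" + pseudocode
    )
    reply = ask_model(prompt).strip()
    # Accept only a bare identifier, and tag it as unverified model output.
    if IDENT.fullmatch(reply):
        return f"llm_{reply}"
    return current
```

The tag does not make the name correct; it only keeps a confident-sounding guess from being mistaken for analyst-verified work.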

For defenders, this means small teams can cover more ground. For AI safety researchers specifically, it means the audit surface for AI systems has become tractable. You can point an LLM-assisted decompiler at a model serving binary and get a first-pass behavioral map in an afternoon. Five years ago that was a multi-week project for someone who specialized in it.

What to do with this if you run security or build AI systems

If you run a security team and don’t have someone who can use Ghidra or a comparable tool, you are outsourcing a capability that has become free. Hire or train for it. The barrier is no longer cost; it’s willingness to invest in the skill. A motivated analyst can become productive in three to six months of part-time practice.

If you build or deploy AI systems, assume your binaries will be reverse engineered. Anything you embed - prompts, keys, filter logic, model identifiers - will be visible. Architect around that assumption: keep anything that needs to stay private on the server side. Don’t rely on obfuscation; it slows analysis by hours, not weeks.
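Single-byte XOR, a common lightweight way to hide strings, shows why obfuscation only buys time. Whatever undoes the obfuscation ships in the same binary. Even without reading that code, 256 guesses recover the plaintext once you can guess a prefix. A minimal sketch (the key string is made up):

```python
def xor(data: bytes, key: int) -> bytes:
    """Single-byte XOR: the same call both hides and reveals."""
    return bytes(b ^ key for b in data)

def recover_single_byte_xor(blob: bytes, known_prefix: bytes) -> list[tuple[int, bytes]]:
    """Try every possible key and keep outputs that start with a guessed prefix."""
    candidates = []
    for key in range(256):
        plain = xor(blob, key)
        if plain.startswith(known_prefix):
            candidates.append((key, plain))
    return candidates

hidden = xor(b"API_KEY=not-a-real-key", 0x5A)  # what the "obfuscated" binary stores
```

Stronger obfuscators raise the cost, but the client still has to reconstruct the secret to use it, which is why the secret belongs on a server.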

If you publish models, document the loader code path. The community is going to audit it whether you document it or not, and the documented version produces better-informed audits.

If you teach security, the curriculum has changed. Reverse engineering is no longer a specialty track. It’s a literacy. The students graduating in 2027 should be able to open a binary the way the students graduating in 2015 were expected to read network traffic.

The tooling will keep getting better and cheaper

The trajectory is clear. Disassemblers will keep improving. Decompilers will produce cleaner output. LLM integration will get tighter. The cost of understanding a binary will keep falling. The population of people who can do this work will keep growing.

That is good for defense in aggregate. It is also good for offense in aggregate. The net direction depends on what defenders, vendors, and researchers do with the time the tooling gives them back. Free decompilers do not solve security problems. They remove an excuse - the excuse that the binary was too hard to read, the assessment was too expensive, the verification was someone else’s job. With that excuse gone, the remaining question is whether anyone bothers to look.

Most still don’t. That is the actual gap, and it is not a tooling gap anymore.
