Stop counting findings

Opening position

Most pentest reports I review are padded. The executive summary lists a finding count. The finding count is the product. The product is sold by volume, not by exploitability. A report with forty findings reads as thorough. A report with four reads as incomplete. The buyer rewards the first. The buyer is wrong.

Padding is not an accident. It is a delivery model. Vulnerability scanner output is reformatted into prose. Informational findings are promoted to Low. Low findings are promoted to Medium when the engagement needs weight. The severity column is calibrated to client expectation, not to attack path. The report becomes a document about coverage. It stops being a document about risk.

I read these reports the way an operator reads them. I look for one thing. Can the finding be chained to access, persistence, or data egress? If it cannot, it is not a finding. It is a note. Notes belong in an appendix. They do not belong on the cover page next to the word Critical.
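The chaining question can be made mechanical. What follows is a minimal sketch, assuming each finding is modelled as a step from one attacker position to another. The position names, the `Finding` fields, and the sample findings are hypothetical illustrations, not taken from any real report.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    requires: str  # attacker position needed before the finding is usable
    grants: str    # attacker position gained by exploiting it


# Hypothetical outcome labels that make a finding worth the cover page.
OBJECTIVES = {"access", "persistence", "data_egress"}


def chain_to_objective(findings, start="unauthenticated"):
    """Breadth-first search from the start position to any objective.

    Returns the finding titles on the shortest chain found,
    or None when no chain reaches an objective.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        position, path = queue.popleft()
        if position in OBJECTIVES:
            return path
        for f in findings:
            if f.requires == position and f.grants not in seen:
                seen.add(f.grants)
                queue.append((f.grants, path + [f.title]))
    return None


# Hypothetical sample data.
findings = [
    Finding("Verbose error leaks usernames", "unauthenticated", "valid_username"),
    Finding("No lockout on login", "valid_username", "access"),
    Finding("Server banner disclosure", "unauthenticated", "version_known"),
]

chain = chain_to_objective(findings)
# Anything outside the chain found here is a candidate note, not a finding.
notes = [f.title for f in findings if chain is None or f.title not in chain]
```

In this sample the two login issues chain to access and the banner disclosure ends up in `notes`. The model is deliberately crude: it returns only the shortest chain and treats every step as reliable, which a real engagement would have to demonstrate rather than assume.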

What actually failed

The reports I review contain findings that describe system behaviour without describing exploitation. TLS 1.0 enabled on an internal host. Server banner disclosure. Missing X-Frame-Options on a static marketing page. Cookie without HttpOnly on a session that does not store sensitive state. Each of these is presented as a discrete defect. None of them are tied to an account takeover path, a privilege boundary crossing, or a data access outcome. The finding exists. The impact is not confirmed.

The severity ratings do not survive scrutiny. CVSS base scores are pasted in without environmental adjustment. A Medium finding on an internet-exposed authentication endpoint sits next to a Medium finding on an isolated lab system. The two are not equivalent. The report treats them as equivalent because the scoring is mechanical. The operator reading the report cannot prioritise. Everything Medium becomes nothing Medium.

The proof of exploitation is often absent. A finding will state that a parameter is vulnerable to injection. The evidence is a payload that returned an error message. The error message is treated as confirmation. Whether the injection produced data extraction, command execution, or authentication bypass is not stated. The finding asserts a class of vulnerability. The finding does not demonstrate the boundary that was crossed. That is not a pentest result. That is a hypothesis.

Why it failed

The engagement scope is written to produce findings, not to test controls. Time is allocated to surface enumeration. Time is not allocated to chaining. A tester who finds twelve low-severity issues in week one is rewarded for output. A tester who spends week two building one exploit chain across three of those issues is seen as slow. The incentive structure produces the report the buyer receives. Volume over depth is the contracted deliverable, even when no one wrote it down.

The tooling drives the content. Automated scanners produce findings in a format that maps directly to report templates. The tester’s job becomes triage and transcription. Manual testing that does not produce a scanner-shaped artifact is harder to justify in the writeup. A logic flaw that requires four steps to demonstrate takes a page to describe. A missing security header takes a paragraph and a screenshot. The report fills with the cheaper content.

The review layer does not enforce a different standard. Reports are quality-checked for formatting, grammar, and template compliance. They are not quality-checked against the question of whether each finding represents a control failure with demonstrated impact. The reviewer confirms the document is complete. The reviewer does not confirm the document is correct. The padding survives review because review is not designed to remove it.

The mechanism is substitution. The deliverable that can be measured replaces the deliverable that should be measured. Finding count can be counted. Exploit chains require judgment. The buyer asks for a number. The number is supplied. The tester optimises for what the buyer evaluates. The evaluation criteria become the work product. The work product becomes the engagement. The engagement is no longer a test of controls. It is the production of an artifact that survives procurement review.

Severity inflation runs on the same substitution. CVSS produces a number. The number looks objective. The number is generated without environmental context. Environmental scoring exists in the specification. Environmental scoring is rarely applied because it requires the tester to understand the target environment. Understanding requires time. Time is not paid for separately. The base score is shipped as the final score. The number survives because it is a number, not because it is correct. The reader treats two Mediums as equivalent because the rating system gave them the same label. The rating system was not calibrated to do that work.
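The gap between a base score and an environmental score can be shown with the CVSS v3.1 equations. The sketch below is a simplified calculator, assuming scope is unchanged and temporal metrics are left at Not Defined. The example vector and the requirement values chosen for the two hosts are hypothetical.

```python
import math

# CVSS v3.1 metric weights (scope unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
REQ = {"H": 1.5, "M": 1.0, "L": 0.5}  # security requirements CR / IR / AR


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def environmental_score(av, ac, pr, ui, c, i, a, cr="M", ir="M", ar="M"):
    """Environmental score for a scope-unchanged vector.

    av..a are the (modified) base metrics; cr, ir, ar are the requirements.
    With unmodified metrics and requirements at "M" this equals the base score.
    """
    miss = min(
        1 - (1 - REQ[cr] * CIA[c]) * (1 - REQ[ir] * CIA[i]) * (1 - REQ[ar] * CIA[a]),
        0.915,
    )
    impact = 6.42 * miss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    temporal = 1.0  # E * RL * RC, all Not Defined in this sketch
    return roundup(roundup(min(impact + exploitability, 10)) * temporal)


# Hypothetical vector: AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N
shipped = environmental_score("N", "L", "N", "N", "L", "N", "N")
# Assumed: internet-exposed authentication endpoint, high confidentiality requirement.
exposed = environmental_score("N", "L", "N", "N", "L", "N", "N", cr="H")
# Assumed: isolated lab host, adjacent-only reachability, low confidentiality requirement.
lab = environmental_score("A", "L", "N", "N", "L", "N", "N", cr="L")
```

Under these assumptions the shipped base score of 5.3 becomes 6.1 on the exposed endpoint and 3.6 on the lab host. The two findings shared a label only because nobody supplied the environment.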

Evidence handling fails by the same mechanism. A screenshot of an error response is treated as proof because it is artifact-shaped. An exploit chain that requires session state, custom tooling, and ordered steps is harder to package as a single artifact. The report rewards what can be pasted into a template. The work that cannot be pasted is the work that demonstrates impact. The deliverable format selects against the finding type that matters. The findings that survive the format are the findings that did the least to test the boundary.

Parallel pattern

The same substitution runs through compliance audit output. The auditor produces a control matrix. Each control is marked tested or not tested. The test was a question to a system owner and a screenshot of a configuration page. The control was not exercised against adversarial input. The matrix is complete. The control state is not confirmed. The buyer accepts the matrix because the matrix is the contracted artifact. The certificate is issued against the matrix, not against the control behaviour. The attacker tests the control behaviour. The certificate does not appear in that exchange.

It runs through SOC alert volume reporting. The detection team reports alerts processed per week. The number rises. The number is presented as throughput. Whether the closed alerts were correctly closed, whether the alerts represented real attacker behaviour, and whether any attacker behaviour produced no alert at all are not measured by the same instrument. The metric that is reported is the metric that is optimised. The metric that is optimised is not the metric that defends the environment. The team that ships the dashboard is rated on the dashboard. The dashboard is not rated on the environment.

It runs through bug bounty triage. A program reports submissions received and bounties paid. The submissions skew toward low-effort findings because low-effort findings clear triage faster. High-impact research that requires program engagement and chain validation is filtered out by the same intake process that produces the headline numbers. The program looks active. The program is processing volume. The volume is not correlated with the attacker capability the program was funded to surface. The mechanism is identical to the pentest report. A measurable proxy replaces the work the proxy was supposed to represent.

A pentest report is a record of control failures with demonstrated impact. If it does not contain that, it is not a pentest report. It is a vulnerability scan with a cover page. The two cost different amounts. Buyers should know which one they bought. Vendors should know which one they shipped. The market currently rewards confusing the two. That is a procurement problem and a reporting problem, and it is fixed in the report before it is fixed anywhere else.

The finding count is not the product. The exploit chain is the product. A report with three chains that cross identity, privilege, and data boundaries describes more attacker capability than a report with forty findings that describe surface state. The buyer who cannot read the difference is paying for paper. The vendor who prefers selling paper will keep selling it until the buyer reads the report differently. The signal that the buyer has changed is that the executive summary leads with chains, not with counts. Until that change appears in procurement, the format will produce the same output.

What must be true. Every finding traces to an access outcome, a privilege transition, or a data movement. Every severity rating reflects the environment, not the scanner default. Every claim of exploitation includes the artifact that demonstrates the boundary crossed. Findings that do not meet that standard are appendix entries, not report entries. The cover page is reserved for the work that changes the security posture of the target. A report that cannot fill its cover page under that rule is reporting that the engagement did not produce a result. That is a valid outcome. Padding it into something else is not.
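The standard above can be expressed as a gate that a reviewer runs over each finding before it reaches the report body. This is a minimal sketch. The field names and the `placement` function are hypothetical, and the judgment about whether the evidence really shows a boundary crossed still belongs to a person.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACCESS = "access outcome"
    PRIVILEGE = "privilege transition"
    DATA = "data movement"


@dataclass
class Finding:
    title: str
    outcome: Optional[Outcome]  # None: behaviour observed, no outcome shown
    environment_scored: bool    # severity adjusted for the target environment
    evidence: Optional[str]     # reference to the artifact showing the boundary crossed


def placement(finding):
    """Return 'report' only when all three conditions hold, else 'appendix'."""
    meets_standard = (
        finding.outcome is not None
        and finding.environment_scored
        and bool(finding.evidence)
    )
    return "report" if meets_standard else "appendix"


# Hypothetical examples.
chained = Finding(
    "SQL injection used to read the accounts table",
    Outcome.DATA,
    True,
    "evidence/extracted-rows.txt",
)
header = Finding("Missing X-Frame-Options on marketing page", None, False, None)

placements = {f.title: placement(f) for f in (chained, header)}
```

An empty report body after this gate is the valid outcome described above: the engagement did not produce a result, and the document says so.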

