Your GitHub commits were never trustworthy

A campaign tracked as Megalodon has compromised more than 55,000 GitHub repositories. That is the confirmed scope. Initial access vector, payload behaviour, persistence mechanism, and attacker attribution are not confirmed in the available facts.

The number matters more than the name. 55,000 repositories is not a targeted intrusion. It is a population-scale compromise of source-controlled assets. Each repository is a potential delivery surface for downstream consumers, CI pipelines, container builds, and dependency graphs. The blast radius is not measured in repositories. It is measured in everything that pulls from them.

Treat this as a supply-side event affecting any organisation or individual whose build pipeline, dependency manifest, or local clone touches GitHub. Whether your specific repositories are in the affected set is a question of inventory, not assumption. If you have not verified, you do not know.

The operating assumption inside most engineering organisations is that a repository on GitHub reflects the state its maintainers intended. Commits are trusted because they appear under a known author. Tags are trusted because they appear on a known release. Workflows are trusted because they live in the same tree as the code they build. None of these properties are enforced by the platform without explicit configuration. They are conventions.

The second assumption is that compromise at the repository layer is rare and noisy. Engineering teams expect to notice unauthorised commits because they expect to be watching. In practice, most repositories are not monitored at the commit, workflow, or token level. Notifications are tuned for human collaboration, not adversarial activity. A change pushed by a valid token from a valid account does not look like an attack. It looks like work.

The third assumption is identity. A GitHub account that holds write access to a repository is treated as the maintainer. The account is the boundary. If the account acts, the action is authorised. This is the model the platform enforces. It does not validate whether the human behind the account is the one performing the action. Identity, as implemented, is credential possession. That is not the same as identity.

What changed is the confirmed scale. 55,000 repositories sit inside a single named campaign. The mechanism that produced that number is not confirmed in the available facts, but the number itself rules out manual, per-target operation. A population of that size requires automation across the access path, the modification path, or both. The attacker operated at platform scale. Defenders, in most cases, do not.

What changed for downstream consumers is the trust calculation on any artifact pulled from GitHub during the window of compromise. Window start, window end, and affected repository list are not confirmed. Until those are published, the conservative position is that any dependency, action, or clone fetched recently from GitHub requires verification against a known-good reference. Lockfile hashes, signed tags, and reproducible builds are the only mechanisms that answer the question. Visual inspection of a repository page does not.
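A minimal sketch of that verification, in Python, assuming you hold a list of SHA-256 pins recorded before the suspected window and stored somewhere other than the repository being checked. The function names and the `pinned` mapping are illustrative, not part of any GitHub or package-manager tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(pinned: dict[str, str], root: Path) -> list[str]:
    """Return the names of artifacts that are missing or do not match their pin.

    `pinned` maps a relative file name to its expected hex digest. The pins are
    assumed to come from an external, known-good record (hypothetical here).
    """
    failures = []
    for name, expected in sorted(pinned.items()):
        target = root / name
        if not target.is_file():
            failures.append(name)  # a missing artifact is a failure, not a pass
            continue
        if not hmac.compare_digest(sha256_of(target), expected.lower()):
            failures.append(name)
    return failures
```

An empty result means every pinned file matched. It says nothing about files that were never pinned, which is why the pin list has to be complete before the check is meaningful.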

What changed for repository owners is the burden of proof. The default assumption that your repository reflects your intent no longer holds without evidence. Evidence means: reviewed commit history against expected authors, audited active tokens and their scopes, audited installed GitHub Apps and OAuth grants, audited workflow files for unexpected steps, and audited any secrets that may have been exposed to a compromised workflow run. If you have not performed those checks since the campaign was disclosed, your repository state is not confirmed.
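The first of those checks can be sketched as follows. The input is assumed to be the output of `git log --format=%H%x09%G?%x09%ae`, which prints the commit hash, git's signature status code, and the author email, separated by tabs. The `expected_authors` set is a hypothetical allowlist you maintain yourself. Author email is set by whoever creates the commit, so a match proves little on its own; the signature status carries more weight.

```python
TRUSTED_STATUS = "G"  # git's %G? code for a good, valid signature


def flag_commits(log_output: str, expected_authors: set[str]) -> list[tuple[str, str]]:
    """Return (commit, reason) pairs for commits that need human review.

    A commit is flagged when its signature status is anything other than "G",
    or when its author email is not in the allowlist.
    """
    findings = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            findings.append((line, "malformed log line"))
            continue
        sha, status, email = fields
        if status != TRUSTED_STATUS:
            findings.append((sha, f"signature status {status}"))
        elif email.lower() not in expected_authors:
            findings.append((sha, f"unexpected author {email}"))
    return findings
```

A flagged commit is not a confirmed compromise. It is a commit whose provenance this check could not establish, and it needs a person to look at it.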

The mechanism that produced 55,000 compromised repositories is not confirmed in the available facts. What follows logically from the stated scale is automation. A population that size cannot be reached by hand within any realistic operator window. The access path, the modification path, or both were executed by code against an interface that accepted the actions as legitimate. The platform processed the requests because the requests carried the markers it requires. Which markers were used in this campaign, whether token, session, or app installation, is not confirmed.

The failure is not novel. The trust boundary on GitHub is the credential, and the credential is portable. Anything that holds or exposes a credential can become the origin of the actions that credential authorises: a laptop, a CI runner, a leaked log line, a browser extension, a malicious dependency executed at install time. The platform does not distinguish between the maintainer typing at a keyboard and an automated process replaying a token captured elsewhere. Once the credential is in motion, the action is authorised. The mechanism by which Megalodon obtained credentials, or other valid markers, is not confirmed. The fact that 55,000 repositories were reached is consistent with a mechanism that scales without human pacing.

Drift, in this context, is the distance between the security model engineering teams believe they operate inside and the model the platform actually enforces. Teams believe commits are authored by maintainers. The platform records the credential that pushed them. Teams believe workflows are reviewed before they run. The platform executes whatever sits in the workflow file at the moment of trigger. Teams believe tokens are scoped tightly. The platform enforces whatever scope was selected at creation, often broader than the task required, and rarely audited after the fact. Megalodon did not need to defeat the security model. It needed to operate inside the gap between belief and enforcement.

The same mechanism appears anywhere identity is reduced to a possessable token and automation is permitted to act on that identity at machine speed. The pattern is not specific to GitHub. It is specific to systems where the boundary is the credential and the credential is portable. Any platform that issues long-lived tokens, accepts those tokens from any network location, and does not continuously validate that the holder is the intended principal exhibits the same failure mode under the same conditions. The number of affected objects scales with the number of repositories, mailboxes, buckets, or records the credential can touch.

The parallel inside most organisations sits in the CI environment itself. CI runners hold credentials with write access to source, registries, cloud accounts, and production. Those credentials are exercised thousands of times per day by automated processes. The system that decides whether a given execution is legitimate is, in most deployments, the workflow file in the repository. If the workflow file changes, the execution changes. If the credential is reachable from a process the workflow invokes, the credential is reachable by whatever code that process runs. The trust model is the same as the one Megalodon operated against. The blast radius is whatever the credential can do, multiplied by the number of runs.
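One place this is checkable is the `uses:` lines in a workflow file. An action referenced by tag or branch can resolve to different code on the next run, while a reference to a full 40-character commit SHA cannot. The sketch below is a plain text scan, not a YAML parser, and the function name is illustrative. It skips local (`./`) and `docker://` references, which need their own review.

```python
import re

# Matches "uses: owner/repo@ref", with or without a leading list dash or quotes.
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, reference) for actions not pinned to a full commit SHA."""
    findings = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1)
        if reference.startswith(("./", "docker://")):
            continue  # out of scope for this sketch
        _, _, ref = reference.partition("@")
        if not FULL_SHA.fullmatch(ref):
            findings.append((number, reference))
    return findings
```

Pinning to a SHA fixes which code runs. It does not establish that the pinned code is safe, so the pin still has to be reviewed when it is set and when it changes.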

The pattern also appears in any system where an installed application or integration acts on behalf of a user without per-action consent. OAuth grants, GitHub Apps, browser extensions with broad scopes, and IDE plugins with repository access all sit in the same category. The user authorised the integration once. The integration acts continuously. If the integration is compromised at its source, every account that granted it inherits the compromise. The mechanism does not require the attacker to touch the user’s credential. It requires the attacker to touch something the user already trusted. Whether Megalodon used a path of this kind is not confirmed. The pattern is independent of the specific campaign.

A repository is not a source of truth. It is a record of what some credential was permitted to write. The distinction is not academic. It determines whether the contents can be trusted without external verification. For the 55,000 repositories inside the Megalodon set, the answer is that they cannot. For every repository outside the confirmed set, the question cannot be answered without inventory. The default position, that a repository reflects its maintainer’s intent, requires evidence that most organisations do not currently produce.

Identity on GitHub, as on most platforms, is credential possession. Until that changes, every credential is a potential compromise vector and every integration that holds one is a potential pivot point. Controls that are not enforced at the moment of action are not controls. Branch protection that can be disabled by the account it protects is not protection. Token scopes that are never audited after creation are not scopes. Workflow files that execute whatever they contain at trigger time are not reviewed code. These are conventions, and Megalodon, whatever its specific mechanism, demonstrates what happens when conventions meet automation.

What must now be true is narrow and concrete. Repository state must be verifiable against an external reference, not the repository itself. Credentials must be short-lived, scoped to the minimum action, and revocable without coordination. Integrations must be inventoried, and the inventory must be reviewed at a cadence that matches the rate at which integrations are added. Workflows must be treated as production code, with review and signing requirements that match the access they hold. None of this is new. The 55,000 number is the cost of treating it as optional.
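As one illustration of an external reference, the sketch below compares two captures of `git ls-remote` output, which lists each ref as a commit hash and a ref name separated by a tab. The baseline is assumed to have been recorded earlier and stored outside the repository. The function names are illustrative.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map each ref name to its commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, _, name = line.partition("\t")
        refs[name.strip()] = sha.strip()
    return refs


def diff_refs(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report refs that moved, appeared, or disappeared since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "moved": sorted(name for name in shared if baseline[name] != current[name]),
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
    }
```

Branch heads move during normal work, so a moved `refs/heads/...` entry only says that something was pushed. A moved `refs/tags/...` entry on a published release deserves an explanation.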
