RC RANDOM CHAOS

Railway Hit by 8-Hour Outage After Google Cloud Wrongly Suspended Its Account

· via Hacker News

Original source

Incident Report: Railway Blocked by Google Cloud (Resolved)

Hacker News →

On May 19, 2026, Google Cloud’s automated systems incorrectly suspended Railway’s production account, knocking out the platform’s API, dashboard, control plane, and databases for roughly eight hours. The suspension was part of a broader automated action that hit multiple GCP customers without prior warning. Although GCP restored account access within nine minutes, recovery of persistent disks, compute instances, and networking dragged on until early the next morning.

The blast radius extended well beyond Google Cloud. Railway’s edge proxies depend on a GCP-hosted control plane to populate routing tables, so once cached routes expired, workloads on Railway Metal and AWS also became unreachable and started returning 404s despite being healthy. Recovery was further complicated when GitHub began rate-limiting Railway’s OAuth and webhook traffic as cache-cleared retries surged, and a side effect reset users’ terms-of-service acceptance records.

Railway took responsibility for the architectural single point of failure: a mesh network spanning Metal, GCP, and AWS that nonetheless funneled workload discovery through a control plane pinned to one provider. The incident is a textbook case of how a billing or trust-and-safety action at a hyperscaler can cascade into a full platform outage, and underscores the risk of co-locating control-plane dependencies with the cloud they’re meant to be resilient against.

Read the full article

Continue reading at Hacker News →

This is an AI-generated summary. Read the original for the full story.