In 2017, a routine AWS S3 maintenance action inadvertently removed more servers than intended (read Catchpoint's S3 outage write-up), taking significant portions of the internet with it. Dashboards went dark, status pages became more cautious, and everyone received a refresher on how concentration can amplify a small error into an outsized impact. The lesson wasn't "cloud is bad." It was: when the center stumbles, the edges feel it.
In 2025, Anthropic published a clean, technical postmortem on three overlapping infrastructure bugs that degraded Claude's outputs between early August and mid-September. It's the kind of disclosure worth rewarding: specific, time-boxed, concrete, with fixes. In a market trending toward centralized LLM access, this kind of transparency isn't PR; it's a public good.
The TL;DR from their write-up:
- A routing bug sent some Sonnet 4 traffic to servers configured for an upcoming 1M-token context window, and a later load-balancing tweak amplified the blast radius. Affected users were disproportionately hit because routing was "sticky," so follow-ups kept landing on the wrong pool. Fixes rolled out by mid-September.
- A TPU misconfiguration corrupted outputs. Some users got random Thai or Chinese characters in the middle of English answers; others got outright syntax faceplants in generated code. The changes were rolled back on Sept. 2, with new detection tests added.
- A latent XLA:TPU compiler issue around approximate top-k surfaced after a precision patch; they switched to exact top-k (accepting a minor efficiency cost) and coordinated with the compiler team on a proper fix.
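For readers who haven't met top-k outside a serving stack: it's the step that keeps only the k highest-scoring tokens before sampling. The sketch below is a generic toy version with made-up numbers, not Anthropic's serving code; the approximate variant trades this exact selection for a faster, TPU-friendly approximation that can occasionally keep a slightly different token set.

```python
# Toy illustration of exact top-k sampling. Generic sketch only -- not
# Anthropic's serving code; logits and seed are made up for the example.
import numpy as np

def sample_top_k(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Keep the k highest-scoring tokens exactly, renormalize, and sample one."""
    top_idx = np.argpartition(logits, -k)[-k:]     # exact top-k indices (unordered)
    top_logits = logits[top_idx]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax over the k
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 1.5, -1.0, 0.0])
print(sample_top_k(logits, k=3, rng=rng))  # samples one of indices 0, 1, 2
```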
Anthropic's public status page links the incident to the postmortem. That's precisely what we should want from a core infra vendor: say what broke, who it impacted, what changed, and what users should watch for downstream.
In a market where many products share the same model supply, that level of detail isn't just good optics; it's an operational signal.
This matters because the foundation-model market differs from databases and clouds: it's narrower, with a handful of suppliers, and layered in ways that make dependencies less visible. A configuration or serving quirk in one layer can ripple into dozens of "AI-native" products built on top. The outcomes vary, but the root cause is shared.
The Uncomfortable Bit: Same Model ≠ Same Outcomes
LLMs are not deterministic. Sampling settings, serving routes, compiler flags, precision choices, and scheduler noise all add variance. Stack a service-level issue on top of a sticky routing pool or a sampler change, and the effect isn't simply "up" or "down."
It's uneven:
- One customer's workflow fails quietly.
- Another sees accuracy drift.
- A third sees no noticeable change at all.
Same "model," yet different experiences. That's not the clean failure mode we're used to in distributed systems. It's closer to rolling dice with weighted odds you didn't agree to. And if you've built your business on top of a single provider's dice, your luck is their uptime.
The practical takeaway is that output quality should be monitored as a distribution, not a single number.
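What that can look like in practice: keep per-request quality scores and watch the percentiles, so a collapse in the worst cases can't hide behind a stable mean. A minimal sketch follows; the scores here are invented, and the scoring source is a placeholder for your own evals (rubric graders, regex checks, task-specific asserts).

```python
# Minimal sketch: track output quality as a distribution, not a single average.
# The score lists below are invented; replace them with your own per-request evals.
from statistics import mean, quantiles

def summarize_quality(scores: list[float]) -> dict:
    """Summarize per-request quality scores (0.0-1.0) as a distribution."""
    p = quantiles(scores, n=20)  # 5th, 10th, ..., 95th percentile cut points
    return {"mean": mean(scores), "p5": p[0], "p50": p[9], "p95": p[18]}

def tail_regressed(baseline: dict, current: dict, tolerance: float = 0.05) -> bool:
    """Flag a regression when the worst cases degrade, even if the mean holds."""
    return current["p5"] < baseline["p5"] - tolerance

# Example: the mean barely moves, but the bottom tail collapses -- exactly the
# kind of uneven degradation a single averaged metric hides.
baseline = summarize_quality([0.90, 0.88, 0.92, 0.85, 0.90, 0.87, 0.91, 0.89, 0.90, 0.86,
                              0.88, 0.90, 0.92, 0.87, 0.89, 0.91, 0.90, 0.88, 0.86, 0.90])
current = summarize_quality([0.90, 0.88, 0.92, 0.30, 0.90, 0.87, 0.91, 0.89, 0.40, 0.86,
                             0.88, 0.90, 0.92, 0.87, 0.35, 0.91, 0.90, 0.88, 0.86, 0.90])
print(tail_regressed(baseline, current))  # True
```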
Model Monoculture = Systemic Risk
This is structural, not moral:
- Concentration: a small set of providers supplies a general-purpose capability.
- Opacity: downstream apps often can't see all the serving details that matter.
- Correlation: when a provider changes routing, compilers, or precision, many tenants move together.
In finance, we label this systemic risk. In software, it often shows up as "someone else's outage." When the same service underpins a large slice of the market, the distinction matters.
Why Disclosure and Monitoring Are the Pressure Valves
Transparency reduces the blast radius in two ways:
- Faster isolation: precise timelines and bug classes (routing, TPU config, approximate vs. exact top-k) give downstream teams a starting point for their own triage.
- Better testing: enumerating classes of failure helps buyers map those classes to their own evaluations and monitors (one such check is sketched below).
In that sense, detailed postmortems function like stress-test summaries. They don't eliminate concentration, but they make it easier to reason about it.
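As a concrete example, the corrupted-output class above (stray Thai or Chinese characters inside English answers) maps to a cheap monitor you can run on every response. The script list and threshold below are assumptions to tune for your own traffic, not anything Anthropic prescribes.

```python
# Hedged sketch: turn one disclosed failure class -- unexpected scripts appearing
# inside English answers -- into a cheap per-response monitor. The script list,
# the "mostly Latin text" expectation, and the threshold are assumptions.
import unicodedata

UNEXPECTED_SCRIPTS = ("CJK", "THAI", "HANGUL")

def looks_corrupted(text: str, max_foreign_ratio: float = 0.02) -> bool:
    """Flag responses where characters from unexpected scripts exceed a small tolerance."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    foreign = sum(
        1 for ch in letters
        if any(script in unicodedata.name(ch, "") for script in UNEXPECTED_SCRIPTS)
    )
    return foreign / len(letters) > max_foreign_ratio

print(looks_corrupted("def add(a, b): return a + b"))            # False
print(looks_corrupted("The result is ประมาณ 42 in this case"))   # True
```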
Disclosure as an Antitrust Valve
This is why disclosure matters. Not the vague "we fixed some stability issues" kind, but the kind Anthropic published: specific, technical, and blame-accepting. It's a public check that a supplier at the center of the ecosystem is at least watching its own risk profile.
Because let's be honest: the model layer is already too big to fail. A handful of players control the keys. If one locks up, it's not just their customers who eat the cost. It's every downstream app, investor, and end-user in the supply chain.
Typically, antitrust cases are built on pricing power, but here the risk is different: concentration of failure modes in a market that appears diverse. The appearance of choice (OpenAI vs. Anthropic vs. Google) masks the reality that most services are one API call away from sharing the same fate.
Watch Closely, Choose Carefully
If you're building on major LLM providers, keep the following in mind:
- Diversify: multi-model routing isn't just cost optimization; it's risk hedging (a minimal fallback sketch follows this list).
- Demand disclosures: treat postmortems as a form of insurance policy. If they stop coming, worry.
- Design for drift: assume the model you called yesterday is not the same one you'll call tomorrow.
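Here's a minimal sketch of what "diversify" can mean in code: try providers in order and fall through on errors or failed output checks. The provider callables and the passes_checks guardrail are placeholders for your own clients and validations, not any real SDK.

```python
# Minimal sketch of multi-model routing as a risk hedge. The provider callables
# and the output-check function are placeholders, not a real vendor SDK.
from collections.abc import Callable

def route(prompt: str,
          providers: list[Callable[[str], str]],
          passes_checks: Callable[[str], bool]) -> str:
    """Try providers in order; fall through on errors or failed output checks."""
    last_error: Exception | None = None
    for call in providers:
        try:
            output = call(prompt)
        except Exception as err:       # network errors, rate limits, timeouts
            last_error = err
            continue
        if passes_checks(output):      # e.g., schema, language, length guardrails
            return output
        last_error = RuntimeError("output failed checks")
    raise RuntimeError("all providers exhausted") from last_error

# Usage (with stand-in provider callables and checks):
# result = route(prompt, [call_primary, call_fallback], passes_checks=looks_sane)
```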
The irony of a monopoly is that it feels safer because everyone else is there too. But in a non-deterministic system, safety isn't about who you trust. It's about how much randomness you're willing to absorb from a single source.
Conclusion
If you're relying on one model provider, in addition to active monitoring, apply the following:
- Assume variance and engineer guardrails.
- Reward transparency and treat postmortems as part of the product.
- Design for plurality, even if you are only using one provider.
Anthropic's write-up is a good example of how to talk about real incidents in a shared supply market. The broader point is simple: reliance on "the one" is sometimes the right trade, but it is a trade.
In a non-deterministic system, safety isn't about trusting a name. It's about how much randomness you are prepared to absorb.
When the dice are loaded upstream, you can't code your way out downstream. Instead, you should notice faster, route smarter, and keep the blast radius small.