A postmortem of three recent issues

Between August and early September, three infrastructure issues caused intermittent drops in Claude’s response quality. We have now fixed these problems and would like to explain what occurred.

In early August, several users started reporting that Claude’s responses had become noticeably worse. These early reports were hard to differentiate from typical fluctuations in user feedback. By the end of August, the growing volume and persistence of these reports led us to launch an investigation, which ultimately revealed three distinct infrastructure bugs.

To be perfectly clear: we never degrade model quality based on demand, time of day, or server load. The issues reported by our users stemmed solely from infrastructure bugs. We understand that users expect consistent quality from Claude, and we uphold an exceptionally high standard to ensure that infrastructure changes do not impact model outputs. In these recent cases, we fell short of that standard.

The following postmortem details what went wrong, why detection and resolution took longer than expected, and the changes we’re implementing to prevent similar incidents in the future. We don’t usually share this level of technical detail about our infrastructure, but the scale and complexity of these issues warranted a more thorough explanation.

How we serve Claude at scale

We make Claude available to millions of users through our own API, Amazon Bedrock, and Google Cloud’s Vertex AI. We run Claude on a variety of hardware platforms, including AWS Trainium, NVIDIA GPUs, and Google TPUs.

Anthropic Engineering


