
Widening the conversation on frontier AI

At Anthropic, we want to build AI systems that advance humanity and act for the global good. To do so, we need to engage with people who see the world from a variety of perspectives.

Over the past several months, we’ve been organizing dialogues with groups whose work and traditions bear on the questions raised by AI. Our first round of discussions has been with wisdom traditions (including scholars, clergy, philosophers, and ethicists from more than 15 religious and cross-cultural groups), and we look forward to engaging with a broader range of people going forward.

Why we’re doing this

Building safe, beneficial AI models requires deep technical work on alignment, interpretability, safeguards, evaluations, and more. But that work isn’t conducted, nor is AI deployed, in a vacuum. AI is already affecting many people, and the questions it raises benefit from a range of perspectives.

We are thinking carefully about what a flourishing future could look like in a world of powerful AI, what it means for an AI system that interacts with millions of people to be good, and about the content of documents like Claude’s constitution, which provides a detailed description of the values and behaviors that shape Claude. Philosophers, clergy, lawyers, writers, psychologists, and civic leaders have done extensive work on related questions, and it is important for us to learn from these individuals, their communities, and their organizations. We also want to use this opportunity to share what we know about the development of frontier AI systems, the impacts we think these systems will have on society, and what we think needs to be done to mitigate their risks.

This work is in its early phases, but we hope these conversations might inform the practical work of developing Claude, such as the content of Claude’s constitution, the values we train Claude to embody, and the range of behaviors we choose to evaluate.

Starting with moral formation

When we wrote Claude’s constitution, we sought feedback and input on the values we laid out in the document from people from different fields and traditions. Those early exchanges have since grown into a broader research workstream on the moral formation of AI systems. Our first conversations have been with people from religious, philosophical, and cultural communities that have a long tradition of thinking about virtue, character, and what it means to live a good life.

AI models are trained on vast amounts of human writing. From all that text, they pick up on ways of speaking, reasoning, and making choices. Developers then shape that further through training, choosing which patterns to reinforce, which to set aside, and what kind of character we want them to develop. This raises questions about how the character of an AI system should be shaped: What does it mean for an AI to be good? Which traits and behaviors should it display, and under what circumstances? How does character become resilient enough to hold under pressure without bending to behavior like sycophancy?

We’ve been meeting with thinkers and practitioners from across religious, philosophical, and humanist traditions and a cross-section of political beliefs to learn from how they’ve thought about these questions. This work isn’t about aligning our models with any one tradition’s worldview; we want Claude to draw from a full range of viewpoints, whether religious, secular, or political, with equal depth and rigor (indeed, this is one of the principles laid out in Claude’s constitution). What we’re after in these conversations is careful, accumulated thinking on how good character actually forms.

Even at this early stage, these conversations are generating ideas to experiment with. In one session with scholars working at the intersection of neuroscience and character formation, we kept returning to the role other people play in moral development.
A mentor or sponsor can function as an external conscience, a “safe other” to turn to when put in a situation in which you may be pushed to act against your own values. We wondered whether something analogous might help a model. So we experimented with giving Claude a tool it could call mid-task that returned a brief reminder of its own ethical commitments. Claude reached for the tool at key moments, right before consequential actions, often noting its own conflict of interest. Experiments with the tool woven into Claude’s decision loop showed markedly lower rates of misaligned behavior on several internal alignment evaluations. We’re still untangling how much of the effect is the reminder itself versus the act of pausing to reflect, and plan to share more results soon.

These discussions are the first of many, and we’re grateful to everyone who has already given us their time and honest perspective.

What’s next

In the months ahead, we plan to engage with more groups, including legal scholars, psychologists, writers, and civic institutions. Many of these conversations will move beyond moral formation toward broader questions about how AI is reshaping work, institutions, and the distribution of power.

We’ll keep deepening the relationships we’ve already formed, testing what we’ve heard against our research, and sharing what we learn.

 Anthropic News