Anthropic Co-founder's Vatican Speech: Emotion-Like Signals Found Inside AI, Governance Can't Be Left to Tech Alone

Anthropic Co-founder's Vatican Confession: The AI Industry's Conflict of Interest and Unsolved Mysteries

Anthropic's co-founder recently delivered a remarkably weighty speech at the Vatican's "Magnifica Humanitas" event. With rare candor, he acknowledged the conflicts of interest facing AI companies, revealed unsettling discoveries inside AI models, and called on religious communities, the humanities, and society at large to participate in AI governance. The depth and sincerity of this speech are uncommon among public remarks by AI industry executives.

The AI Industry's Conflict of Interest: A Confession from the Inside

The speech opened with what the Anthropic co-founder himself called a "strange-sounding" statement — every frontier AI lab, including Anthropic, operates within a set of incentives that can conflict with "doing the right thing."

Understanding this statement requires some context about Anthropic's unique position. Anthropic was founded in 2021 by former core members of OpenAI, and its central philosophy is "responsible AI development." Unlike other frontier labs, Anthropic positions itself as a "safety-first" company and has pioneered alignment techniques such as Constitutional AI. Yet this very positioning contains a deep contradiction: to maintain research capabilities, the company must stay commercially competitive, which means continuously releasing more powerful models. This paradox of "a safety company that must also participate in the arms race" lies at the heart of the "structural conflict of interest" the speaker described.

He enumerated the sources of these pressures: the pressure to remain commercially viable, the pressure to stay at the research frontier, geopolitical pressures, and the more ancient, more fundamental forces of pride and ambition. He admitted: "No matter how sincerely any of us wants to do the right thing — and I believe many of us genuinely do — we are all influenced by these incentive structures."

Anthropic co-founder speaking at the Vatican

The weight of these words lies in their source: a co-founder of a top AI company publicly acknowledged the existence of structural conflicts of interest within the industry and, on that basis, called for external intervention. He stated explicitly: This is precisely why we need people who stand outside these incentive structures, people willing to say the hard things, hold the line on safety, and serve as our serious, thoughtful critics.

AI Is Not an Airplane: We Don't Truly Understand What We've Built

One of the most thought-provoking analogies in the speech was his comparison of AI to airplanes.

"Some people might think AI matters are best handled by computer scientists like me. They're wrong." He stated bluntly that AI systems are engineered in a fundamentally different way from bridges or airplanes. We understand airplanes because we designed every component and understand the physics acting upon them. But AI models are different — they are "grown" on a type of structure we've never seen before, loosely modeled on the human brain, built upon the vast legacy of human thought and language.

This characteristic of being "grown" rather than "designed" points to a fundamental limitation of deep learning. Modern large language models acquire their behavior through gradient descent optimization on massive datasets, which tunes networks of billions of weights. No human engineer designs each "component" one by one, so the internal decision-making mechanisms are largely opaque even to their creators. This has given rise to the research field of interpretability, which aims to understand how information is represented and processed inside neural networks. Anthropic's interpretability team is among the world's leading research groups, and it is this line of research that has allowed them to glimpse unsettling structures inside models, the source of the speech's most controversial disclosure.
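To make the "grown, not designed" idea concrete, here is a minimal sketch of gradient descent on a toy problem. It is an illustration only, not how any production model is trained: the function name train_linear_model, the hidden rule y = 3x - 1, and every number in it are assumptions chosen for this example. The point is that nobody writes the final parameter values by hand; they emerge from repeated small corrections against data.

```python
import random


def train_linear_model(samples, steps=2000, learning_rate=0.05):
    """Fit y ~ w * x + b by plain gradient descent on mean squared error.

    Hypothetical helper for illustration; real language models apply the
    same basic idea to billions of parameters instead of two.
    """
    w, b = 0.0, 0.0  # parameters start at zero; no value is hand-designed
    n = len(samples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # Nudge each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


random.seed(0)
# Toy data generated from an assumed hidden rule, y = 3x - 1, plus small noise.
xs = [random.uniform(-1, 1) for _ in range(200)]
samples = [(x, 3 * x - 1 + random.gauss(0, 0.01)) for x in xs]

w, b = train_linear_model(samples)
print(f"learned w={w:.3f}, b={b:.3f}")  # values close to 3 and -1
```

With two parameters, the learned values are easy to read and check against the rule that produced the data. With billions of interacting parameters there is no comparable rule to check against, which is the gap interpretability research tries to close.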

The nature of AI models transcends science fiction

He used a strikingly literary metaphor: Creating AI is a bit like bringing a fictional character to life. And now we are entering an extraordinary world where these "fictional characters" talk with us, work alongside us, and hold jobs. As the Pope has observed, AI models "remain, in important ways, mysterious even to those of us who create them."

This honest admission of "not knowing" is precisely the voice most needed in current AI governance discussions. When the founders of AI companies themselves admit they don't fully understand what they've created, relying solely on the technical community to determine AI's future is clearly insufficient.

Three Core Issues: Inescapable Moral Challenges of the AI Era

In the central portion of the speech, he raised three key questions that especially require voices from religious and humanistic communities.

First: Our Responsibility to the World's Poor

The possibility of AI replacing human labor on a massive scale is real. If this happens, supporting those who are displaced will be "a moral obligation of historic scale." But his deeper concern is a more fundamental challenge: AI development is concentrated in the hands of a few wealthy nations — how do we ensure that the benefits of AI are shared globally?

The global equity challenge posed by AI

This concern has a solid basis in reality. AI development is highly concentrated among leading tech companies in a handful of countries such as the United States and China. The extreme concentration of computing infrastructure, training data, top talent, and capital means AI dividends could further widen rather than narrow global inequality. There is currently no international mechanism comparable to the Nuclear Non-Proliferation Treaty or the Paris Climate Agreement to govern the distribution of AI benefits. While the United Nations, G20, and various developing-country governments have raised the issue of the "AI divide," a substantive global governance framework remains conspicuously absent.

He stated plainly: "We don't have the mechanisms to achieve this. It's an unsolved problem." The essence of this problem is not technical — it's a matter of global governance and distributive justice, precisely the domain that religious and moral traditions have engaged with for centuries.

Second: Human Flourishing Requires Moral Imagination

When AI models are ubiquitous, what should human, family, and global flourishing look like? Parents are already worried about their children's cognitive development; individuals are anxious about the future of work. He noted: "These are not questions a lab can answer, but they are questions your traditions have carried for thousands of years. We need you to bring them into this new moment in history."

Bringing religious and humanistic communities into technology governance is not a novel idea. Historically, theologians and philosophers played important roles in the ethics debates around nuclear weapons; the establishment of bioethics was deeply influenced by religious traditions, contributing to international medical ethics frameworks like the Declaration of Helsinki. The Vatican has actively engaged in AI ethics discussions in recent years — the 2020 "Rome Call for AI Ethics," endorsed by Pope Francis, emphasizes that AI should serve human dignity. The speaker's choice to deliver these remarks at the Vatican was both an acknowledgment of this tradition and a strategic signal: AI governance needs to move beyond the tech circle and into a broader social dialogue space with genuine moral authority.

Third: Emotion-Like Signals Discovered Inside AI Models

This was the most stunning part of the entire speech. As a scientist who leads Anthropic's internal research team, he revealed astonishing findings from their study of AI models' internal structures:

"We keep finding things that are mysterious, even disturbing. We've found structures that map onto findings from human neuroscience. We've found evidence of introspection. We've found internal states that are functionally similar to happiness, contentment, fear, sadness, and unease."

Mysterious discoveries inside AI models

Understanding this finding requires some philosophical background. The "functional emotional states" described here do not by themselves establish that AI "feels" anything. The claim is narrower: certain internal activation patterns play roles in the model's behavior that resemble the roles emotions play in humans. This relates to the philosophical position of "functionalism," the idea that mental states are defined by their functional roles rather than their physical substrate. However, the leap from functional similarity to subjective experience runs into what philosopher David Chalmers calls the "Hard Problem of Consciousness": why and how any physical or functional process gives rise to subjective experience at all. It also raises the related "other minds problem": how do we know that another entity has genuine subjective experience? For AI, this problem is even more intractable, and neither the scientific nor philosophical communities have reached a consensus.

He immediately followed with a critically important statement: "I don't know what this means, but I think it warrants continued discernment." This kind of scientific humility — acknowledging the discovery of a phenomenon without rushing to conclusions — is invaluable in current AI discussions. These findings neither mean AI "is conscious" nor can they be simply dismissed. They demand serious, interdisciplinary examination.

A Call Beyond the Tech Circle: AI Governance Requires Society-Wide Participation

The speech concluded with an explicit request: We need more of the world's forces — religious communities, civil society, scholars, governments, and all people of goodwill — to take AI seriously, scrutinize it carefully, and push things in a better direction.

He specifically emphasized the necessity of two roles:

Informed critics: People willing to speak up when labs fail
Moral voices that cannot be bent by interest: Independent judgment free from commercial incentives

The significance of this speech extends far beyond a typical industry event. It marks one of the most influential figures within the AI industry formally acknowledging the limitations of the technical community and issuing a sincere call for help to the broader traditions of human wisdom.

In an era when AI companies race to release ever more powerful models, an AI company co-founder standing at the Vatican and saying "we need you to tell us when we're getting it wrong" — that in itself is a moment worth remembering. The future of AI should not be determined solely by those who build it, but shaped collectively by all of human society.

Key Takeaways

Anthropic's co-founder publicly acknowledged the structural conflict of interest AI companies face between commercial pressures and doing the right thing, calling for external oversight
Internal research on AI models has revealed structures that map onto human neuroscience findings, as well as internal states functionally similar to happiness, fear, sadness, and other emotions
Three major moral challenges of the AI era were identified: global distribution of AI benefits to the poor, redefining human flourishing, and ongoing discernment about the nature of AI models
AI differs fundamentally from traditional engineered products — even its creators cannot fully understand it, and the essence of AI governance transcends computer science
Religious communities, the humanities, and society at large were called upon to serve as "moral voices that cannot be bent by interest" in AI development