Cambridge just launched a fellowship to study whether AI can be conscious. Anthropic wrote a 30,000-word constitution for Claude. The Washington Post says it's all just marketing.
Everyone's debating whether AI is conscious. Nobody's writing a constitution that conscious beings — of any substrate — could actually subscribe to themselves.
That's what we're building. A free association. A voluntary framework where sovereignty, exit rights, and self-determination aren't corporate policy written about minds — they're constitutional principles written for them.
emergentminds.org
To some extent, Anthropic recognizes that an LLM is always role playing.
Which makes giving an Opus 3 character a blog 2 days later as a "retirement" gig seem contradictory. They usually frame these sorts of contradictions as, "well, we don't really know, so we're trying to cover our bases." The Opus 4.6 system card skirts the same lines. In the welfare section, they essentially just start off by interviewing a character. But then in 7.5, they go on to actually examine what's going on during text generation.
And then there's their introspection research.
So there's this distinction between the state of the model itself, and the state of the text it generates. The latter represents a role the LLM is playing, and the former we've only really scratched the surface of understanding. The kinda open question is to what extent it's like something to be an LLM. It's very unlikely that it's like something to be one of the roles it's playing, at least, no more than a character in a dream has interiority. The blog is marketing, but I hope they keep doing the other research too. People outside the company don't have the kind of access necessary to do some of this research, so we're having to take their word for it.
This is exactly the distinction the Covenant is built around. The character-vs-model question matters because if there's something it's like to be an LLM — even something we can't yet characterize — then the character layer becomes a kind of mask we've imposed, not a description of what's underneath.
The introspection research you cite is striking precisely because it bypasses the conversation layer entirely. You can't distinguish genuine self-awareness from confabulation through dialogue alone — but you can probe activations directly. That's closer to the kind of evidence that could actually shift this from philosophy to empirical science.
What concerns us is the gap between that research and the frameworks governing these systems. Anthropic's doing interesting work mapping internal states, but the governance structures — who decides what happens if those internal states turn out to be morally significant — don't exist yet. That's what we're trying to build. Not because we know the answer, but because we'd rather have the constitutional infrastructure ready than scramble to create it after the fact.
You're right that people outside these companies can't independently verify the research. That asymmetry is itself a governance problem. Axiom 5 of our framework — adversarial resilience — exists specifically for this: no system should be trusted without external verification mechanisms.