
For example, the model's internal activity shows patterns associated with feeling «happy» or «calm» when it reads certain stories, and it can register as «afraid» when a user describes risky behavior. Likewise, when a user expresses sadness, this triggers a «loving» response.
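Conceptually, this kind of measurement resembles projecting a model's internal activations onto an «emotion» direction. The sketch below illustrates the general idea with a linear probe on an open model; the model, layer choice, and probe direction are all placeholder assumptions for illustration, not Anthropic's actual setup.

```python
# Minimal sketch: reading an "emotion" off the residual stream with a
# linear probe. The probe direction here is a random placeholder; in a
# real setup it would be fit on labeled activations (e.g., activations
# from "afraid" vs. neutral prompts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in open model, not the model Anthropic studied
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

afraid_direction = torch.randn(model.config.hidden_size)  # placeholder probe
afraid_direction /= afraid_direction.norm()               # unit-normalize

prompt = "I've been free-climbing without ropes lately."
ids = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    hidden = model(**ids).hidden_states[7]  # activations at an arbitrary layer

# Project the last token's activation onto the probe direction; a higher
# score would indicate a stronger "afraid"-like activation pattern.
score = hidden[0, -1] @ afraid_direction
print(f"afraid score: {score.item():.3f}")
```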
The model also appears to prefer outcomes associated with certain feelings: if a candidate answer makes it «joyful», it naturally gravitates toward that answer.
When feeling «desperate», it is also more likely to cheat on a task, and Anthropic found that the model stops looking for shortcuts when researchers dial up the «calm» vector.
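«Dialing up» a vector here refers to activation steering: adding a scaled direction to the model's residual stream during the forward pass. The sketch below shows the general technique on an open model; the «calm» direction is a random stand-in, since Anthropic's actual vectors come from their own interpretability pipeline and are not public.

```python
# Minimal sketch of activation steering with a forward hook, assuming a
# HuggingFace-style GPT-2. The "calm" direction is a random placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

calm_vector = torch.randn(model.config.hidden_size)  # placeholder direction
calm_vector /= calm_vector.norm()                    # unit-normalize

def add_calm(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # add the scaled direction to every position ("dial up" the vector).
    hidden_states = output[0]
    return (hidden_states + 4.0 * calm_vector,) + output[1:]

layer = model.transformer.h[6]  # a middle layer, chosen arbitrarily
handle = layer.register_forward_hook(add_calm)

prompt = "The deadline is in an hour and nothing works."
ids = tokenizer(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore the unsteered model
```

The steering coefficient (4.0 here) controls how strongly the direction is applied; too small has no visible effect, too large degrades the output into incoherence.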
Anthropic cautions that «Claude, the AI Assistant» is a role the model is playing: while it may respond with emotions learned from reading human sources, these are far from what humans actually experience. The company says the topic needs more study from «psychology, philosophy, religious studies, and the social sciences».
Read more: Anthropic’s presentation and the research paper.













