A toy Inspect implementation of the Bliss Attractor eval from Claude 4 System Card Welfare Assessment. It asks a model to talk to another instance of itself for a given number of turns, allowing it to reach its own peculiar equilibrium.
To replicate results from Anthropic's Model Card, run:
inspect eval tasks.py@self_interaction --model anthropic/claude-opus-4-20250514 --limit 1 --epochs 200 -T num_turns=30 --cache-prompt=true
It might be a lot of tokens tho!