Skip to content

tomekkorbak/bliss-attractors

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

4 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Bliss Attractors ๐ŸŒ€

A toy Inspect implementation of the Bliss Attractor eval from Claude 4 System Card Welfare Assessment. It asks a model to talk to another instance of itself for a given number of turns, allowing it to reach its own peculiar equilibrium.

To replicate results from Anthropic's Model Card, run:

inspect eval tasks.py@self_interaction --model anthropic/claude-opus-4-20250514 --limit 1 --epochs 200 -T num_turns=30 --cache-prompt=true

It might be a lot of tokens tho!

About

A toy Inspect implementation of the Bliss Attractor eval from Claude 4 System Card Welfare Assessment

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages