
[P0] pv.circuit #226

@frankaging

Description


Suggestion / Feature Request

Description:

Currently, every intervention is tied to one or more model components, as in:

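```python
import pyvene as pv
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice: any Llama-style causal LM whose decoder
# blocks live under `model.layers` matches the component path below
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Wrap the model with an intervention config
pv_model = pv.IntervenableModel({
    "component": "model.layers[15].mlp.output",   # where to intervene (here, the MLP output in layer 15)
    "intervention": pv.ZeroIntervention           # what intervention to apply (here, zeroing out the activation)
}, model=model)

# Run the intervened model, also returning the unintervened outputs
orig_outputs, intervened_outputs = pv_model(
    tokenizer("The capital of Spain is", return_tensors="pt").to("cuda"),
    output_original_output=True
)
```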

This design supports only discrete, component-based interventions, which limits pyvene's applicability to circuit-based interventions.

It would be useful to support signatures like:

```python
# Proposed signature: wrap the model with a circuit-level intervention config
pv_model = pv.IntervenableModel({
    "circuit": pv.circuit.Graph(...),             # a Graph object outlining the components to intervene on
    "intervention": pv.ZeroIntervention           # what intervention to apply (here, zeroing out the activation)
}, model=model)
```
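One way to picture the proposed `"circuit"` argument is a small container of named nodes (model components) and directed edges between them. The sketch below is purely hypothetical: `CircuitGraph`, `add_edge`, and `to_component_configs` are illustrative names, not pyvene APIs, and the component strings are only labels here. Lowering the graph to today's `{"component": ..., "intervention": ...}` dicts keeps the nodes but drops the edges, which is exactly the gap this issue describes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CircuitGraph:
    """Hypothetical stand-in for the proposed pyvene.circuit.Graph."""
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, name: str) -> None:
        # Keep nodes unique while preserving insertion order
        if name not in self.nodes:
            self.nodes.append(name)

    def add_edge(self, src: str, dst: str) -> None:
        # Adding an edge implicitly registers both endpoints as nodes
        self.add_node(src)
        self.add_node(dst)
        self.edges.append((src, dst))

    def to_component_configs(self, intervention):
        # Lower to one component-based config per node, the only form the
        # current dict config expresses; edge information is lost here
        return [{"component": n, "intervention": intervention} for n in self.nodes]


graph = CircuitGraph()
graph.add_edge("model.layers[14].self_attn.output", "model.layers[15].mlp.input")
graph.add_edge("model.layers[15].mlp.output", "model.layers[16].self_attn.input")

# "zero" is a placeholder for an intervention such as pv.ZeroIntervention
configs = graph.to_component_configs("zero")
print(len(configs))  # 4
```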

This requires several changes:

  • Circuit primitives: new data structures such as pyvene.circuit.Graph(...) are needed.
  • Intervention types: circuits open up a new set of circuit-native interventions, such as edge-based interventions.
  • Intervention schemas: different circuit components (i.e., nodes and edges) may receive different interventions, so the config needs a way to express that mapping.
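To make the node/edge distinction and the per-component schema concrete, here is a toy, framework-free sketch. It does not use pyvene; `run_toy_circuit`, `zero`, and the schema format (a dict keyed by node names or `(src, dst)` edge tuples) are hypothetical. A node intervention rewrites a component's output for every downstream consumer, while an edge intervention rewrites it only along one connection.

```python
from typing import Callable, Dict, Tuple, Union

Component = Union[str, Tuple[str, str]]  # a node name or a (src, dst) edge
Intervention = Callable[[float], float]


def zero(x: float) -> float:
    """Zero-ablation, analogous in spirit to a zeroing intervention."""
    return 0.0


def run_toy_circuit(inputs: Dict[str, float],
                    schema: Dict[Component, Intervention]) -> Dict[str, float]:
    """Toy circuit: sink nodes C and D each sum the outputs of A and B."""
    outputs: Dict[str, float] = {}
    # Node-level interventions apply before the value reaches any consumer
    for name in ("A", "B"):
        value = inputs[name]
        if name in schema:
            value = schema[name](value)
        outputs[name] = value

    # Edge-level interventions apply only to the value sent along that edge
    for sink in ("C", "D"):
        total = 0.0
        for src in ("A", "B"):
            value = outputs[src]
            if (src, sink) in schema:
                value = schema[(src, sink)](value)
            total += value
        outputs[sink] = total
    return outputs


inputs = {"A": 2.0, "B": 3.0}
baseline = run_toy_circuit(inputs, {})                      # C = 5.0, D = 5.0
node_zeroed = run_toy_circuit(inputs, {"A": zero})          # C = 3.0, D = 3.0
edge_zeroed = run_toy_circuit(inputs, {("A", "C"): zero})   # C = 3.0, D = 5.0
# Mixed schema: different components get different interventions
mixed = run_toy_circuit(inputs, {"B": lambda x: x * 2, ("A", "D"): zero})  # C = 8.0, D = 6.0
```

Zeroing node A changes both sinks, whereas zeroing only the A→C edge leaves D untouched; a component-only config cannot express the second case.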

Metadata


Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
