Description
For my use case it would be extremely useful to perform distillation with a KL objective, using a larger, gradient-free teacher model and a smaller student model that carries the loss/gradients.
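For reference, the objective itself is simple once both sets of logits are available; the hard part is getting two engines set up. A minimal sketch of the KL loss I have in mind (standard temperature-scaled distillation, not anything ArcticTraining-specific; the helper name is mine):

```python
import torch.nn.functional as F


def kl_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so the gradient magnitude is preserved.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
```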
I found this feature implemented in Accelerate's DeepSpeed integration last year (a rough sketch of the flow follows the links):
huggingface/accelerate#2496
huggingface/accelerate#3097
https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed_multiple_model
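If I read the guide correctly, the flow looks roughly like the sketch below: one DeepSpeedPlugin per model, then prepare each model with its plugin selected. The model names, config paths, and ZeRO stages are placeholders, and the exact API should be checked against the linked docs:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from torch.optim import AdamW
from transformers import AutoModelForCausalLM

# Placeholder model names and config paths -- not from the ArcticTraining codebase.
student = AutoModelForCausalLM.from_pretrained("small-student-model")
teacher = AutoModelForCausalLM.from_pretrained("large-teacher-model")
optimizer = AdamW(student.parameters(), lr=1e-5)

# One DeepSpeedPlugin (and therefore one DeepSpeed config) per model.
deepspeed_plugins = {
    "student": DeepSpeedPlugin(hf_ds_config="student_zero2_config.json"),
    "teacher": DeepSpeedPlugin(hf_ds_config="teacher_zero3_config.json"),
}
accelerator = Accelerator(deepspeed_plugins=deepspeed_plugins)

# The first plugin in the dict is active by default, so the student (the model that
# actually trains) is prepared first, together with its optimizer.
student, optimizer = accelerator.prepare(student, optimizer)

# Switch plugins before preparing the gradient-free teacher.
accelerator.state.select_deepspeed_plugin("teacher")
teacher = accelerator.prepare(teacher)
```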
So it's possible that supporting multiple models is just a matter of following the tutorial above and applying it to model loading in the trainer's `__init__`:
`def __init__(self, config: TrainerConfig, mode: str = "train") -> None:`
If this is the case, that would be awesome, and it probably means I can accomplish this solely by creating a custom Trainer class, which I understand is expected of users anyway.
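Purely as an illustration of what I have in mind, a hypothetical custom trainer might only need to override the loss computation, assuming the teacher engine were made available as an attribute during init. None of the attribute/method names or import paths below are confirmed against the ArcticTraining API:

```python
import torch
import torch.nn.functional as F

from arctic_training import SFTTrainer  # import path is a guess


class KDTrainer(SFTTrainer):
    name = "knowledge-distillation"

    def loss(self, batch):
        # Student forward pass goes through the usual DeepSpeed engine.
        student_logits = self.model(**batch).logits

        # Teacher forward pass is gradient-free; `self.teacher` is assumed to be a
        # second, frozen engine created during trainer init -- it is not an existing
        # ArcticTraining attribute.
        with torch.no_grad():
            teacher_logits = self.teacher(**batch).logits

        t = 2.0  # distillation temperature
        return F.kl_div(
            F.log_softmax(student_logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * t**2
```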
Reading through the tutorial and issues above plus arctic_training.trainer, I think what is needed is to rewrite model loading and engine initialization using accelerate.utils.DeepSpeedPlugin. My main confusion is that I don't know where the Accelerator init and prepare calls happen in ArcticTraining, or whether they happen at all. With that in mind, how can the DeepSpeed/Accelerate setup be mapped onto the multiple-models tutorial? The Accelerator calls seem necessary to follow the exact flow in the tutorial.
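Alternatively, if ArcticTraining drives DeepSpeed directly and there is no Accelerator in the loop, maybe the teacher could simply get its own optimizer-free engine next to the student's, similar to how reference models are often handled. A hedged sketch, with a placeholder model name and config values that would need to match the student's settings:

```python
import deepspeed
from transformers import AutoModelForCausalLM

# Minimal ZeRO-3 config for a forward-only teacher; values are placeholders.
teacher_ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},
}

teacher = AutoModelForCausalLM.from_pretrained("large-teacher-model")  # placeholder
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Calling deepspeed.initialize without an optimizer yields an engine that can shard the
# teacher's parameters (ZeRO-3) for memory but never allocates optimizer state.
teacher_engine, *_ = deepspeed.initialize(model=teacher, config=teacher_ds_config)
```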
tl;dr: this feature might be something users are expected to write themselves, or it may require some rejigging of the trainer; I am just not sure.