
[feature request] Support for multiple models #277

@Cameron7195

Description


For my use case, it would be extremely useful to perform distillation with a KL objective: a larger, frozen teacher model (gradient-free) and a smaller student model that receives losses/gradients.
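
For context, the training step I have in mind is just the standard temperature-scaled KL distillation loss. A minimal PyTorch sketch, assuming HF-style models that return `.logits` (the function and argument names here are mine, not anything from ArcticTraining):

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, temperature=2.0):
    # Teacher forward pass is gradient-free.
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits

    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```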

I found this feature implemented in Accelerate's DeepSpeed integration last year:
huggingface/accelerate#2496
huggingface/accelerate#3097

https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed_multiple_model

So it's possible that supporting multiple models is just a matter of following the tutorial above and applying it to model loading in the trainer's `__init__`:

```python
def __init__(self, config: TrainerConfig, mode: str = "train") -> None:
```
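
For concreteness, here is a minimal sketch of the plugin setup from the linked tutorial as I imagine it slotting in. The config paths and plugin names are placeholders, not anything ArcticTraining actually defines:

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# One DeepSpeedPlugin per model; the JSON config paths are placeholders.
deepspeed_plugins = {
    "student": DeepSpeedPlugin(hf_ds_config="student_zero2_config.json"),
    "teacher": DeepSpeedPlugin(hf_ds_config="teacher_zero3_config.json"),
}

# Passing a dict of plugins is what enables the multiple-model flow.
accelerator = Accelerator(deepspeed_plugins=deepspeed_plugins)
```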

If this is the case, that would be awesome, and it probably means I can accomplish this solely by creating a custom Trainer class, which I understand is expected of users anyway.

Reading through this tutorial and the issues, plus `arctic_training.trainer`, I think what is needed is to rewrite model loading and engine initialization using `utils.DeepSpeedPlugins`. My main confusion is that I don't know where the accelerator init and `prepare` calls happen in ArcticTraining, or whether they happen at all. How can the DeepSpeed/Accelerate setup be mapped onto the multiple-models tutorial with this in mind? The accelerator calls seem necessary to follow the exact flow in the tutorial.
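
For reference, the part of the tutorial's flow that I can't find an equivalent of in ArcticTraining looks roughly like this (continuing from the `accelerator` sketched above; the model names and optimizer are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM

# The first plugin in the dict is active by default; switch explicitly per model.
accelerator.state.select_deepspeed_plugin("student")
student = AutoModelForCausalLM.from_pretrained("student-model")  # placeholder
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
student, optimizer = accelerator.prepare(student, optimizer)

# The teacher is prepared under its own plugin, with no optimizer (inference only).
accelerator.state.select_deepspeed_plugin("teacher")
teacher = AutoModelForCausalLM.from_pretrained("teacher-model")  # placeholder
teacher = accelerator.prepare(teacher)
teacher.eval()
```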

tl;dr: this feature might be purely expected of users to write themselves, or it may require some rejigging of the trainer; I'm just not sure.
