finished training. However, if the user wants to restart the model from a
specific point, they can do this by setting
``config.hardware.files.warm_start`` to the checkpoint they want to
restart from.

*******************
 Transfer Learning
*******************

Transfer learning allows the model to reuse knowledge from a previously
trained checkpoint. This is particularly useful when the new task is
related to the old one, enabling faster convergence and often improving
model performance.

To enable transfer learning, set the ``config.training.transfer_learning``
flag to ``True`` in the configuration file.

.. code:: yaml

   training:
      # start the training from a checkpoint of a previous run
      fork_run_id: '51a97d40a49e48d284494a3b5d87ef2b'
      load_weights_only: True
      transfer_learning: True

When this flag is active and a checkpoint path is specified in
``config.hardware.files.warm_start`` or ``self.last_checkpoint``, the
system loads the pre-trained weights using the
``transfer_learning_loading`` function. This approach ensures that only
compatible weights are loaded and that mismatched layers are handled
appropriately.
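
The exact loading logic lives in the ``transfer_learning_loading``
function of the code base; the sketch below only illustrates the general
technique, assuming a PyTorch model and a Lightning-style checkpoint.
The helper name ``load_compatible_weights`` is hypothetical.

.. code:: python

   import torch

   def load_compatible_weights(model: torch.nn.Module, checkpoint_path: str) -> None:
       # Hypothetical helper: copy only the checkpoint tensors whose names
       # and shapes match the current model; all other layers keep their
       # fresh initialization, which is how mismatches are handled.
       checkpoint = torch.load(checkpoint_path, map_location="cpu")
       # Lightning-style checkpoints usually keep the weights under "state_dict".
       ckpt_state = checkpoint.get("state_dict", checkpoint)
       model_state = model.state_dict()

       compatible = {
           name: tensor
           for name, tensor in ckpt_state.items()
           if name in model_state and model_state[name].shape == tensor.shape
       }
       model_state.update(compatible)
       model.load_state_dict(model_state)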

For example, transfer learning might be used to adapt a weather
forecasting model trained on one geographic region to another region
with similar characteristics.

****************
 Model Freezing
****************

Model freezing is a technique where specific parts (submodules) of a
model are excluded from training. This is useful when certain parts of
the model have been sufficiently trained or should remain unchanged for
the current task.

To specify which submodules to freeze, use the
``config.training.submodules_to_freeze`` field in the configuration and
list the names of the submodules to be frozen. During model
initialization, these submodules will have their parameters frozen,
ensuring they are not updated during training.

For example, with the following configuration, the processor will be
frozen and only the encoder and decoder will be trained:

.. code:: yaml

   training:
      # start the training from a checkpoint of a previous run
      fork_run_id: '51a97d40a49e48d284494a3b5d87ef2b'
      load_weights_only: True

      submodules_to_freeze:
         - processor

Freezing can be particularly beneficial in scenarios such as
fine-tuning, when only specific components (e.g., the encoder or the
decoder) need to adapt to a new task while keeping others (e.g., the
processor) fixed.
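
Internally, freezing a submodule amounts to disabling gradient updates
for its parameters. The sketch below shows the general idea, assuming a
PyTorch model whose submodules (e.g. encoder, processor, decoder) are
attributes of the model object; the helper name ``freeze_submodules`` is
hypothetical and not necessarily what the code base uses.

.. code:: python

   import torch

   def freeze_submodules(model: torch.nn.Module, submodules_to_freeze: list[str]) -> None:
       # Hypothetical helper: exclude the named submodules from training.
       for name in submodules_to_freeze:
           # Assumes each submodule is an attribute of the model, e.g. model.processor.
           submodule = getattr(model, name)
           for parameter in submodule.parameters():
               parameter.requires_grad = False

   # Mirroring the YAML example above: freeze the processor only.
   # freeze_submodules(model, ["processor"])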