The official repository of "Flashbacks to harmonize stability and plasticity in continual learning," published in Neural Networks, 2025.
Official paper link: [NN](https://doi.org/10.1016/j.neunet.2025.107616)
Read the paper on [arXiv](https://arxiv.org/abs/2506.00477)
Figure: Flashback Learning overview. At task t, Phase 1 updates the old model on new data to obtain the primary model f(·; θₚ), then extracts the new knowledge and stores it in the plastic knowledge base (PKB). Phase 2 flashes back to the old model f(·; θ*ₜ₋₁) and trains it under bidirectional regularization from both stable and plastic knowledge, yielding f(·; θ*ₜ).
We introduce Flashback Learning (FL), a novel method designed to harmonize the stability and plasticity of models in Continual Learning (CL). Unlike prior approaches that primarily focus on regularizing model updates to preserve old information while learning new concepts, FL explicitly balances this trade-off through a bidirectional form of regularization. This approach effectively guides the model to swiftly incorporate new knowledge while actively retaining its old knowledge.
FL operates through a two-phase training process and can be seamlessly integrated into various CL methods, including replay, parameter regularization, distillation, and dynamic architecture techniques. In designing FL, we use two distinct knowledge bases: one to enhance plasticity and another to improve stability. FL ensures a more balanced model by utilizing both knowledge bases to regularize model updates.
Theoretically, we analyze how the FL mechanism enhances the stability–plasticity balance. Empirically, FL demonstrates tangible improvements over baseline methods within the same training budget. By integrating FL into at least one representative baseline from each CL category, we observed an average accuracy improvement of up to 4.91% in Class-Incremental and 3.51% in Task-Incremental settings on standard image classification benchmarks. Additionally, measurements of the stability-to-plasticity ratio confirm that FL effectively enhances this balance. FL also outperforms state-of-the-art CL methods on more challenging datasets like ImageNet.
To reproduce the experiments, you need to install the required Python packages. We recommend creating a virtual environment:
```bash
conda create -n flashback-env python=3.9
conda activate flashback-env
pip install -r requirements.txt
```
We have used the following common benchmarks in this project:
- Split-CIFAR10: A standard benchmark where CIFAR-10 is divided into multiple tasks, typically with 2 classes per task.
- Split-CIFAR100: A more challenging version using CIFAR-100, often divided into 10 tasks with 10 classes each.
- Split-Tiny-ImageNet: A subset of the ImageNet dataset, split into multiple tasks. It contains 200 classes of low-resolution (64×64) images, making it suitable for scalable continual learning evaluation.
Datasets will be automatically downloaded into the `./data` directory in the root of this project.

Note 1: You can change the data path in the `base_path_dataset()` function located in `utils/conf.py`.
Note 2: The `./data` folder will be created automatically if it does not exist.
Note 3: For cleanliness and to avoid large file tracking, the `data` folder should not be tracked by Git.
You can customize the structure of each benchmark by modifying the constants used in the corresponding dataset class under the `./datasets/` directory. These constants are:
- `N_CLASSES_TASK_ZERO`: Specifies the number of classes in the first (base) task.
- `N_CLASSES_PER_TASK`: Controls how many classes are introduced per incremental task.
- `N_TASKS`: Determines the total number of tasks to create from the dataset.
For example, in `SequentialCIFAR100`, defined in `./datasets/seq_cifar100.py`, the following configuration:

```python
N_CLASSES_TASK_ZERO = 10
N_CLASSES_PER_TASK = 10
N_TASKS = 10
```

defines a benchmark with 10 tasks, each containing 10 classes (standard Split-CIFAR100).
To customize it to a CIFAR-100-B50-10 setup (50 base classes, then 10 classes per task over 5 incremental tasks, i.e., 6 tasks and 50 + 10 × 5 = 100 classes in total):

```python
N_CLASSES_TASK_ZERO = 50
N_CLASSES_PER_TASK = 10
N_TASKS = 6
```
Continual Learning Baselines with Flashback Learning
We develop and evaluate Flashback Learning across four major categories of CL methods:
- Knowledge distillation methods
- Replay memory methods
- Parameter regularization methods
- Architecture expansion methods
Our experiments demonstrate that FL consistently yields tangible improvements across all these categories, confirming its broad applicability and effectiveness as a plug-in module for continual learning. We show how FL is integrated into each CL category below:
Knowledge Distillation Methods
In distillation-based continual learning, a copy of the old model is retained. When a new task begins, the model is updated on new data while a distillation loss encourages it to maintain similarity to the old model’s representations, preserving stability.
When Flashback Learning is integrated into this setup:
- Phase 1: A primary model is trained on the new task.
- Phase 2: The model is reinitialized to the old model and trained with a bidirectional distillation loss, drawing on both the old and primary models, which guides the representation toward a balance between past and new knowledge and improves both stability and plasticity (see the sketch below).
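To make the two phases concrete, here is a minimal PyTorch sketch of what a bidirectional feature-level distillation term in Phase 2 could look like. This is an illustration only, not the exact loss used in the paper or this codebase: the function and tensor names are hypothetical, and plain MSE stands in for whatever distillation metric the host method uses (LUCIR, for instance, constrains feature embeddings with a cosine-based loss).

```python
import torch.nn.functional as F

def bidirectional_distillation_loss(feats, feats_old, feats_primary, alpha_p=1.0):
    # Stability term: keep the current features close to the old model's.
    stability = F.mse_loss(feats, feats_old.detach())
    # Plasticity term: keep them close to the Phase-1 primary model's features.
    plasticity = F.mse_loss(feats, feats_primary.detach())
    # alpha_p (cf. the --alpha_p flag below) scales the plasticity component.
    return stability + alpha_p * plasticity
```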
We integrate Flashback Learning into two representative distillation-based continual learning methods:

- Learning a Unified Classifier Incrementally via Rebalancing (LUCIR) is known for applying distillation constraints at the feature embedding level (before the final logits) to preserve learned representations. Although it allows replay of a selected exemplar set, its core mechanism relies on distillation (LUCIR – CVPR 2019). The FL-integrated version of LUCIR is implemented in `./models/fl-lucir.py`.
- Learning without Forgetting (LwF.MC) uses distillation at the logit level, transferring knowledge from the old model to the current one without replay. We specifically use LwF.MC, a multi-class variant adapted from iCaRL (iCaRL – CVPR 2017). The FL-integrated version of LwF.MC is implemented in `./models/fl-lwf_mc.py`.
Memory Replay Methods
In memory replay methods, a limited-capacity memory buffer of previous tasks' samples, together with their corresponding feature embeddings or logits, is selected and carried over to the new task. When a new task begins, the model is updated on a joint distribution of the new data and the retained samples from the past.
When Flashback Learning is integrated into this setup:
- Phase 1: A primary model is trained on the new task, and new primary feature embeddings or logits are generated for the memory samples by the primary model.
- Phase 2: The model is reinitialized to the old model and trained under a bidirectional replay, using both the old and primary logits or feature embeddings, which guides the model's responses toward a balance between past and new knowledge and improves both stability and plasticity (see the sketch below).
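A rough PyTorch sketch of this replay variant follows. The helper names (`refresh_buffer_logits`, `bidirectional_replay_loss`) and the use of MSE on logits are our own assumptions for illustration, not functions from this repository:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def refresh_buffer_logits(primary_model, buffer_loader):
    # Phase 1: record the primary model's logits for every memory sample;
    # these become the "plastic" replay targets used in Phase 2.
    return torch.cat([primary_model(x) for x, _ in buffer_loader])

def bidirectional_replay_loss(cur_logits, old_logits, primary_logits, alpha_p=1.0):
    # Phase 2: pull the model's responses on buffer samples toward the old
    # logits (stability) and the primary logits (plasticity) simultaneously.
    return F.mse_loss(cur_logits, old_logits) + alpha_p * F.mse_loss(cur_logits, primary_logits)
```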
We integrate Flashback Learning into two representative memory replay methods:
- Incremental Classifier and Representation Learning (iCaRL) is a replay method that uses a herding strategy to select the samples closest to each class prototype, replaying them while distilling knowledge from the old model at the logit level (iCaRL – CVPR 2017). The FL-integrated version of iCaRL is implemented in `./models/fl-icarl.py`.
- eXtended Dark Experience Replay (X-DER) is an extension of vanilla DER (DER – NeurIPS 2020), from the replay category, which keeps old logits in the memory buffer for distillation during rehearsal. We selected X-DER because it performs better than the other DER variants (X-DER – TPAMI 2022). The FL-integrated version of X-DER is implemented in `./models/fl-xder.py`.
Parameter Regularization Methods
In parameter regularization methods, the parameters and importance matrix (e.g., the Fisher information matrix) of the old model are stored and carried over to the new task. When a new task starts, the model is updated under a unidirectional regularization that learns the new task while keeping important old parameters close to their previous values.
When Flashback Learning is integrated into this setup:
- Phase 1: A primary model is trained on the new task, and its parameters and importance matrix are stored.
- Phase 2: The model is reinitialized to the old model and trained under a bidirectional regularization, anchored to both the old and primary parameters, which guides the model parameters toward an interpolation between the old and primary new parameters, improving stability and plasticity (see the sketch below).
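As an illustration (not the repository's actual implementation), a bidirectional EWC-style penalty could be sketched as follows; `old_params`, `old_fisher`, `primary_params`, and `primary_fisher` are hypothetical dictionaries of tensors keyed by parameter name:

```python
def bidirectional_param_penalty(model, old_params, old_fisher,
                                primary_params, primary_fisher, alpha_p=1.0):
    # PyTorch sketch: quadratic pull toward the old parameters (stability)
    # and toward the Phase-1 primary parameters (plasticity), each weighted
    # by its own importance estimate; the minimizer interpolates the two.
    penalty = 0.0
    for name, p in model.named_parameters():
        stability = (old_fisher[name] * (p - old_params[name]).pow(2)).sum()
        plasticity = (primary_fisher[name] * (p - primary_params[name]).pow(2)).sum()
        penalty = penalty + stability + alpha_p * plasticity
    return penalty
```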
We integrate Flashback Learning into one representative parameter regularization method:
- online Elastic Weight Consolidation (oEWC) is a baseline from the parameter-regularization category, which computes the importance of old parameters recursively and then applies it as a weighted regularization on new parameter updates (Progress & Compress – ICML 2018). The FL-integrated version of oEWC is implemented in `./models/fl-ewc_on.py`.
Run Scripts
To run any continual learning baseline with or without Flashback Learning integration, use the following command pattern:
```bash
python utils/main.py \
    --run [original | flashback] \
    --model <model_name> \
    --alpha_p <plasticity_loss_scaler> \
    --cl_arguments <arguments_specific_to_cl_baseline> \
    --epoch_base <epochs_for_task0> \
    --sch0 <use_scheduler_for_task0: 0|1> \
    --epoch_cl <phase1_epochs_for_tasks> \
    --sch <use_scheduler_for_phase1: 0|1> \
    --epoch_fl <phase2_epochs_for_tasks> \
    --schf <use_scheduler_for_phase2: 0|1> \
    --dataset <dataset_name> \
    --batch_size <batch_size> \
    --lr <learning_rate> \
    --optim_mom <optimizer_momentum> \
    --optim_wd <optimizer_weight_decay>
```
- `--run`: Set to `original` to run the baseline only, or `flashback` to activate Flashback Learning (FL).
- `--model`: Name of the model to run. It must be one of the models that support FL, defined in `models/fl-model.py`.
- `--alpha_p`: Scaling factor for the plasticity loss component used in FL Phase 2.
- `--cl_arguments`: Baseline-specific arguments required by the continual learning method (e.g., `--e_lambda` for EWC).
- `--epoch_base`: Number of training epochs for the base task (task 0).
- `--sch0`: Whether to apply a learning rate scheduler during training on the base task. Use `1` to enable or `0` to disable.
- `--epoch_cl`: Number of epochs for Phase 1 (task > 0), which trains the primary model on the new task.
- `--sch`: Whether to use a scheduler in Phase 1. Set to `1` to enable or `0` to disable.
- `--epoch_fl`: Number of epochs for Phase 2 (FL), which flashes back from the primary model to the old model.
- `--schf`: Whether to apply a scheduler during Phase 2. Set to `1` or `0`.
- `--dataset`: Name of the dataset. Examples include `seq-cifar10`, `seq-cifar100`, or `seq-tinyimagenet`.
- `--batch_size`: Mini-batch size for training.
- `--lr`: Learning rate.
- `--optim_mom`: Momentum parameter for the optimizer (e.g., SGD).
- `--optim_wd`: Weight decay (L2 regularization) used by the optimizer.
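For instance, a command along these lines would run the FL-integrated oEWC model on Split-CIFAR100. The flag values (and the exact model string) below are illustrative placeholders, not the tuned hyperparameters from the paper; see the ready-made scripts noted below for the exact settings:

```bash
python utils/main.py \
    --run flashback \
    --model fl-ewc_on \
    --alpha_p 0.5 \
    --e_lambda 10 \
    --epoch_base 50 --sch0 1 \
    --epoch_cl 30 --sch 1 \
    --epoch_fl 10 --schf 1 \
    --dataset seq-cifar100 \
    --batch_size 32 \
    --lr 0.1 \
    --optim_mom 0.9 \
    --optim_wd 0.0005
```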
Note: You can find ready-to-run scripts for all models supported by Flashback Learning (defined under `models/`) across all datasets in the `scripts/` folder.
We gratefully acknowledge the contributions of the following repositories, which served as the foundation or inspiration for parts of this work:
We thank the authors of these projects for making their code publicly available.