EasyCL is a powerful and user-friendly toolkit for continual learning with large language models. It features one-click operation, automation, multimodal support, compatibility with most base LLMs, and support for multiple state-of-the-art continual learning methods.
EasyCL is developed based on LLaMA-Factory. We would like to express our gratitude to the LLaMA-Factory team for their excellent open-source work. Before using EasyCL, we recommend reading LLaMA-Factory's README and documentation.
- Introduction
- Installation
- Implemented Methods
- Dataset Format Requirements
- Continual Learning Training
- Continual Learning Evaluation
- Workflow
- Benchmark Adaptation
- Known Issues and Resolutions
- License
EasyCL is developed based on the LLaMA-Factory framework and focuses on continual learning methods for large language models. It provides a comprehensive suite of tools and methods to address catastrophic forgetting in sequential learning tasks, and supports training and evaluation with various base models and datasets in both single-modal and multi-modal scenarios.
The framework integrates a variety of state-of-the-art continual learning techniques designed specifically for language models, allowing researchers and practitioners to easily implement, compare, and develop new methods.
For detailed implementation of the continual learning workflow, see src/easycl/cl_workflow/README.md.
Important: Before installing EasyCL, please ensure that you have correctly installed LLaMA-Factory.
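If LLaMA-Factory is not yet installed, one common way to set it up (following the LLaMA-Factory README; the exact extras you need may vary with your environment) is:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
cd ..

Then install EasyCL itself: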
git clone https://github.com/ECNU-ICALK/EasyCL.git
cd EasyCL
pip install -e .
- Elastic Weight Consolidation (EWC) - View Implementation - Overcoming catastrophic forgetting in neural networks
- Learning Without Forgetting (LWF) - View Implementation - Learning without forgetting
- Experience Replay - View Implementation - Experience replay for continual learning
- LAMOL (Language Modeling for Lifelong Language Learning) - View Implementation - LAMOL: LAnguage MOdeling for Lifelong Language Learning
- O-LoRA (Orthogonal subspace learning) - View Implementation - Orthogonal subspace learning for language model continual learning
- Gradient Episodic Memory (GEM) - View Implementation - Gradient Episodic Memory for Continual Learning
- I-LoRA (Interpolation-based LoRA) - View Implementation - Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
- CL-MoE (Dual Momentum Mixture-of-Experts LoRA) - View Implementation - CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
- MOE-LoRA (Mixture of Experts with LoRA) - View Implementation - CoIN: A Benchmark of Continual Instruction Tuning for Multimodal Large Language Models
- ABSCL (ABSA LLM-CL) - View Implementation - Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis
- Dynamic ConPet - View Implementation - ConPET: Continual Parameter-Efficient Tuning for Large Language Models
- Self-Synthesized Rehearsal (SSR) - View Implementation - Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal
- Pseudo Replay - View Implementation - Experience replay for continual learning
For more details about the continual learning methods, see src/easycl/cl/README.md.
To use EasyCL, your datasets should conform to the LLaMA-Factory dataset format requirements:
The `dataset_info.json` file registers all available datasets. To use a custom dataset, you must add a description for it in `dataset_info.json` and then reference it in your configuration via `dataset: dataset_name`.
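For example, if you have registered a dataset named `my_task_1` (a hypothetical name used only for illustration), your training configuration would reference it like this:

# in your training YAML; "my_task_1" is a placeholder dataset name
dataset: my_task_1

Multiple datasets can be combined by separating their names with commas, e.g. `dataset: my_task_1,my_task_2`.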
Currently, we support datasets in both alpaca and sharegpt formats.
In instruction supervised fine-tuning, content from the `instruction` column is concatenated with content from the `input` column to form the human instruction, i.e. the human instruction becomes `instruction\ninput`. Content from the `output` column is used as the model's response.

If specified, content from the `system` column is used as the system prompt.

The `history` column is a list of string pairs, representing the instruction and response of each round of historical conversation. Note that in instruction supervised fine-tuning, responses in historical conversations are also used for model learning.
[
{
"instruction": "Human instruction (required)",
"input": "Human input (optional)",
"output": "Model response (required)",
"system": "System prompt (optional)",
"history": [
["First-round instruction (optional)", "First-round response (optional)"],
["Second-round instruction (optional)", "Second-round response (optional)"]
]
}
]
For data in the above format, the dataset description in `dataset_info.json` should be:
"dataset_name": {
"file_name": "data.json",
"columns": {
"prompt": "instruction",
"query": "input",
"response": "output",
"system": "system",
"history": "history"
}
}
Compared to the alpaca format, the sharegpt format supports more role types, such as human, gpt, observation, and function. They form a list of objects in the `conversations` column.
Note that human and observation must appear in odd positions, while gpt and function must appear in even positions.
[
{
"conversations": [
{
"from": "human",
"value": "Human instruction"
},
{
"from": "function_call",
"value": "Tool parameters"
},
{
"from": "observation",
"value": "Tool results"
},
{
"from": "gpt",
"value": "Model response"
}
],
"system": "System prompt (optional)",
"tools": "Tool descriptions (optional)"
}
]
For data in the above format, the dataset description in `dataset_info.json` should be:
"dataset_name": {
"file_name": "data.json",
"formatting": "sharegpt",
"columns": {
"messages": "conversations",
"system": "system",
"tools": "tools"
}
}
EasyCL's evaluation process relies on the `dataset_info.json` file to locate and load required datasets. When running evaluation commands with `--cl_tasks <task_name>` (e.g., `--cl_tasks my_eval_task`), the evaluator will:
- Find Test Set: The evaluator looks for entries matching `<task_name>_test` (e.g., `my_eval_task_test`) or entries with key `<task_name>` and the `split` field set to `"test"` in `dataset_info.json`. A test set is required for evaluation.
- Find Dev Set: If `n_shot > 0` is set in the evaluation parameters (for few-shot evaluation), the evaluator similarly looks for entries matching `<task_name>_dev` (e.g., `my_eval_task_dev`) or entries with the `split` field set to `"dev"`. A dev set is not required for zero-shot evaluation.
Example:
Suppose your `dataset_info.json` contains:
{
"my_eval_task_dev": {
"file_name": "my_data/my_eval_task_dev.json",
"split": "dev",
"columns": {
"prompt": "instruction",
"query": "input",
"response": "output"
}
},
"my_eval_task_test": {
"file_name": "my_data/my_eval_task_test.json",
"split": "test",
"columns": {
"prompt": "instruction",
"query": "input",
"response": "output"
}
}
}
When you run `easycl-cli cl_workflow --mode eval_only --eval_params <your_eval_config>.yaml` with `--cl_tasks my_eval_task` specified in the config:
- The evaluator will load `my_data/my_eval_task_test.json` as test set.
- If `--n_shot 5` is also specified in the config, the evaluator will load `my_data/my_eval_task_dev.json` and use the first 5 samples as few-shot examples.
Important Notes:
- Ensure that you define corresponding `test` set entries in `dataset_info.json` for each task that needs evaluation, with correct `file_name`.
- Define `dev` set entries if you need few-shot evaluation.
- The `file_name` paths should be relative to the directory containing `dataset_info.json` or the `data` directory in project root. The evaluator will look in `task_dir` (if specified) or `./data` directory first.
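Putting the pieces above together, an `eval_only` configuration for this task might look roughly like the following sketch. Only `cl_tasks` and `n_shot` are taken from the documentation above; the remaining key names (model path, output directory, and so on) are placeholders and should be checked against the examples in `example/eval_examples/`.

# hypothetical eval config sketch -- verify key names against example/eval_examples/
model_name_or_path: /path/to/your/model   # placeholder; assumed LLaMA-Factory-style key
cl_tasks: my_eval_task                    # task name resolved via dataset_info.json
n_shot: 5                                 # few-shot evaluation using the *_dev split
save_dir: ./eval_results                  # assumed key name for the output directory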
EasyCL streamlines the continual learning training process, offering a "one-click" experience that simplifies complex setups. We support training for most single/multi-modal base models and distributed training.
EasyCL implements distributed training using DeepSpeed with compatibility across different ZeRO stages. The table below shows the compatibility status of each continual learning method with various DeepSpeed ZeRO configurations.
| Method | Single GPU | ZeRO-0 | ZeRO-1 | ZeRO-2 | ZeRO-2+Offload | ZeRO-3 | ZeRO-3+Offload |
|---|---|---|---|---|---|---|---|
| Elastic Weight Consolidation (EWC) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Learning Without Forgetting (LWF) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Experience Replay | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| LAMOL | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫 |
| O-LoRA (Orthogonal subspace learning) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gradient Episodic Memory (GEM) | ✅ | ✅ | 🚫 | 🚫 | 🚫 | 🚫 | 🚫 |
| I-LoRA (Interpolation-based LoRA) | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| MOE-LoRA (Mixture of Experts with LoRA) | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫 |
| ABSCL (ABSA LLM-CL) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Dynamic ConPet | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CL-MoE | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫 |
| Self-Synthesized Rehearsal (SSR) | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫 |
| Pseudo Replay | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫 |
Legend:
- ✅ Compatible
- ⚠️ Known issues (will be fixed in upcoming releases)
- 🚫 Not compatible
Incompatibility Reasons:
- GEM: Requires gradient projection operations which are incompatible with ZeRO-2 and above due to the way gradients are partitioned across devices.
- LAMOL, SSR, Pseudo Replay: These methods require long sequence sample generation, which causes extremely high communication overhead when using ZeRO-3, making them impractical for this configuration.
- MOE-LoRA, CL-MoE: These methods use the original PEFT package implementation which can cause unexpected errors with certain base models when using ZeRO-3 configuration.
Known Issues:
- I-LoRA: When switching between EMA adapters, ZeRO-2 encounters backward propagation identification errors. This issue is currently being addressed and will be fixed in an upcoming release.
To enable distributed training with DeepSpeed, follow these steps:
- Add DeepSpeed Configuration to Your Training Config:

  deepspeed: ./path/to/your/deepspeed_config.json
- Specify GPUs to Use: You can specify which GPUs to use either by setting environment variables or by prefixing your command.

  Using environment variables:

  export CUDA_VISIBLE_DEVICES=0,1,2,3
  easycl-cli cl_workflow --mode train_only --train_params ./your_config.yaml

  Or directly in the command:

  CUDA_VISIBLE_DEVICES=0,1,2,3 easycl-cli cl_workflow --mode train_only --train_params ./your_config.yaml
- Sample DeepSpeed Configurations: We provide several example DeepSpeed configurations in the `example/deepspeed_config` directory for different ZeRO stages and optimization levels. You can use these as starting points for your distributed training:
  - `ds_z0_config.json` - ZeRO Stage 0 configuration
  - `ds_z2_config.json` - ZeRO Stage 2 configuration
  - `ds_z2_offload_config.json` - ZeRO Stage 2 with CPU offloading
  - `ds_z3_config.json` - ZeRO Stage 3 configuration
  - `ds_z3_offload_config.json` - ZeRO Stage 3 with CPU offloading
Example usage:
deepspeed: ./example/deepspeed_config/ds_z0_config.json
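If you prefer to write your own DeepSpeed file instead of using the shipped examples, a minimal ZeRO-2 configuration might look like the sketch below. This is a generic DeepSpeed configuration, not a copy of the files in `example/deepspeed_config`; the `"auto"` values let the trainer fill in batch sizes, gradient accumulation, and precision at runtime.

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}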
This section details how EasyCL handles the evaluation of models in continual learning scenarios and the metrics used to assess their performance.
EasyCL supports continual learning evaluation for both single-modal and multimodal tasks. The core of our evaluation process relies on determining the accuracy of the model's predictions. This is achieved by comparing the model's generated output against the reference (ground truth) outputs provided in the test dataset for each task.
Note: Our current framework exclusively supports evaluation based on "accuracy." Other evaluation metrics, such as those based on similarity scores (e.g., cosine similarity for embeddings) or Intersection over Union (IoU) for object detection/segmentation tasks, are not yet implemented.
The framework automatically calculates several standard continual learning metrics after the evaluation phase of a `full_workflow` run or an `eval_only` run (when evaluating multiple task checkpoints). These metrics help quantify a model's ability to learn new tasks while retaining knowledge of previously learned tasks, its tendency to forget, and its capacity to transfer knowledge.
The primary metrics computed are:
In the formulas below, $a_{k,i}$ denotes the accuracy on task $i$ measured after the model has finished training on task $k$, $N$ is the total number of tasks, and $b_i$ is the accuracy of the original base model on task $i$ (before any continual learning).

Last Accuracy (Last): Measures the model's average accuracy across all tasks after the final task has been learned.

$$\mathrm{Last} = \frac{1}{N}\sum_{i=1}^{N} a_{N,i}$$

where $a_{N,i}$ is the accuracy on task $i$ after training on the last task $N$.

Average Accuracy (Avg): Represents the average of the mean accuracies obtained at each step of the continual learning process. Specifically, after learning each task $k$, the mean accuracy over all tasks seen so far is computed, and these step-wise means are then averaged over all $N$ steps.

$$\mathrm{Avg} = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{1}{k}\sum_{i=1}^{k} a_{k,i}\right)$$

where the inner average is the mean accuracy on tasks $1$ through $k$ after training on task $k$.

Backward Transfer (BWT): Measures how much the learning of new tasks negatively impacts the performance on previously learned tasks (i.e., forgetting). A higher positive value indicates better retention (less forgetting), while a negative value indicates significant forgetting.

$$\mathrm{BWT} = \frac{1}{N-1}\sum_{i=1}^{N-1}\left(a_{N,i} - a_{i,i}\right)$$

where $a_{i,i}$ is the accuracy on task $i$ immediately after it was learned.

Forward Transfer (FWT): Measures how much knowledge gained from learning previous tasks helps in learning new, subsequent tasks. A positive value indicates beneficial forward transfer.

$$\mathrm{FWT} = \frac{1}{N-1}\sum_{i=2}^{N}\left(a_{i-1,i} - b_i\right)$$

where $a_{i-1,i}$ is the accuracy on task $i$ measured before it has been trained on (i.e., after training on task $i-1$) and $b_i$ is the base model's accuracy on the same task.
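As a quick, purely illustrative sanity check of these definitions (the numbers below are made up, not results from any experiment), suppose $N = 3$ tasks with accuracies $a_{1,1}=80$, $a_{2,1}=75$, $a_{2,2}=85$, $a_{3,1}=70$, $a_{3,2}=80$, $a_{3,3}=90$, zero-shot accuracies $a_{1,2}=60$ and $a_{2,3}=65$, and base-model accuracies $b_2=55$, $b_3=60$. Then:

$$\mathrm{Last} = \tfrac{70+80+90}{3} = 80, \qquad \mathrm{Avg} = \tfrac{1}{3}\left(80 + \tfrac{75+85}{2} + \tfrac{70+80+90}{3}\right) = 80,$$

$$\mathrm{BWT} = \tfrac{(70-80)+(80-85)}{2} = -7.5, \qquad \mathrm{FWT} = \tfrac{(60-55)+(65-60)}{2} = 5.$$

The negative BWT reflects forgetting on earlier tasks, while the positive FWT indicates that earlier training helped on tasks the model had not yet been trained on.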
To facilitate one-click training via the command line, we have implemented a Command-Line Interface (CLI). You can run training and evaluation in various modes, and the CLI will automatically set the required parameter mappings according to the settings in `src/easycl/cl_workflow/cl_params_config.json`. We currently support four workflows: Train Only, Evaluate Only, Train Then Evaluate, and Full Workflow (Train, Evaluate, Calculate Metrics). You can use the `--previewonly` flag to preview commands without running them, and use `clean_dirs` to automatically clean output paths before running commands.
easycl-cli cl_workflow --mode train_only --train_params ./example/train_examples/lora_example.yaml
Preview Result: Executes training commands sequentially for tasks defined in `train_config.yaml`, applying parameter management between tasks.
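For orientation, the config passed via `--train_params` follows LLaMA-Factory's YAML conventions. The sketch below shows only common LLaMA-Factory-style keys with placeholder values and is not a copy of `lora_example.yaml`; the EasyCL-specific options (the task sequence and the chosen continual learning method) are omitted here, so refer to `example/train_examples/` and `src/easycl/cl/README.md` for the exact fields.

# hypothetical sketch -- see example/train_examples/lora_example.yaml for a working config
model_name_or_path: /path/to/base_model   # placeholder
stage: sft
do_train: true
finetuning_type: lora
dataset: my_task_1                        # dataset registered in dataset_info.json
template: llama3                          # chat template matching your base model
output_dir: ./saves/task_1
per_device_train_batch_size: 1
num_train_epochs: 3.0
learning_rate: 1.0e-4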
easycl-cli cl_workflow --mode eval_only --eval_params ./example/eval_examples/lora_eval.yaml
Preview Result: Executes the evaluation command(s) specified in `eval_config.yaml` (e.g., evaluating a specific fine-tuned model on `cl_tasks`).
easycl-cli cl_workflow --mode train_then_eval \
--train_params ./example/train_examples/lora_example.yaml \
--eval_params ./example/eval_examples/lora_eval.yaml
Preview Result: Executes training commands sequentially, then executes evaluation commands (evaluating base model and model after each task).
easycl-cli cl_workflow --mode full_workflow \
--train_params ./example/train_examples/lora_example.yaml \
--eval_params ./example/eval_examples/lora_eval.yaml
Preview Result: Executes training sequentially, then evaluates base/task models, and finally calculates and saves CL metrics (Last, Avg, BWT, FWT) to the evaluation output directory.
For detailed information about workflow configuration and CL metrics, see src/easycl/cl_workflow/README.md.
Our framework can automatically handle the training and evaluation of benchmarks, supporting switching between multiple task orders. This makes it easier to reproduce and compare the effects of different continual learning methods on standard datasets.
We currently support the following four commonly used benchmarks:
- LFPT5 - LFPT5: A unified framework for lifelong few-shot language learning based on prompt tuning of T5
- Large Number of Tasks Benchmark (LNT) - Orthogonal subspace learning for language model continual learning
- ABSACL_ATSC (Aspect-based Sentiment Analysis Continual Learning) - Adapting bert for continual learning of a sequence of aspect sentiment classification tasks
- VQA_CL (Visual Question Answering in Continual Learning) - View Details
You can use the following command to run benchmark evaluations (benchmark evaluation currently only supports running in `full_workflow` mode):
easycl-cli cl_workflow --mode full_workflow \
--train_params ./example/train_examples/lora_example.yaml \
--eval_params ./example/eval_examples/lora_eval.yaml \
--benchmark ABSACL_ATSC --benchmark_order order1 --benchmark_dir ./benchmark/ABSACL_ATSC
Note:
- Before running a benchmark, ensure that the corresponding benchmark data is stored in the directory specified by `--benchmark_dir` as required.
- Each benchmark needs to maintain a `benchmark_info.json` file to register the benchmark name, define different task orders, and specify the dataset information required for each task.
- Datasets involved in the benchmark need to be registered in the `dataset_info.json` in the benchmark directory.
If you wish to use your own benchmark, please follow these steps:
- Prepare Datasets:
- Ensure your datasets conform to the Alpaca or ShareGPT format described in Dataset Format Requirements.
- Organize the data for each task separately.
- Organize Benchmark Directory:
- Create a new folder under the `benchmark` directory, named after your benchmark (e.g., `my_custom_benchmark`).
- Place the corresponding data files within this folder according to your task division.
- Register Dataset Information:
- In the `dataset_info.json` file at the project root, add descriptions for each dataset used in your benchmark. Refer to the examples in the Data Format section.
- Create `benchmark_info.json`:
- In your created benchmark directory (e.g., `benchmark/my_custom_benchmark`), create a `benchmark_info.json` file.
- In this file, define your benchmark name, different task orders, and specify the dataset names corresponding to each task within each order (these names should match those registered in `dataset_info.json`). You can refer to the structure of existing benchmarks (e.g., `benchmark/ABSACL_ATSC/benchmark_info.json`); a minimal hypothetical sketch is also shown after these steps.
- Run Benchmark:
- You can now run your custom benchmark using the `easycl-cli` command with the `--benchmark <your_benchmark_name>` and `--benchmark_dir ./benchmark/<your_benchmark_name>` arguments.
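As a purely hypothetical illustration of the ideas in the `benchmark_info.json` step above (benchmark name, task orders, and the datasets making up each order), such a file could be organized along the following lines. The field names here are illustrative only; the actual schema is defined by the existing benchmarks such as `benchmark/ABSACL_ATSC/benchmark_info.json`, which should be treated as the authoritative reference.

{
  "benchmark_name": "my_custom_benchmark",
  "orders": {
    "order1": ["my_task_a", "my_task_b", "my_task_c"],
    "order2": ["my_task_c", "my_task_a", "my_task_b"]
  }
}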
Note: If you find a bug, please raise an issue.
🚧 Known Issues / Upcoming Features
- [TODO] ILoRA triggers bugs during multi-GPU training with ZeRO-2 configuration.
- [Feature] Planning to add support for [New Method/Feature].
- Optimizing memory usage during [Specific Process].
✅ Resolved Issues / Completed Features
- [Resolved] Evaluation does not correctly depend on `dataset_info.json` (2025-04-19).
- [Resolved] Evaluation uses overly strict generation parameters (e.g., MMLU parameters) (2025-04-19).
- [Resolved] Pseudo-replay method reads tokenized data instead of raw data (2025-04-20).
- [Resolved] Gradient Episodic Memory (GEM) causes out-of-memory errors (2025-04-20).
- [Resolved] Improved O-LoRA logic and fixed dimension mismatch issues (2025-04-20).
- [Resolved] Fixed issues related to pseudo-sample generation methods and checked parameter imports for all existing methods (2025-04-20).
- [Resolved] Updated and adapted multi-GPU logic (2025-04-28).
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.