This library provides a comprehensive framework for experimenting with various optimization algorithms across different machine learning tasks. The library supports multiple datasets and models, with a special focus on optimization strategies.
- Overview
- Setup
- Project Structure
- Available Datasets
- Argument System
- Available Optimizers
- Arguments Reference
- Scripts
- Examples
This library allows researchers and practitioners to:
- 📊 Benchmark various optimization algorithms on standard datasets
- 🔄 Experiment with parameter-efficient fine-tuning strategies
- 📈 Compare performance across different tasks and models
- 🧩 Easily extend the framework with custom optimizers and models
You can create the required environment using venv and the requirements.txt file:
python -m venv optim_venv
source optim_venv/bin/activate
pip install -r requirements.txt
This will install all necessary dependencies including PyTorch, transformers, and other required libraries.
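As a quick sanity check (assuming the pinned versions in requirements.txt install cleanly), you can verify that the core libraries import inside the activated environment:
# Check that PyTorch and transformers are importable
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"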
The project is organized into several key directories:
- `src/` - Core source code
  - `config.py` - Main configuration parser
  - `libsvm/` - LIBSVM datasets and models
  - `cv/` - Computer Vision datasets and models
  - `fine_tuning/` - Fine-tuning strategies for pre-trained models
  - `optimizers/` - Implementation of various optimization algorithms
- `scripts/` - Ready-to-use scripts for running experiments
- `data/` - Default location for datasets
- `notebooks/` - Example notebooks
The library supports the following dataset categories:
Standard datasets for binary and multi-class classification:
- mushrooms
- binary
- and other standard LIBSVM datasets
Image classification datasets:
- cifar10
- and other CV datasets
Datasets for natural language tasks (GLUE benchmark):
- cola
- mnli
- mrpc
- qnli
- qqp
- rte
- sst2
- stsb
- wnli
- Various datasets for large language model fine-tuning (e.g., gsm8k, aqua, commonsensqa, boolq, mathqa)
The library uses a hierarchical argument system:
- Base Arguments (`config.py`): Core arguments applicable to all experiments
- Task-Specific Arguments: Extended arguments for specific tasks
  - LIBSVM Arguments (`libsvm/config_libsvm.py`)
  - Computer Vision Arguments (`cv/config_cv.py`)
  - Fine-Tuning Arguments (`fine_tuning/config_ft.py`)
Arguments are processed hierarchically. When running an experiment:
- Base arguments are loaded first
- Based on the selected dataset, task-specific arguments are added
- If a configuration file is specified with `--config_name`, its values override the defaults
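For illustration, the same entry point loads a different task-specific argument set depending on the dataset you pass. The commands below are a sketch built from flags in the reference that follows, not tested recipes:
# LIBSVM task: base arguments plus LIBSVM-specific arguments (e.g., --scale)
python ./src/run_experiment.py --dataset mushrooms --optimizer sgd --scale
# CV task: base arguments plus CV-specific arguments (e.g., --model resnet20)
python ./src/run_experiment.py --dataset cifar10 --optimizer adamw --model resnet20
# Fine-tuning task: base arguments plus fine-tuning arguments (e.g., --ft_strategy)
python ./src/run_experiment.py --dataset sst2 --model bert-base-uncased --ft_strategy LoRA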
Optimizers are implemented as individual Python files. The library currently supports the following optimizers (a benchmarking sketch follows the list):
- `adamw` - AdamW optimizer with weight decay
- `soap` - SOAP (Second Order Approximation) optimizer
- `shampoo` - Shampoo optimizer for efficient second-order optimization
- `sgd` - Stochastic Gradient Descent
- `muon` - MUON optimizer
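Since all optimizers share the same entry point, a simple way to benchmark them is to sweep over the `--optimizer` choices with a fixed seed. A minimal sketch, assuming the default hyperparameters are acceptable for every optimizer:
# Sweep over the available optimizers on the same LIBSVM task
for opt in adamw soap shampoo sgd muon; do
    python ./src/run_experiment.py \
        --dataset mushrooms \
        --optimizer "$opt" \
        --seed 42 \
        --verbose
done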
Common arguments (shared by all tasks):
- `--dataset`: Dataset name (required)
- `--config_name`: Name of the configuration file for your problem (optional)
- `--optimizer`: Name of the optimizer to use (choices: adamw, soap, shampoo, sgd, adam-sania, muon)
- `--batch_size` / `--per_device_train_batch_size`: Batch size for training (default: 8)
- `--n_epoches_train`: Number of training epochs (default: 1)
- `--eval_runs`: Number of times the model is re-trained with different seeds (default: 1)
- `--dtype`: Default torch data type (default: None)
- `--use_old_tune_params`: Reuse previously tuned parameters (flag)
Logging and reproducibility:
- `--wandb`: Enable Weights & Biases logging (flag)
- `--run_prefix`: Prefix for the experiment run name
- `--wandb_project`: W&B project name (default: "OPTIM_TEST")
- `--verbose`: Print training results in the terminal (flag)
- `--seed`: Random seed (default: 18)
Paths:
- `--results_path`: Path to save the results of the experiment (default: "results_raw")
- `--data_path`: Path to save the datasets (default: "data")
Optimizer hyperparameters (an example run follows this list):
- `--lr` / `--learning_rate`: Learning rate (default: 1e-4)
- `--weight_decay` / `-wd`: Weight decay (default: 1e-5)
- `--beta1`: First momentum coefficient for Adam-type optimizers (default: 0.9)
- `--beta2`: Second momentum coefficient for Adam-type optimizers (default: 0.999)
- `--eps`: Epsilon for Adam (default: 1e-8)
- `--momentum`: Momentum for SGD (default: 0.9)
- `--shampoo_beta`: Momentum for SOAP; if -1, it equals beta2 (default: -1)
- `--update_freq`: Frequency of updating Q for Shampoo and SOAP (default: 1)
- `--ns_steps`: Number of Newton-Schulz (NS) steps for MUON (default: 10)
- `--adamw_lr`: Learning rate for Adam within MUON (default: None)
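For example, the hyperparameters above can be passed directly on the command line; the values below are illustrative, not tuned recommendations:
# AdamW with explicit moment coefficients and epsilon
python ./src/run_experiment.py \
    --dataset mushrooms \
    --optimizer adamw \
    --lr 1e-3 --weight_decay 1e-4 \
    --beta1 0.9 --beta2 0.99 --eps 1e-8
# MUON with a custom number of NS steps
python ./src/run_experiment.py \
    --dataset mushrooms \
    --optimizer muon \
    --lr 1e-3 --ns_steps 5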
LIBSVM-specific arguments:
- `--scale`: Apply scaling to the LIBSVM datasets (flag)
- `--scale_bound`: Scaling of the form ~exp[U(-scale_bound, scale_bound)] (default: 20)
- `--rotate`: Apply rotation to the LIBSVM datasets (flag)
- `--model`: Model name (default: "linear-classifier", choices: ["linear-classifier"])
- `--hidden_dim`: Hidden dimension of the linear classifier (default: 10)
- `--no_bias`: Disable the bias in the fully connected layer of the linear classifier (flag)
- `--weight_init`: Initial weights of the linear classifier (default: "uniform", choices: ["zeroes", "uniform", "bad_scaled", "ones", "zero/uniform"])
The LIBSVM tasks set the following defaults:
batch_size = 128
n_epoches_train = 2
eval_runs = 3
dtype = "float64"
LIBSVM tuning arguments:
- `--tune`: Tune parameters with Optuna (flag)
- `--n_epoches_tune`: Number of epochs used for tuning with Optuna (default: 1)
- `--tune_runs`: Number of Optuna steps (default: 20)
- `--tune_path`: Path to save the tuned parameters (default: "tuned_params")
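A typical LIBSVM workflow is to tune the hyperparameters once with Optuna and then reuse the stored values; a sketch, assuming the tuned parameters are written under the default `--tune_path`:
# Tune hyperparameters with Optuna
python ./src/run_experiment.py \
    --dataset mushrooms \
    --optimizer adamw \
    --tune --n_epoches_tune 1 --tune_runs 20
# Re-train using the previously tuned parameters
python ./src/run_experiment.py \
    --dataset mushrooms \
    --optimizer adamw \
    --use_old_tune_params --verbose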
Computer Vision-specific arguments:
- `--not_augment`: Disable data augmentation (flag)
- `--model`: Model name (default: "resnet20", choices: ["resnet20", "resnet32", "resnet44", "resnet56"])
The CV tasks set the following defaults:
batch_size = 64
n_epoches_train = 10
eval_runs = 5
CV tuning arguments:
- `--tune`: Tune parameters with Optuna (flag)
- `--n_epoches_tune`: Number of epochs used for tuning with Optuna (default: 5)
- `--tune_runs`: Number of Optuna steps (default: 100)
- `--tune_path`: Path to save the tuned parameters (default: "tuned_params")
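An illustrative combination of the CV arguments above (values are examples, not a tested recipe):
# Train ResNet-32 on CIFAR-10 without data augmentation, averaging over several seeds
python ./src/run_experiment.py \
    --dataset cifar10 \
    --model resnet32 \
    --optimizer sgd \
    --not_augment \
    --n_epoches_train 10 \
    --eval_runs 5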
Fine-tuning data arguments:
- `--dataset_config`: Dataset config name
- `--dataset_path`: Path to the dataset for LLM tasks
- `--max_seq_length`: Maximum total input sequence length after tokenization (default: 128)
- `--pad_to_max_length`: Pad all samples to max_seq_length (flag, default: True)
- `--max_train_samples`: Truncate the number of training examples
- `--max_eval_samples` / `--max_val_samples`: Truncate the number of validation examples
- `--max_test_samples`: Truncate the number of test examples
- `--train_file`: CSV or JSON file containing the training data
- `--validation_file`: CSV or JSON file containing the validation data
- `--test_file`: CSV or JSON file containing the test data
- `--preprocessing_num_workers` / `--workers`: Number of processes for preprocessing
- `--overwrite_cache`: Overwrite cached training and evaluation data (flag)
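For quick smoke tests, the sample-limiting arguments above can be combined; an illustrative sketch:
# Fine-tune on a small subset of the data with a short sequence length
python ./src/run_experiment.py \
    --dataset cola \
    --model bert-base-uncased \
    --optimizer adamw \
    --max_seq_length 64 \
    --max_train_samples 1000 \
    --max_eval_samples 200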
Fine-tuning model arguments:
- `--model`: Path to a pretrained model or HuggingFace model identifier
- `--config`: Pretrained config name or path
- `--cache_dir`: Where to store downloaded pretrained models
- `--tokenizer`: Pretrained tokenizer name or path
- `--padding_side`: Padding side for tokenization (default: "right", choices: ["left", "right"])
- `--use_fast_tokenizer`: Use a fast tokenizer (flag, default: True)
- `--model_revision`: Specific model version to use (default: "main")
- `--use_auth_token`: Use the token from transformers-cli login (flag)
- `--quant_bit` / `--quantization_bit`: Number of bits for quantization
Fine-tuning training arguments:
- `--do_not_train`: Skip training (flag)
- `--do_not_eval`: Skip validation (flag)
- `--do_predict`: Run prediction (flag)
- `--eval_batch_size` / `--per_device_eval_batch_size`: Batch size for evaluation (default: 32)
- `--max_steps_train` / `--max_train_steps` / `--max_steps`: Maximum number of training steps (default: -1)
- `--lr_scheduler_type`: Learning-rate scheduler for the optimizer (default: "linear")
- `--grad_acc_steps` / `--gradient_accumulation_steps` / `--gradient_accumulation`: Gradient accumulation steps (default: 6)
- `--warmup_steps`: Number of warmup steps (default: 100)
- `--warmup_ratio`: Ratio of total steps used for warmup (default: 0.1)
- `--eval_strategy` / `--evaluation_strategy`: Strategy for evaluating the model (default: "epoch")
- `--eval_steps`: Steps between evaluations when eval_strategy="steps"
- `--logging_steps`: How often to log the training loss (default: 1)
- `--save_strategy`: Strategy for saving checkpoints (default: "no")
- `--save_steps`: Steps between saves when save_strategy="steps" (default: 500)
- `--save_every`: Save the model every N steps (default: 500)
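An illustrative combination of the training-process arguments above (values are examples only):
# Evaluate and save by steps instead of by epoch, with gradient accumulation
python ./src/run_experiment.py \
    --dataset mrpc \
    --model bert-base-uncased \
    --optimizer adamw \
    --eval_strategy steps --eval_steps 200 \
    --save_strategy steps --save_steps 500 \
    --grad_acc_steps 4 --warmup_steps 100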
PEFT (parameter-efficient fine-tuning) arguments:
- `--ft_strategy`: PEFT strategy to use (default: "LoRA")
- `--lora_r`: Rank of the LoRA adapters (default: 8)
- `--lora_alpha`: Scaling of the LoRA adapters (default: 32)
- `--lora_dropout`: Dropout of the LoRA adapters (default: 0.05)
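The LoRA hyperparameters can be adjusted together; an illustrative sketch (values are examples, not recommendations):
# LoRA fine-tuning with a larger rank and a higher dropout
python ./src/run_experiment.py \
    --dataset rte \
    --model bert-base-uncased \
    --optimizer adamw \
    --ft_strategy LoRA \
    --lora_r 16 --lora_alpha 32 --lora_dropout 0.1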
Fine-tuning tasks set the following defaults:
batch_size = 8
n_epoches_train = 3
eval_runs = 1
dtype = "float16"
The `scripts/` directory contains ready-to-use scripts for running common experiments. Make the scripts executable before using them:
chmod +x ./scripts/**/*.sh
DeBERTa GLUE scripts, located at `scripts/glue/deberta/`:
- lora.sh: Fine-tunes Microsoft DeBERTa-v3-base on GLUE tasks using LoRA
./scripts/glue/deberta/lora.sh
Llama-3 GLUE scripts, located at `scripts/glue/llama3/`:
- lora.sh: Fine-tunes Meta-Llama-3.1-8B on GLUE tasks using LoRA
./scripts/glue/llama3/lora.sh
LLM fine-tuning scripts, located at `scripts/llm/`:
- qwen.sh: Fine-tunes the Qwen2-7B model on various LLM tasks using LoRA
./scripts/llm/qwen.sh [dataset_name]
Supported dataset names include: gsm8k, aqua, commonsensqa, boolq, mathqa, and more.
Example:
./scripts/llm/qwen.sh gsm8k
The main entry point for running experiments is `src/run_experiment.py`. Here are some examples of how to use it:
python ./src/run_experiment.py \
--dataset mushrooms \
--optimizer adamw \
--lr 0.001 \
--weight_decay 0.01 \
--seed 42 \
--verbose
You can use JSON configuration files to set multiple parameters at once. For example, using `libsvm/configs/basic.json`:
python ./src/run_experiment.py \
--dataset mushrooms \
--optimizer adamw \
--config_name basic
The configuration file `basic.json` contains:
{
"batch_size": 128,
"n_epoches_train": 2,
"eval_runs": 3,
"n_epoches_tune": 1,
"tune_runs": 20,
"dtype": "float32"
}
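Assuming configuration files are resolved by name from the same configs directory as basic.json, you can create your own file (the name my_config below is hypothetical) and pass it via `--config_name`:
# Create a custom configuration next to basic.json (directory assumed to be src/libsvm/configs/)
cat > ./src/libsvm/configs/my_config.json << 'EOF'
{
    "batch_size": 256,
    "n_epoches_train": 5,
    "eval_runs": 3,
    "dtype": "float32"
}
EOF
python ./src/run_experiment.py \
    --dataset mushrooms \
    --optimizer adamw \
    --config_name my_config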
A fine-tuning example with LoRA on a GLUE task:
python ./src/run_experiment.py \
--dataset sst2 \
--model bert-base-uncased \
--optimizer adamw \
--ft_strategy LoRA \
--lora_r 16 \
--batch_size 16 \
--eval_strategy steps \
--eval_steps 100 \
--wandb
A computer-vision example with the Shampoo optimizer on CIFAR-10:
python ./src/run_experiment.py \
--dataset cifar10 \
--model resnet56 \
--optimizer shampoo \
--update_freq 10 \
--n_epoches_train 20 \
--wandb
You can modify and run the provided scripts with custom parameters:
# First make scripts executable
chmod +x ./scripts/**/*.sh
# Run GLUE fine-tuning with DeBERTa
CUDA_VISIBLE_DEVICES=0 ./scripts/glue/deberta/lora.sh
# Run LLM fine-tuning with Qwen on the gsm8k dataset
./scripts/llm/qwen.sh gsm8k