Multi-Modal Reward Model (MMRM)

This repository contains a pipeline for training a reward model for multi-modal preference learning, built on top of the Qwen2-VL-2B-Instruct model.

Overview

Reward models are critical components in aligning large language models with human preferences. This project explores architectures aimed at improving multi-modal reward models, particularly on image-text understanding tasks.
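For readers new to reward modelling, preference-based reward models are commonly trained with a pairwise (Bradley-Terry style) loss that pushes the score of the preferred response above the score of the rejected one. The sketch below illustrates that general idea only; whether this repository uses exactly this objective is an assumption, and the function name `pairwise_preference_loss` is hypothetical. See report.pdf for the actual method.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Illustrative only: a common reward-model objective, not necessarily
    the one implemented in this repository.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # The loss is small when the chosen response scores higher,
    # and large when the rejected response scores higher.
    print(pairwise_preference_loss(2.0, 0.0))
    print(pairwise_preference_loss(0.0, 2.0))
```

The loss depends only on the difference between the two scores, so a reward model trained this way learns a relative ranking rather than an absolute scale.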

More details about this repository are in report.pdf.

Training & Evaluation

The models were trained on the MMPR (Multi-Modal Preference Ranking) dataset using LoRA fine-tuning with the following settings:

Batch size: 1
Gradient accumulation steps: 16
Learning rate: 5e-6
Training epochs: 3
Optimizer: AdamW
LR scheduler: Cosine annealing
Mixed precision: FP16
LoRA rank: 8
LoRA alpha: 16
LoRA dropout: 0.05
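To make the interaction between these settings concrete, the sketch below derives three quantities from the values listed above: the effective batch size, the LoRA scaling factor (alpha / rank), and the learning rate under cosine annealing. It assumes that the batch size is per device, that there is no warmup, and that the learning rate decays to zero; none of these is stated here, so check the training scripts for the actual behaviour. The helper names are hypothetical.

```python
import math

# Values copied from the settings list above.
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
BASE_LR = 5e-6
LORA_RANK = 8
LORA_ALPHA = 16


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (assumes per-device batch size)."""
    return BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine-annealed learning rate, assuming no warmup and a final value of zero."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Standard LoRA scales the low-rank update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("LoRA scale:", lora_scale)
    for step in (0, 50, 100):
        print(f"lr at step {step}/100:", cosine_lr(step, 100))
```

With a single device, each optimizer step therefore sees 16 samples, and the LoRA update is scaled by a factor of 2.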

For detailed steps on how to run training and evaluation, see the individual shell scripts in the repository's top-level directory (for example, run_train_mmpr_full.sh and run_evaluate.sh).

Requirements

torch>=2.0.0
transformers>=4.36.0
accelerate>=0.25.0
datasets>=2.14.0
peft>=0.6.0
trl>=0.7.4
wandb
pillow
matplotlib
tqdm

