xiaohaochen0308/LLM_seq_cls

LoRA Fine-Tuning for Qwen2.5-7B-Instruct (Text Classification & Ad Title Ranking)

This project focuses on fine-tuning the Qwen2.5-7B-Instruct model using Low-Rank Adaptation (LoRA) for two key scenarios:

  1. Text Classification: Sentiment analysis, topic classification, etc.
  2. Text Ranking: Learning to prefer high-performing advertisement titles over low-performing ones through pairwise ranking.

The project is built on PyTorch, the Transformers library, and PEFT, and uses pretrained models from the ModelScope platform to adapt large language models (LLMs) to downstream tasks efficiently.


🧠 Features

  • 🔍 Two Task Modes: Choose between classification and ranking with minimal code changes.
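  • Efficient Training: LoRA trains only small low-rank adapter matrices while the base model stays frozen, so a 7B model can be fine-tuned with far less GPU memory than full fine-tuning.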
  • 🧱 Modular Structure: Easy-to-extend code for your own classification or ranking use case.
  • Compatible with the Qwen2.5 series from ModelScope.
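
To make the efficiency claim concrete, here is a minimal arithmetic sketch of how many parameters a single LoRA adapter trains compared with the full weight matrix it adapts. The layer size and rank below are illustrative assumptions, not values read from the Qwen2.5 configuration or from this repository's scripts.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter.

    LoRA adds two matrices: A with shape (rank, d_in) and B with shape
    (d_out, rank), so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the dense weight matrix that LoRA leaves frozen."""
    return d_in * d_out


# Illustrative sizes only (assumed, not taken from the model config).
d_in = d_out = 4096
rank = 8

lora = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
ratio = lora / full

print(f"LoRA adapter: {lora:,} parameters")
print(f"Full matrix:  {full:,} parameters")
print(f"Trainable fraction: {ratio:.4%}")
```

Under these assumed sizes the adapter trains well under 1% of the parameters of the matrix it modifies, which is where the memory savings come from.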

📦 Setup

pip install torch transformers peft modelscope

🏁 Quick Start

Text Classification

cd Text_Classification/
python lora_cls.py

Text Ranking (Pairwise)

cd Text_Ranking/
python train.py
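
For readers new to pairwise ranking, the sketch below shows one common objective: a logistic (Bradley-Terry style) loss on the score difference between the preferred and the rejected title. Whether `train.py` uses exactly this loss is an assumption; the function names here are hypothetical and only illustrate the idea.

```python
import math


def pairwise_logistic_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)).

    Computed as softplus(-margin), with a branch on the sign of the
    margin so that exp() never receives a large positive argument.
    """
    margin = score_chosen - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(score_pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) score pairs ranked in the right order."""
    correct = sum(1 for chosen, rejected in score_pairs if chosen > rejected)
    return correct / len(score_pairs)


# Hypothetical scores, standing in for the output of a title scoring model.
scores = [(2.1, 0.3), (0.4, 1.0), (1.5, 1.4)]
losses = [pairwise_logistic_loss(c, r) for c, r in scores]
print("mean loss:", sum(losses) / len(losses))
print("pairwise accuracy:", pairwise_accuracy(scores))
```

The loss shrinks as the chosen title's score pulls ahead of the rejected one, and it equals log 2 when the two scores tie.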

📊 Results

Text Classification (on N24News Abstract)

| Model                      | Accuracy (Acc) |
|----------------------------|----------------|
| Qwen2.5-7B-Instruct (LoRA) | 85.26%         |
| BERT                       | 78.3%          |
| RoBERTa                    | 79.7%          |

📁 Project Structure

Qwen_LoRA_Project/
├── README.md                          # Project documentation
├── Text_Classification/              # Text classification module
│   ├── data.py                       # Loads and processes classification data
│   ├── lora_cls.py                   # LoRA fine-tuning and training script
│   ├── train_trans_abs.json          # Training dataset (abstract text)
│   └── test_trans_abs.json           # Test dataset (abstract text)
│
└── Text_Ranking/                     # Ad title ranking model (CTR preference ranking)
    ├── data.py                       # Loads pairwise ad title data
    ├── model.py                      # Defines the TitleScorer model (Qwen + LoRA)
    └── train.py                      # Training and evaluation script for ranking model

🧪 Dataset Format

For Ranking

{
  "chosen": "<title>High-click title</title>",
  "rejected": "<title>Low-click title</title>"
}
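
As a rough illustration of how a record in this format could be read, the sketch below parses one JSONL line and strips the `<title>` wrapper. It is not the repository's `data.py`; the `parse_pair` helper and its strict tag check are assumptions made for the example.

```python
import json
import re

# Matches a string that consists of exactly one <title>...</title> wrapper.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def parse_pair(line: str) -> tuple[str, str]:
    """Parse one JSONL line into (chosen_title, rejected_title)."""
    record = json.loads(line)
    titles = []
    for key in ("chosen", "rejected"):
        match = TITLE_RE.fullmatch(record[key].strip())
        if match is None:
            raise ValueError(f"{key!r} is not wrapped in <title>...</title>")
        titles.append(match.group(1))
    return titles[0], titles[1]


sample = '{"chosen": "<title>High-click title</title>", "rejected": "<title>Low-click title</title>"}'
chosen, rejected = parse_pair(sample)
print(chosen, "|", rejected)
```

Failing loudly on a missing wrapper is a design choice for the sketch: it surfaces malformed records before training starts rather than silently scoring raw strings.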

For Classification

{
  "messages": [{"role": "user", "content": "News text"}],
  "label": 3
}
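
A minimal sketch of reading records in this format follows. It extracts the user text and the integer label, then infers the number of classes from the labels seen. The helper names are hypothetical, and the assumption that labels run from 0 to `num_labels - 1` is ours, not something the repository states.

```python
def parse_example(record: dict) -> tuple[str, int]:
    """Return (text, label) from one classification record."""
    user_turns = [m["content"] for m in record["messages"] if m["role"] == "user"]
    if len(user_turns) != 1:
        raise ValueError("expected exactly one user message")
    label = record["label"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(label, bool) or not isinstance(label, int):
        raise ValueError("label must be an integer")
    return user_turns[0], label


def infer_num_labels(labels: list[int]) -> int:
    """Assume labels are 0-based class ids; the class count is max + 1."""
    if min(labels) < 0:
        raise ValueError("labels must be non-negative")
    return max(labels) + 1


records = [
    {"messages": [{"role": "user", "content": "News text A"}], "label": 3},
    {"messages": [{"role": "user", "content": "News text B"}], "label": 0},
]
examples = [parse_example(r) for r in records]
num_labels = infer_num_labels([label for _, label in examples])
print(examples, num_labels)
```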

🛠️ Customization

To train with your own dataset:

  • For classification: prepare a JSON file in which each record has a messages list and an integer label.
  • For ranking: prepare a JSONL file in which each line has a chosen and a rejected title, each wrapped in <title> tags.
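
If your raw data is a list of titles with click-through rates rather than ready-made pairs, one possible way to produce the ranking file is sketched below. The `(title, ctr)` input layout, the `min_gap` threshold, and the rule "higher CTR is chosen" are all assumptions for illustration; the repository does not document how its pairs were built.

```python
import json
from itertools import combinations


def build_pairs(ads: list[tuple[str, float]], min_gap: float = 0.0) -> list[dict]:
    """Turn (title, ctr) tuples into chosen/rejected records.

    Pairs whose CTR difference is at most min_gap are skipped, since
    they carry no clear preference signal.
    """
    pairs = []
    for (title_a, ctr_a), (title_b, ctr_b) in combinations(ads, 2):
        if abs(ctr_a - ctr_b) <= min_gap:
            continue
        chosen, rejected = (title_a, title_b) if ctr_a > ctr_b else (title_b, title_a)
        pairs.append({
            "chosen": f"<title>{chosen}</title>",
            "rejected": f"<title>{rejected}</title>",
        })
    return pairs


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize records as JSONL, one JSON object per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


# Hypothetical ads with made-up CTR values.
ads = [("Title A", 0.05), ("Title B", 0.01), ("Title C", 0.05)]
pairs = build_pairs(ads)
print(to_jsonl(pairs))
```

`ensure_ascii=False` keeps non-ASCII titles, such as Chinese ad copy, readable in the output file.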

📜 License

This project is released under the MIT License.

🙏 Acknowledgements
