TL;DR
Video-Skill-CoT is a skill-aware CoT reasoning framework that constructs domain-specific multi-step rationales and trains expert modules for adaptive video understanding.
Video-Skill-CoT uses the OpenAI and Gemini APIs, so you need to set up your Azure OpenAI / Gemini API configuration before running the pipeline.
Set your own API information in ./skill_cot_generation/config.ini:
```ini
[openai]
azure_endpoint = your endpoint
api_key = your key
api_version = your version

[gemini]
gemini_api_key = your gemini_api_key
gemini_application_credentials = your credentials
```
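For reference, these settings can be loaded with Python's built-in `configparser`. The sketch below uses the section and key names from the file above; the variable names and how the generation scripts actually consume these values are assumptions.

```python
# Minimal sketch: load the API settings from config.ini.
# Section/key names match the file above; downstream usage is an assumption.
import configparser

config = configparser.ConfigParser()
config.read("./skill_cot_generation/config.ini")

azure_endpoint = config["openai"]["azure_endpoint"]
api_key = config["openai"]["api_key"]
api_version = config["openai"]["api_version"]

gemini_api_key = config["gemini"]["gemini_api_key"]
```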
Please place all downloaded datasets in the ./video_instruction_datasets
directory. The directory structure should look like this:
```
./video_instruction_datasets
├── cinepile
├── ET_164k
└── VSI-Bench
```
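Before running the pipeline, you may want to verify that the expected dataset directories are in place. Here is a small hypothetical sanity check (the directory names come from the tree above; the script itself is not part of the repo):

```python
# Hypothetical check that the expected dataset folders exist.
from pathlib import Path

DATA_ROOT = Path("./video_instruction_datasets")
EXPECTED = ["cinepile", "ET_164k", "VSI-Bench"]

missing = [name for name in EXPECTED if not (DATA_ROOT / name).is_dir()]
if missing:
    raise FileNotFoundError(f"Missing dataset directories: {missing}")
print("All dataset directories found.")
```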
Based on the above video understanding datasets, you can generate Skill-CoT data as follows:
```bash
# [Step 1] Skill clustering
python ./skill_cot_generation/clustering.py --dataset='cine'

# [Step 2] Skill-CoT generation
python ./skill_cot_generation/skill_cot_generation.py --dataset='cine' --mode='skill_cot'

# [Step 3] Skill-CoT filtering
python ./skill_cot_generation/filtering.py --dataset='cine'
```
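If you want to run all three steps in sequence, a convenience wrapper like the one below could be used. This is a sketch: only the 'cine' dataset key appears in the commands above, so any other keys (e.g., for ET_164k or VSI-Bench) are assumptions you should verify against the scripts.

```python
# Sketch: run the three-step Skill-CoT pipeline end to end via subprocess.
# 'cine' is the dataset key from the commands above; other keys are assumptions.
import subprocess

DATASET = "cine"
STEPS = [
    ["python", "./skill_cot_generation/clustering.py", f"--dataset={DATASET}"],
    ["python", "./skill_cot_generation/skill_cot_generation.py",
     f"--dataset={DATASET}", "--mode=skill_cot"],
    ["python", "./skill_cot_generation/filtering.py", f"--dataset={DATASET}"],
]

for cmd in STEPS:
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # abort if any step fails
```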
To-do:
- Release Multi-LoRA training code
If you enjoy Video-Skill-CoT and find it helpful, citing our paper is the best support for us!
```bibtex
@article{lee2025videoskillcot,
  title   = {Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning},
  author  = {Lee, Daeun and Yoon, Jaehong and Cho, Jaemin and Bansal, Mohit},
  journal = {arXiv preprint arXiv:2506.03525},
  year    = {2025}
}
```