MAIR-Hub (MAIR: Multimodal AI Resources) is a central hub for multimodal AI — a comprehensive collection of tutorials, code examples, and other assets for multimodal AI research and applications.
The following directories contain specialized resources for different aspects of multimodal AI:
| Directory | Description |
|---|---|
| rl-tutorial | Reinforcement learning tutorials, including RL experiments with step-by-step guidance for reproduction |
| speech-llm | Speech LLM training recipes, including Qwen-Omni-like speech-to-speech model training and LLM-enhanced semi-supervised learning for Speech Foundation Models |
| speech-sr-se | Training recipes for speech super-resolution and enhancement models |
| external-resources | Curated links to other valuable multimodal AI resources |
The rl-tutorial directory contains resources focused on reinforcement learning approaches in multimodal AI:
- r1-zero: A tutorial on using the veRL framework to reproduce the DeepSeek-R1-Zero reinforcement learning training process in the mathematics domain.
- r1-like: A tutorial on using the OpenRLHF framework to reproduce the DeepSeek-R1 reinforcement learning training process in the mathematics domain.
- vlm-R1: A tutorial on using the veRL framework to train VLMs with reinforcement learning on both text and multimodal data, enhancing reasoning capabilities in the mathematics domain.
- cosyvoice_llm: A training recipe for using the veRL framework to conduct reinforcement learning experiments on the CosyVoice2 LLM in the speech-generation domain.
- kdd_labs: Hands-on labs from the KDD 2025 tutorial session, including distilling the reasoning abilities of DeepSeek-R1 into smaller models with the NeMo 2.0 Framework, and a GRPO training tutorial using NeMo RL.
- qwen-merge: A training recipe for creating multimodal reasoning models by concatenating vision encoders from Qwen2.5-VL with Qwen3.
- async_dapo: An efficient asynchronous training recipe using the veRL framework's AgentLoop async rollout functionality and the DAPO algorithm for reinforcement learning experiments in the mathematics domain. It features early stopping, dynamic load balancing, and seamless reward computation for improved training efficiency.
- text2sql_multiturn: A tutorial on using the veRL framework for multi-turn reinforcement learning training to significantly improve model performance on the Text2SQL task.
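Several of the recipes above (e.g. async_dapo and the GRPO lab) build on group-relative advantage estimation, where each rollout's reward is normalized against the other rollouts sampled for the same prompt. As a rough illustration of the idea — a hypothetical sketch, not code taken from any of these tutorials — the core computation looks like:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each rollout's reward against
    the mean and std of its own group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 rollouts for one math prompt, reward 1.0 if the answer
# verified as correct, else 0.0.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct rollouts get positive advantages and incorrect ones negative, without needing a learned value function — which is what makes this family of methods attractive for verifiable-reward domains like mathematics.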
The speech-llm directory provides resources for training Speech LLMs:
- qwen-omni-like: Recipe for training Qwen2.5-Omni-style speech-to-speech (S2S) models.
- less_recipe: Recipe for finetuning Speech Foundation Models with LLM-enhanced semi-supervised learning.
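Semi-supervised pipelines of this kind typically filter machine-generated pseudo-labels before finetuning on them. The field names and threshold below are hypothetical, not taken from the recipe, but the filtering step is conceptually:

```python
def filter_pseudo_labels(examples, threshold=0.9):
    """Keep only pseudo-labeled utterances whose confidence score
    (however the pipeline defines it) clears the threshold."""
    return [ex for ex in examples if ex["confidence"] >= threshold]

# Hypothetical pseudo-labeled batch: audio paths with generated transcripts.
batch = [
    {"audio": "utt1.wav", "text": "hello world", "confidence": 0.97},
    {"audio": "utt2.wav", "text": "uncertain guess", "confidence": 0.55},
]
kept = filter_pseudo_labels(batch)
```

The low-confidence utterance is dropped; only reliable pseudo-labels reach the finetuning stage.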
The speech-sr-se directory provides resources for training Speech Super-Resolution and Enhancement models:
- flow_matching_sr_se: Best practice for training a joint speech super-resolution and enhancement model based on flow-matching.
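As background for the recipe above — a minimal sketch of the general technique, not the recipe's actual code — conditional flow matching trains a network to predict the velocity of a straight-line path between a noise sample and the clean target:

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Linear interpolation path x_t and its target velocity.

    x0: noise sample, x1: clean (e.g. high-resolution) target,
    t in [0, 1]: time along the path. A training step would then
    minimize ||model(x_t, t, condition) - (x1 - x0)||^2.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)   # noise
x1 = rng.standard_normal(8)   # stand-in for a clean waveform frame
x_t, v = flow_matching_pair(x0, x1, 0.5)
```

At inference time, integrating the learned velocity field from t=0 to t=1 maps noise to enhanced, super-resolved speech conditioned on the degraded input.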
This section provides links to valuable external tutorials and resources related to multimodal AI:
- Distilling DeepSeek R1 into Qwen: A tutorial demonstrating how to distill the reasoning abilities of DeepSeek R1 (a 671B parameter MoE model) into smaller models like Qwen using the NVIDIA NeMo 2.0 Framework. The repository includes notebooks for extracting reasoning data and training models with the distilled knowledge.
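At its core, this kind of distillation minimizes the divergence between teacher and student token distributions (the linked tutorial does this with the NeMo 2.0 Framework; the framework-free sketch below only illustrates the objective):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary, temperature-scaled
    so the student also learns from the teacher's soft probabilities."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

t = np.array([2.0, 0.5, -1.0])              # teacher logits for one token
loss_same = distill_kl(t, t)                # identical distributions
loss_diff = distill_kl(t, np.array([0.0, 2.0, 0.0]))
```

The loss is zero when the student matches the teacher exactly and positive otherwise, pulling the smaller model's next-token distribution toward the teacher's.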
We are working on adding more tutorials and assets...
This project is licensed under the terms of the LICENSE file included in the repository.