Demo video: RynnEC_demo.mp4
- [2025.08.08] 🔥🔥 Released our RynnEC-2B model, RynnEC-Bench, and training code.
RynnEC is a video multi-modal large language model (MLLM) specifically designed for embodied cognition tasks.
Basic Dependencies:
- Python >= 3.10
- PyTorch >= 2.4.0
- CUDA Version >= 11.8
- transformers >= 4.46.3
Install required packages:
git clone https://github.com/alibaba-damo-academy/RynnEC
cd RynnEC
pip install -e .
pip install flash-attn --no-build-isolation
| Model | Base Model | HF Link |
| --- | --- | --- |
| RynnEC-2B | Qwen2.5-1.5B | Alibaba-DAMO-Academy/RynnEC-2B |
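To fetch the checkpoint ahead of time (for offline use, for example), a minimal download sketch is shown below; it assumes `huggingface_hub` is installed and the `local_dir` value is only an example.

```python
# Minimal sketch: download the RynnEC-2B checkpoint from the Hugging Face Hub.
# Assumes huggingface_hub is installed; the local_dir value is only an example.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Alibaba-DAMO-Academy/RynnEC-2B",
    local_dir="./checkpoints/RynnEC-2B",  # adjust to your preferred location
)
print(f"Checkpoint files are in: {local_path}")
```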
Check out the inference notebooks that demonstrate how to use RynnEC for various applications, such as basic object understanding, spatial understanding, and video object segmentation in egocentric scenarios.
| Notebooks | Description |
| --- | --- |
| Object Understanding | Demonstrates how to use RynnEC for general object recognition and understanding |
| Spatial Understanding | Demonstrates how to use RynnEC for spatial understanding with 3D awareness |
| Video Object Segmentation | Demonstrates how to use RynnEC for video object segmentation with text-based instructions |
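For quick programmatic experiments outside the notebooks, a minimal loading sketch is shown below. It assumes the Hub checkpoint ships its custom modeling code (hence `trust_remote_code=True`); the notebooks above remain the reference for building video, region, and segmentation prompts.

```python
# Minimal loading sketch (assumes the checkpoint provides custom code on the Hub).
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "Alibaba-DAMO-Academy/RynnEC-2B"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
# See the notebooks above for how to build video/object prompts and run generation.
```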
It is highly recommended to try our online demo first.
Otherwise, you can launch a Gradio app locally:
python inference/gradio_demo.py --model-path Alibaba-DAMO-Academy/RynnEC-2B
options:
  --model-path MODEL_PATH, --model_path MODEL_PATH
        Path to the model checkpoint or a Hugging Face repo ID (e.g. Alibaba-DAMO-Academy/RynnEC-2B).
  --port SERVER_PORT, --server_port SERVER_PORT
        Optional. Port of the model server.
RynnEC-Bench evaluates models in two key areas, object cognition and spatial cognition, covering a total of 22 embodied cognitive abilities.
For more details, please refer to RynnEC-Bench.
To use our training code, please organize the annotation files in the following format:
[
// image QA
{
"image": ["images/xxx.jpg"],
"conversations": [
{
"from": "human",
"value": "<image>\nWhat are the colors of the bus in the image?"
},
{
"from": "gpt",
"value": "The bus in the image is white and red."
},
...
]
},
// Video QA
{
"video": ["videos/xxx.mp4"],
"conversations": [
{
"from": "human",
"value": "<video>\nWhat are the main activities that take place in the video?"
},
{
"from": "gpt",
"value": "The main activities that take place in the video are the preparation of camera equipment by a man, a group of men riding a helicopter, and a man sailing a boat through the water."
},
...
]
},
// Video-object QA (mp4 file)
{
"video": ["videos/xxx.mp4"],
"conversations": [
{
"from": "human",
"value": "<video>\nWhat is the color of <region>?"
},
{
"from": "gpt",
"value": "The color is red."
}
],
"masks": [
{
"frame id": {"size": [1080, 1920], "counts": "mask rle"},
"frame id": {"size": [1080, 1920], "counts": "mask rle"}
}
],
},
// Video-object QA (image files)
{
"video": ["videos/xxx/0.png", "videos/xxx/1.png", "videos/xxx/2.png", ...],
"conversations": [
{
"from": "human",
"value": "<video>\nWhat is the color of <region>?"
},
{
"from": "gpt",
"value": "The color is red."
}
],
"masks": [
{
"frame id": {"size": [1080, 1920], "counts": "mask rle"},
"frame id": {"size": [1080, 1920], "counts": "mask rle"}
}
],
"mask_ids": ["the frame index of each mask in the video list"],
"timestamps": ["timestamp of video frames"],
},
// Image-object QA
{
"video": ["images/xxx.jpg"],
"conversations": [
{
"from": "human",
"value": "<video>\nWhat is the relationshipw between object1<region> and object2<region>?"
},
{
"from": "gpt",
"value": "They are side by side."
}
],
"masks": [
{"size": [1080, 1920], "counts": "mask rle"},
{"size": [1080, 1920], "counts": "mask rle"}
],
},
]
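The "masks" entries use COCO-style run-length encoding (RLE). A minimal sketch for producing one such entry from a binary mask with pycocotools is shown below; the array shape, toy region, and frame id "0" are illustrative only.

```python
# Sketch: encode a binary segmentation mask as COCO RLE for the "masks" field.
# Assumes pycocotools and numpy are installed; mask contents and frame id are illustrative.
import numpy as np
from pycocotools import mask as mask_utils

binary_mask = np.zeros((1080, 1920), dtype=np.uint8)  # 1 inside the object, 0 elsewhere
binary_mask[100:300, 200:500] = 1                     # toy rectangular region

rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")         # bytes -> str so it is JSON-serializable

# One element of the "masks" list for a video-object QA sample, keyed by frame id:
mask_entry = {"0": {"size": rle["size"], "counts": rle["counts"]}}
```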
We provide template scripts in scripts/train for all stages. You can modify the variables in them to fit your data and model settings. For example:
--data_folder ./datasets \
--data_path stage4.json \
--model_path Alibaba-DAMO-Academy/RynnEC-2B \
--vision_encoder DAMO-NLP-SG/SigLIP-NaViT \
Now you can start training with your training scripts:
# stage1
bash scripts/train/stage1.sh
# stage2
bash scripts/train/stage2.sh
...
If you use LoRA during training, use the following command to merge the LoRA weights after training:
python tools/merge_lora_weights.py --model_path checkpoints/stage4/checkpoint-xxx --save_path checkpoints/stage4_merge
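Conceptually, the merge folds the low-rank adapter deltas back into the base weights. A rough peft-based sketch of that idea is shown below; tools/merge_lora_weights.py remains the supported path, and the base-model path used here is an illustrative assumption.

```python
# Rough peft-based sketch of LoRA merging; tools/merge_lora_weights.py is the supported path.
# The base-model path and the use of AutoModelForCausalLM here are illustrative assumptions.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("checkpoints/base_model", trust_remote_code=True)
lora = PeftModel.from_pretrained(base, "checkpoints/stage4/checkpoint-xxx")
merged = lora.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("checkpoints/stage4_merge")
```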
Please prepare the datasets and question files used for evaluation here.
# for object property cognition
bash scripts/eval/eval_object_property.sh
# for object segmentation
bash scripts/eval/eval_seg.sh
# for spatial cognition
bash scripts/eval/eval_spatial.sh
Note:
Fill in the API_KEY and URL in metrics.py first.
If you find RynnEC useful for your research and applications, please cite using this BibTeX:
Our RynnEC is built on top of VideoLLaMA3. We also learned a lot from the implementations of VideoRefer, Sa2VA, and Qwen2VL. If your work is used in RynnEC but not mentioned in either this repo or the technical report, feel free to let us know ❤️.
This project is released under the Apache 2.0 license as found in the LICENSE file. The service is a research preview intended for non-commercial use ONLY, subject to the model Licenses of Qwen, Terms of Use of the data generated by OpenAI and Gemini, and Privacy Practices of ShareGPT. Please get in touch with us if you find any potential violations.