Hao Wang1,2, Limeng Qiao3, Zequn Jie3, Zhijian Huang1, Chengjian Feng3,
Qingfang Zheng1, Lin Ma3, Xiangyuan Lan2📧, Xiaodan Liang1,2📧
1 Sun Yat-sen University, 2 Pengcheng Lab, 3 Meituan Inc
📧 Corresponding author.
2025-07-24: We release the Demo of X-SAM.
This project provides the official PyTorch implementation of X-SAM.
- X-SAM is a novel unified segmentation MLLM that offers superior performance on all image segmentation benchmarks.
- X-SAM integrates SAM into MLLMs via a unified formulation adapted to all image segmentation tasks, extending SAM's capability from segment anything to any segmentation (a minimal integration sketch follows this list).
- X-SAM co-trains on multiple data sources via an effective multi-stage training strategy, achieving robust performance across all tasks.
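For a concrete picture of the integration described above, here is a minimal PyTorch-style sketch of one way SAM image features and LLM hidden states could feed a shared mask decoder. It is an illustration under assumed interfaces, not the official X-SAM implementation: the module names, the `seg_token_mask` convention, and the tensor shapes are placeholders.

```python
# Illustrative sketch only -- NOT the official X-SAM code.
# It shows one plausible way to route SAM image features and LLM hidden
# states into a mask decoder, following the high-level description above.
import torch
import torch.nn as nn


class UnifiedSegMLLM(nn.Module):
    def __init__(self, sam_encoder: nn.Module, llm: nn.Module, mask_decoder: nn.Module,
                 llm_dim: int = 4096, seg_dim: int = 256):
        super().__init__()
        self.sam_encoder = sam_encoder    # dense image features (e.g. a SAM ViT)
        self.llm = llm                    # multimodal LLM backbone (placeholder interface)
        self.mask_decoder = mask_decoder  # lightweight pixel decoder
        # project LLM hidden states at segmentation-token positions into decoder space
        self.seg_proj = nn.Linear(llm_dim, seg_dim)

    def forward(self, image: torch.Tensor, input_ids: torch.Tensor,
                seg_token_mask: torch.Tensor) -> torch.Tensor:
        # 1) dense visual features from the SAM encoder
        vis_feats = self.sam_encoder(image)                 # (B, C, H, W)
        # 2) the LLM consumes the image + instruction; positions marked by
        #    seg_token_mask act as segmentation queries (assumed convention)
        hidden = self.llm(input_ids=input_ids, images=image).last_hidden_state
        queries = self.seg_proj(hidden[seg_token_mask])     # (num_queries, seg_dim)
        # 3) the mask decoder turns each query into a binary mask
        masks = self.mask_decoder(vis_feats, queries)       # (num_queries, H, W)
        return masks
```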
This project provides awesome code for segmentation MLLMs:
- Training code for segmentation MLLMs.
- Evaluation code for all image segmentation benchmarks.
- Visualization code for segmentation MLLMs (a minimal mask-overlay sketch follows this list).
- Training code for LLaVA-based MLLMs (based on XTuner).
- Evaluation code for all VLM benchmarks (based on VLMEvalKit).
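As a companion to the visualization item above, the following is a minimal, generic sketch of overlaying predicted masks on an image. It is not code from this repository; it assumes masks arrive as a boolean array of shape (N, H, W) aligned with an RGB image in the 0-255 range.

```python
# Generic mask-overlay helper for visualizing segmentation predictions.
# Not taken from this repository; assumes `masks` is a boolean array of
# shape (N, H, W) aligned with `image` of shape (H, W, 3) in [0, 255].
import numpy as np
import matplotlib.pyplot as plt


def show_masks(image: np.ndarray, masks: np.ndarray, alpha: float = 0.5) -> None:
    overlay = image.astype(np.float32).copy()
    rng = np.random.default_rng(0)
    for mask in masks:
        color = rng.uniform(0, 255, size=3)  # random color per mask
        overlay[mask] = (1 - alpha) * overlay[mask] + alpha * color
    plt.imshow(overlay.astype(np.uint8))
    plt.axis("off")
    plt.show()
```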
If you have any questions, please feel free to open an issue.
The Segment Anything Model (SAM) has emerged as a pivotal advancement in computer vision, particularly within the context of visual-prompt-driven segmentation. However, SAM is constrained by intrinsic limitations in multi-mask prediction and category-specific image segmentation tasks. Concurrently, Large Language Models (LLMs) have exhibited remarkable proficiency in comprehensive knowledge representation across a wide range of domains, yet they inherently lack the capacity for pixel-level perceptual understanding. To bridge these complementary gaps, we present X-SAM, a streamlined Multimodal Large Language Model (MLLM) framework that seamlessly integrates SAM with LLMs, thereby extending SAM's capabilities from segment anything to any segmentation. Specifically, we introduce a novel approach for integrating SAM with MLLMs, which facilitates more advanced dense, pixel-level perceptual comprehension within MLLMs. Furthermore, we propose a new segmentation paradigm, termed Visual GrounDed (VGD) segmentation, which empowers MLLMs with visual grounded, pixel-wise interpretative capabilities. To enable effective training of MLLMs on diverse data sources, we devise a unified training strategy that supports co-training across multiple datasets. Experimental results demonstrate that X-SAM achieves state-of-the-art performance on a wide range of image segmentation benchmarks, highlighting its efficiency for multimodal pixel-level visual understanding.
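The unified training strategy mentioned above boils down to drawing batches from heterogeneous data sources within a single loop. The snippet below is a hedged sketch of such weighted dataset mixing; the sampling scheme, source names, and weights are placeholders rather than the actual X-SAM training configuration.

```python
# Hedged sketch of co-training over multiple data sources with weighted
# random sampling; source names and weights are placeholders, not the
# actual X-SAM training configuration.
import random
from torch.utils.data import DataLoader, Dataset


def mixed_loader(datasets: dict[str, Dataset], weights: dict[str, float],
                 batch_size: int = 8):
    """Yield (source_name, batch) pairs, sampling sources by weight."""
    loaders = {name: iter(DataLoader(ds, batch_size=batch_size, shuffle=True))
               for name, ds in datasets.items()}
    names = list(loaders)
    probs = [weights[n] for n in names]
    while True:
        name = random.choices(names, weights=probs, k=1)[0]
        try:
            batch = next(loaders[name])
        except StopIteration:  # restart an exhausted source and keep sampling
            loaders[name] = iter(DataLoader(datasets[name],
                                            batch_size=batch_size, shuffle=True))
            batch = next(loaders[name])
        yield name, batch


# Example usage (names are illustrative):
# for step, (source, batch) in zip(range(max_steps), mixed_loader(data, w)):
#     loss = model(batch, task=source)
```

Weighted sampling of this kind is one common way to keep small data sources (e.g. referring segmentation) from being drowned out by large ones during co-training.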
Please refer to the benchmark results for more details.
- Release the Demo.
- Release the weights.
- Release the code and instructions for demo.
- Release the code for evaluation on all segmentation benchmarks.
- Release the code for evaluation on all VLM benchmarks.
- Release the code for training LLaVA-based MLLMs.
- Release the code for training X-SAM (once the repository reaches more than 500 🌟).
This project references several excellent open-source repositories: XTuner, VLMEvalKit, and Sa2VA. Thanks for their wonderful work and contributions to the community.
If you find X-SAM helpful for your research or applications, please consider giving us a star 🌟 and citing it using the following BibTeX entry.