Add CheXAgent model integration with tests and documentation #20886

Open · wants to merge 1 commit into `main`

190 changes: 190 additions & 0 deletions CHEXAGENT_IMPLEMENTATION.md

# CheXagent Implementation for vLLM

> **Member:** Please move the contents of this file to the PR description
>
> **Author:** OK thank you! I will

This document summarizes the implementation of CheXagent model support in vLLM, addressing the GitHub issue [#7863](https://github.com/vllm-project/vllm/issues/7863).

## Problem Statement

The original issue reported that the CheXagent model was not supported by vLLM because of its integrated QFormer architecture. The error message was:
```
model architecture not supported by vllm
```

## Solution Overview

We implemented complete CheXagent model support in vLLM by:

1. **Creating the model implementation** (`vllm/model_executor/models/chexagent.py`)
2. **Registering the model** in the model registry
3. **Adding test coverage** for the implementation
4. **Creating documentation** for usage

## Implementation Details

### 1. Model Architecture

The CheXagent implementation follows the same pattern as BLIP2, which also uses QFormer. The key components are:

- **Vision Model**: Uses BLIP vision encoder for medical image processing
- **QFormer**: Query-based transformer that bridges vision and language modalities
- **Language Model**: Generates medical text based on processed image features
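
A minimal sketch of how these three components connect during a forward pass (module names, dimensions, and call signatures below are illustrative, not the actual vLLM implementation):

```python
import torch
import torch.nn as nn

class CheXagentWiringSketch(nn.Module):
    """Illustrative wiring only: vision encoder -> QFormer -> projection -> language model."""

    def __init__(self, vision_encoder: nn.Module, qformer: nn.Module,
                 language_model: nn.Module, num_query_tokens: int = 32,
                 qformer_dim: int = 768, lm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.qformer = qformer
        self.language_model = language_model
        # Learnable query tokens that attend to the encoded image
        self.query_tokens = nn.Parameter(torch.zeros(1, num_query_tokens, qformer_dim))
        # Projects QFormer outputs into the language model's embedding space
        self.language_projection = nn.Linear(qformer_dim, lm_dim)

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor):
        vision_feats = self.vision_encoder(pixel_values)        # [B, num_patches, qformer_dim]
        queries = self.query_tokens.expand(pixel_values.size(0), -1, -1)
        query_out = self.qformer(queries, vision_feats)         # [B, num_query_tokens, qformer_dim]
        image_embeds = self.language_projection(query_out)      # [B, num_query_tokens, lm_dim]
        # The projected embeddings stand in for the <image> placeholder tokens
        return self.language_model(input_ids=input_ids, image_embeds=image_embeds)
```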

### 2. Key Files Modified/Created

#### New Files (paths relative to the repository root):
- `vllm/model_executor/models/chexagent.py` - Main model implementation
- `tests/models/test_chexagent.py` - Test suite
- `docs/models/chexagent.md` - Usage documentation
- `test_chexagent_simple.py` - Simple validation script

#### Modified Files:
- `vllm/model_executor/models/registry.py` - Added CheXagent to `_MULTIMODAL_MODELS`
- `tests/models/registry.py` - Added CheXagent to `_MULTIMODAL_EXAMPLE_MODELS`

### 3. Model Components

#### QFormer Implementation
```python
class CheXagentQFormerModel(nn.Module):
    """QFormer model for processing vision features"""

class CheXagentQFormerMultiHeadAttention(nn.Module):
    """Multi-head attention for QFormer"""

class CheXagentQFormerLayer(nn.Module):
    """Single layer of QFormer"""
```

#### Main Model
```python
@MULTIMODAL_REGISTRY.register_processor(...)
class CheXagentForConditionalGeneration(nn.Module, SupportsMultiModal, SupportsPP, SupportsQuant):
"""Main CheXagent model for conditional generation"""
```

### 4. Registration

The model is registered in two places:

1. **Model Registry**: Maps `CheXagentForConditionalGeneration` to `("chexagent", "CheXagentForConditionalGeneration")`
2. **Multimodal Registry**: Registers the processor, processing info, and dummy inputs builder
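
For reference, the model-registry entry maps the Hugging Face architecture name to the module and class implementing it; a sketch of the added mapping, assuming the existing `_MULTIMODAL_MODELS` layout in `vllm/model_executor/models/registry.py`:

```python
# Sketch only: surrounding entries elided, dict layout assumed from registry.py
_MULTIMODAL_MODELS = {
    # ... existing architectures ...
    "CheXagentForConditionalGeneration": ("chexagent", "CheXagentForConditionalGeneration"),
}
```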

## Usage

### Basic Usage
```python
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(
    model="StanfordAIMI/CheXagent-8b",
    trust_remote_code=True,
    dtype="auto",
)

# Multimodal inputs are passed alongside the prompt in a single prompt dict
image = Image.open("chest_xray.png")  # illustrative local image path
prompt = "<image> Describe the findings in this chest X-ray."
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params,
)
print(outputs[0].outputs[0].text)
```

### API Usage
The request below targets vLLM's OpenAI-compatible server; images are sent to the chat completions endpoint as base64 data URLs.

```python
import base64
import requests

# Encode a local image as a data URL (illustrative path)
with open("chest_xray.png", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

data = {
    "model": "StanfordAIMI/CheXagent-8b",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this chest X-ray."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encoded_image}"}},
        ],
    }],
    "max_tokens": 512,
    "temperature": 0.7,
}

response = requests.post("http://localhost:8000/v1/chat/completions", json=data)
```

## Testing

### Running Tests
```bash
# Run the simple validation script
python test_chexagent_simple.py

# Run the full test suite
python -m pytest tests/models/test_chexagent.py -v
```

### Test Coverage
- Model import and initialization
- Registry registration
- Multimodal processor registration
- QFormer component functionality
- Image processing capabilities
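
As a quick sanity check for the registration items above, a minimal test along these lines can be used (hypothetical snippet; it assumes vLLM's public `ModelRegistry.get_supported_archs()` helper):

```python
# Hypothetical minimal check that the new architecture is visible in the registry
from vllm import ModelRegistry

def test_chexagent_is_registered():
    assert "CheXagentForConditionalGeneration" in ModelRegistry.get_supported_archs()
```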

## Configuration

The model supports standard vLLM configuration options:

- `num_query_tokens`: Number of query tokens for QFormer (default: 32)
- `vision_config`: Vision encoder configuration
- `qformer_config`: QFormer transformer configuration
- `text_config`: Language model configuration
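
A quick way to inspect these fields on the checkpoint's Hugging Face config (a sketch that assumes a BLIP-2-style config layout; attribute names are taken from the list above):

```python
from transformers import AutoConfig

# Assumes the checkpoint exposes these fields via trust_remote_code
config = AutoConfig.from_pretrained("StanfordAIMI/CheXagent-8b", trust_remote_code=True)
print(config.num_query_tokens)   # number of QFormer query tokens (default: 32)
print(config.vision_config)      # vision encoder configuration
print(config.qformer_config)     # QFormer transformer configuration
print(config.text_config)        # language model configuration
```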

## Medical Use Cases

CheXagent is specifically designed for:
- Chest X-ray analysis
- Medical report generation
- Medical image interpretation
- Medical education

## Limitations and Disclaimers

1. **Research Use Only**: This implementation is for research and educational purposes
2. **Not for Clinical Use**: Should not be used for actual clinical decision-making
3. **Image Quality**: Performance may vary with image quality and resolution
4. **Domain Specificity**: Optimized for medical images, particularly chest X-rays

## Technical Details

### QFormer Architecture
The QFormer implementation follows the standard transformer architecture with:
- Multi-head self-attention
- Cross-attention to vision features
- Feed-forward networks
- Layer normalization
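
A simplified, self-contained sketch of one such layer (post-norm for brevity; not the actual `CheXagentQFormerLayer` code, and it assumes the vision features are already projected to the QFormer hidden size):

```python
import torch
import torch.nn as nn

class QFormerLayerSketch(nn.Module):
    """One QFormer layer: self-attention over query tokens, cross-attention
    to vision features, then a feed-forward block, each with layer norm."""

    def __init__(self, hidden: int = 768, heads: int = 12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                                 nn.Linear(4 * hidden, hidden))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(hidden) for _ in range(3))

    def forward(self, queries: torch.Tensor, vision_feats: torch.Tensor) -> torch.Tensor:
        x = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        x = self.norm2(x + self.cross_attn(x, vision_feats, vision_feats)[0])
        return self.norm3(x + self.ffn(x))
```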

### Vision Processing
- Uses BLIP vision encoder
- Supports both pixel values and pre-computed embeddings
- Handles batch processing of multiple images

### Language Model Integration
- Projects QFormer outputs to language model dimension
- Integrates with vLLM's multimodal embedding system
- Supports standard text generation features
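
A hedged sketch of the projection-and-splice step described above (function and tensor names are illustrative; vLLM's own multimodal embedding utilities handle this in the real implementation):

```python
import torch

def splice_image_embeddings(text_embeds: torch.Tensor,
                            image_embeds: torch.Tensor,
                            input_ids: torch.Tensor,
                            image_token_id: int) -> torch.Tensor:
    """Replace <image> placeholder positions in the text embeddings with the
    projected QFormer outputs. Assumes the number of placeholder tokens equals
    the total number of image embedding vectors."""
    merged = text_embeds.clone()
    mask = input_ids == image_token_id                      # [B, seq_len]
    merged[mask] = image_embeds.reshape(-1, image_embeds.size(-1)).to(merged.dtype)
    return merged
```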

## Future Improvements

1. **Performance Optimization**: Further optimize memory usage and inference speed
2. **Additional Medical Modalities**: Extend support for other medical imaging types
3. **Enhanced Medical Features**: Add specialized medical report templates
4. **Quantization Support**: Improve quantization compatibility

## Contributing

To contribute to this implementation:

1. Follow vLLM's coding standards
2. Add appropriate tests for new features
3. Update documentation as needed
4. Ensure backward compatibility

## References

- [Original GitHub Issue](https://github.com/vllm-project/vllm/issues/7863)
- [CheXagent Model](https://huggingface.co/StanfordAIMI/CheXagent-8b)
- [vLLM Documentation](https://docs.vllm.ai/)
- [BLIP2 Implementation](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/blip2.py)

## Conclusion

This implementation successfully addresses the original issue by providing full CheXagent model support in vLLM. The solution follows vLLM's established patterns and integrates seamlessly with the existing multimodal infrastructure. Users can now deploy CheXagent models for medical image analysis using vLLM's efficient inference engine.