Add CheXAgent model integration with tests and documentation #20886
Open
WeiqiangLv wants to merge 1 commit into vllm-project:main from WeiqiangLv:chexagent-integration
+1,567 −0
# CheXagent Implementation for vLLM

This document summarizes the implementation of CheXagent model support in vLLM, addressing GitHub issue [#7863](https://github.com/vllm-project/vllm/issues/7863).

## Problem Statement

The original issue reported that the CheXagent model was not supported by vLLM because of its integrated QFormer architecture. The error message was:
```
model architecture not supported by vllm
```

## Solution Overview

We implemented complete CheXagent model support for vLLM by:

1. **Creating the model implementation** (`vllm/model_executor/models/chexagent.py`)
2. **Registering the model** in the model registry
3. **Adding test coverage** for the implementation
4. **Creating documentation** for usage

## Implementation Details

### 1. Model Architecture

The CheXagent implementation follows the same pattern as BLIP2, which also uses a QFormer. The key components, sketched below, are:

- **Vision Model**: BLIP vision encoder for medical image processing
- **QFormer**: Query-based transformer that bridges the vision and language modalities
- **Language Model**: Generates medical text from the processed image features

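The three components fit together roughly as in the following standalone sketch. It is illustrative only, not the vLLM module itself; the attribute names (`vision_model`, `qformer`, `query_tokens`, `language_projection`) and call conventions are assumptions based on the BLIP2-style design described above.

```python
import torch
import torch.nn as nn


class CheXagentFlowSketch(nn.Module):
    """Illustrative only: vision encoder -> QFormer -> projection -> LM embeddings."""

    def __init__(self, vision_model: nn.Module, qformer: nn.Module,
                 lm_hidden_size: int, qformer_dim: int, num_query_tokens: int = 32):
        super().__init__()
        self.vision_model = vision_model   # BLIP-style vision encoder
        self.qformer = qformer             # query-based transformer bridging modalities
        self.query_tokens = nn.Parameter(torch.zeros(1, num_query_tokens, qformer_dim))
        # Project QFormer outputs into the language model's embedding space.
        self.language_projection = nn.Linear(qformer_dim, lm_hidden_size)

    def encode_image(self, pixel_values: torch.Tensor) -> torch.Tensor:
        image_embeds = self.vision_model(pixel_values)              # (B, N, D_vision)
        queries = self.query_tokens.expand(image_embeds.size(0), -1, -1)
        query_output = self.qformer(queries, image_embeds)          # (B, Q, D_qformer)
        return self.language_projection(query_output)               # (B, Q, D_lm)
```
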
### 2. Key Files Modified/Created

#### New Files:
- `vllm/model_executor/models/chexagent.py` - Main model implementation
- `tests/models/test_chexagent.py` - Test suite
- `docs/models/chexagent.md` - Usage documentation
- `test_chexagent_simple.py` - Simple validation script

#### Modified Files:
- `vllm/model_executor/models/registry.py` - Added CheXagent to `_MULTIMODAL_MODELS`
- `tests/models/registry.py` - Added CheXagent to `_MULTIMODAL_EXAMPLE_MODELS`

### 3. Model Components

#### QFormer Implementation
```python
class CheXagentQFormerModel(nn.Module):
    """QFormer model for processing vision features"""

class CheXagentQFormerMultiHeadAttention(nn.Module):
    """Multi-head attention for QFormer"""

class CheXagentQFormerLayer(nn.Module):
    """Single layer of QFormer"""
```

#### Main Model
```python
@MULTIMODAL_REGISTRY.register_processor(...)
class CheXagentForConditionalGeneration(nn.Module, SupportsMultiModal, SupportsPP, SupportsQuant):
    """Main CheXagent model for conditional generation"""
```

### 4. Registration

The model is registered in two places:

1. **Model Registry**: Maps `CheXagentForConditionalGeneration` to `("chexagent", "CheXagentForConditionalGeneration")` (see the sketch below)
2. **Multimodal Registry**: Registers the processor, processing info, and dummy inputs builder

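For reference, the registry addition in `vllm/model_executor/models/registry.py` would look roughly like this sketch (the neighboring BLIP2 entry is shown only for context and is illustrative):

```python
_MULTIMODAL_MODELS = {
    # ... existing entries ...
    "Blip2ForConditionalGeneration": ("blip2", "Blip2ForConditionalGeneration"),
    # New entry: architecture name -> (module name, class name)
    "CheXagentForConditionalGeneration": ("chexagent", "CheXagentForConditionalGeneration"),
    # ... existing entries ...
}
```
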
## Usage

### Basic Usage
```python
from PIL import Image

from vllm import LLM, SamplingParams

llm = LLM(
    model="StanfordAIMI/CheXagent-8b",
    trust_remote_code=True,
    dtype="auto",
)

# Load the chest X-ray to analyze (replace with your own image path).
image = Image.open("chest_xray.png")

prompt = "<image> Describe the findings in this chest X-ray."
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params,
)
```

### API Usage
```python
import base64

import requests

# Base64-encode the image so it can be sent in the JSON payload.
with open("chest_xray.png", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

data = {
    "model": "StanfordAIMI/CheXagent-8b",
    "prompt": "<image> Analyze this chest X-ray.",
    "max_tokens": 512,
    "temperature": 0.7,
    "multi_modal_data": {"image": [encoded_image]},
}

response = requests.post("http://localhost:8000/v1/completions", json=data)
```

## Testing

### Running Tests
```bash
# Run the simple validation script
python test_chexagent_simple.py

# Run the full test suite
python -m pytest tests/models/test_chexagent.py -v
```

### Test Coverage
- Model import and initialization
- Registry registration
- Multimodal processor registration
- QFormer component functionality
- Image processing capabilities

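As an example of the registry check, a minimal smoke test might look like the following sketch (hypothetical, not the contents of `tests/models/test_chexagent.py`; it assumes vLLM's `ModelRegistry.get_supported_archs()` helper):

```python
# Hypothetical registry smoke test.
from vllm import ModelRegistry


def test_chexagent_is_registered():
    # The architecture name should be listed once the model is registered.
    assert "CheXagentForConditionalGeneration" in ModelRegistry.get_supported_archs()
```
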
## Configuration

The model supports standard vLLM configuration options:

- `num_query_tokens`: Number of query tokens for the QFormer (default: 32)
- `vision_config`: Vision encoder configuration
- `qformer_config`: QFormer transformer configuration
- `text_config`: Language model configuration

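These fields come from the checkpoint's Hugging Face config and can be inspected directly. The snippet below is a sketch: the attribute names mirror the options listed above and are assumed to exist on the CheXagent config.

```python
# Inspect the configuration shipped with the checkpoint (attribute names assumed).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("StanfordAIMI/CheXagent-8b", trust_remote_code=True)

print(config.num_query_tokens)  # number of QFormer query tokens, e.g. 32
print(config.vision_config)     # vision encoder configuration
print(config.qformer_config)    # QFormer transformer configuration
print(config.text_config)       # language model configuration
```
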
## Medical Use Cases

CheXagent is specifically designed for:
- Chest X-ray analysis
- Medical report generation
- Medical image interpretation
- Medical education

## Limitations and Disclaimers

1. **Research Use Only**: This implementation is for research and educational purposes
2. **Not for Clinical Use**: Should not be used for actual clinical decision-making
3. **Image Quality**: Performance may vary with image quality and resolution
4. **Domain Specificity**: Optimized for medical images, particularly chest X-rays

## Technical Details

### QFormer Architecture
The QFormer implementation follows the standard transformer architecture with:
- Multi-head self-attention
- Cross-attention to vision features
- Feed-forward networks
- Layer normalization

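A stripped-down sketch of one such layer is shown below. It is illustrative only and does not mirror the actual `CheXagentQFormerLayer`; the module choices and dimensions are placeholders.

```python
import torch
import torch.nn as nn


class QFormerLayerSketch(nn.Module):
    """Illustrative QFormer layer: self-attention, cross-attention to vision, FFN."""

    def __init__(self, dim: int, num_heads: int, ffn_dim: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, image_embeds: torch.Tensor) -> torch.Tensor:
        # Self-attention over the learned query tokens.
        x = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        # Cross-attention from the queries to the vision encoder outputs.
        x = self.norm2(x + self.cross_attn(x, image_embeds, image_embeds)[0])
        # Position-wise feed-forward network with residual connection.
        return self.norm3(x + self.ffn(x))
```
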
### Vision Processing
- Uses BLIP vision encoder
- Supports both pixel values and pre-computed embeddings
- Handles batch processing of multiple images

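The dual input path can be pictured as a simple branch (a hypothetical helper, not the actual implementation):

```python
import torch


def get_image_embeds(model, pixel_values=None, image_embeds=None) -> torch.Tensor:
    """Accept either raw pixel values or pre-computed vision embeddings."""
    if image_embeds is not None:
        return image_embeds                      # already encoded upstream
    if pixel_values is not None:
        return model.vision_model(pixel_values)  # run the BLIP vision encoder
    raise ValueError("Either pixel_values or image_embeds must be provided")
```
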
### Language Model Integration
- Projects QFormer outputs to the language model dimension
- Integrates with vLLM's multimodal embedding system
- Supports standard text generation features

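One way to picture the embedding hand-off is scattering the projected QFormer outputs into the prompt's token embeddings at the `<image>` placeholder positions. The sketch below is generic and does not reference vLLM's actual merge utility:

```python
import torch


def merge_image_features(inputs_embeds: torch.Tensor,
                         input_ids: torch.Tensor,
                         image_features: torch.Tensor,
                         image_token_id: int) -> torch.Tensor:
    """Place projected image features at the <image> placeholder positions."""
    merged = inputs_embeds.clone()
    mask = input_ids == image_token_id  # (batch, seq_len) boolean mask
    # Assumes the number of placeholder tokens matches the number of image features.
    merged[mask] = image_features.reshape(-1, image_features.size(-1)).to(merged.dtype)
    return merged
```
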
## Future Improvements

1. **Performance Optimization**: Further optimize memory usage and inference speed
2. **Additional Medical Modalities**: Extend support for other medical imaging types
3. **Enhanced Medical Features**: Add specialized medical report templates
4. **Quantization Support**: Improve quantization compatibility

## Contributing

To contribute to this implementation:

1. Follow vLLM's coding standards
2. Add appropriate tests for new features
3. Update documentation as needed
4. Ensure backward compatibility

## References

- [Original GitHub Issue](https://github.com/vllm-project/vllm/issues/7863)
- [CheXagent Model](https://huggingface.co/StanfordAIMI/CheXagent-8b)
- [vLLM Documentation](https://docs.vllm.ai/)
- [BLIP2 Implementation](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/blip2.py)

## Conclusion

This implementation addresses the original issue by providing full CheXagent model support in vLLM. The solution follows vLLM's established patterns and integrates with the existing multimodal infrastructure. Users can now deploy CheXagent models for medical image analysis using vLLM's efficient inference engine.
Please move the contents of this file to the PR description
OK, thank you! I will.