A custom ComfyUI node that wraps the ByteDance Sa2VA model, enabling video captioning within ComfyUI. Segmentation support is still in progress.
This extension integrates Sa2VA into ComfyUI, allowing you to generate detailed descriptions of video frames. Sa2VA-8B is a multimodal model developed by ByteDance that can understand video content and generate natural language descriptions.
WIP:
- Add a node that accepts a GIF as input
- Add a node that implements segmentation
- Process sequences of images to generate detailed captions
- Customizable prompting to guide the model's description
- Seamless integration with ComfyUI workflow
- GPU-accelerated inference with Flash Attention support
- ComfyUI installation
- Open ComfyUI Manager
- Search for "Sa2VAWrapper"
- Click Install
WORKDIR /comfyui/custom_nodes
RUN git clone https://github.com/pablerdo/ComfyUI-Sa2VAWrapper.git --recursive
WORKDIR /comfyui/custom_nodes/ComfyUI-Sa2VAWrapper
RUN git reset --hard (commit hash)
RUN if [ -f requirements.txt ]; then python -m pip install -r requirements.txt; fi
RUN if [ -f install.py ]; then python install.py || echo "install script failed"; fi
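
For orientation, here is a minimal sketch of how the captioning features listed earlier (an image-sequence input plus a custom prompt) could map onto ComfyUI's custom-node interface. The class name `Sa2VACaptionSketch`, its input names, the `"Sa2VA"` category, and the helper `flash_attention_available` are all hypothetical and are not taken from this wrapper's source. The caption is a placeholder string, since the real Sa2VA inference call is not shown here.

```python
import importlib.util


def flash_attention_available() -> bool:
    """Return True if the optional `flash_attn` package is installed.

    Hypothetical helper: how the wrapper actually enables Flash Attention
    is not shown in this README.
    """
    return importlib.util.find_spec("flash_attn") is not None


class Sa2VACaptionSketch:
    """Hypothetical captioning node; NOT the wrapper's real class."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI calls this to learn which sockets and widgets to draw.
        return {
            "required": {
                "images": ("IMAGE",),  # batch of frames
                "prompt": ("STRING", {"multiline": True,
                                      "default": "Describe the video."}),
            }
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("caption",)
    FUNCTION = "caption"   # name of the method ComfyUI invokes
    CATEGORY = "Sa2VA"     # assumed menu category

    def caption(self, images, prompt):
        # Placeholder: a real node would pass the frames and prompt to Sa2VA.
        frame_count = len(images)
        text = f"[placeholder] {frame_count} frame(s); prompt: {prompt}"
        return (text,)  # ComfyUI expects a tuple matching RETURN_TYPES


# ComfyUI discovers nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"Sa2VACaptionSketch": Sa2VACaptionSketch}
NODE_DISPLAY_NAME_MAPPINGS = {"Sa2VACaptionSketch": "Sa2VA Caption (sketch)"}
```

Because the node is a plain Python class, it can be exercised outside ComfyUI by calling `Sa2VACaptionSketch().caption(frames, "What happens in this clip?")` with any sized sequence of frames.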