ComfyUI-JoyCaption

Joy Caption is a ComfyUI custom node powered by the LLaVA model for efficient, stylized image captioning. Caption Tools nodes handle batch image processing and automatic separation of caption text.

News & Updates

2025/08/22: Update ComfyUI-JoyCaption to v2.0.0 ( update.md )
- Added GGUF model support for better performance and compatibility
- Fixed critical parameter handling issues (top_k, image format)
- Improved console output (no more base64 spam)
- Enhanced error handling and stability
- Added JoyCaption GGUF and JoyCaption GGUF (Advanced) nodes
- Enhanced GGUF Support: Comprehensive support for 12 quantization levels (Q2_K to F16)
- llama_cpp_install folder: Complete installation guides and automated scripts for llama-cpp-python
- Simplified Installation: One-click llama-cpp-python installation with automatic CUDA support
- Cross-Platform Support: Windows, macOS, and Linux installation guides
- Performance Improvements: Optimized model loading and memory management
2025/06/15: Update ComfyUI-JoyCaption to v1.2.0 ( update.md )
- Enhanced CUDA performance optimization
- Improved memory management strategies
- Added Global Cache mode for faster processing

2025/06/07: Update ComfyUI-JoyCaption to v1.1.1 ( update.md )
2025/06/05: Update ComfyUI-JoyCaption to v1.1.0 ( update.md )
Add Caption tools

Features

Simple and user-friendly interface
Multiple caption types support
Flexible length control
Memory optimization options
GGUF model support - Efficient quantized models for better performance and lower memory usage
Automatic model download - Models will be downloaded automatically and properly renamed on first use
Support for multiple caption types
Support for advanced customization options
Multiple precision options (fp32, bf16, fp16, 8-bit, 4-bit)
Enhanced memory management with Global Cache mode
Clean console output (no debug spam)
Robust error handling and parameter validation
llama_cpp_install folder - Complete installation guides and automated scripts for llama-cpp-python

Installation

Clone this repository to your ComfyUI/custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-JoyCaption.git

Install the required dependencies:

cd ComfyUI/custom_nodes/ComfyUI-JoyCaption
pip install -r requirements.txt

For GGUF Models (Recommended): Install llama-cpp-python with CUDA support:

# Option 1: Automated installation (Recommended)
python llama_cpp_install/llama_cpp_install.py

# Option 2: Manual installation
pip install -r requirements_gguf.txt

Installation Guides Available:

English: llama_cpp_install/llama_cpp_install.md
中文: llama_cpp_install/llama_cpp_install_zh.md

Download Models

The models will be automatically downloaded and renamed on first use, or you can manually download them:

Standard Models (HuggingFace Format)

Model	Link	Memory Usage
JoyCaption Beta One	Download	~8-16GB
JoyCaption Alpha Two	Download	~8-16GB

GGUF Models (Quantized, Recommended)

Model	Size	Memory Usage	Quality	Recommended For	Link
JoyCaption Beta One (Q2_K)	3.18GB	~4GB	Good	Low VRAM (6GB+)	Download
JoyCaption Beta One (Q3_K_S)	3.66GB	~5GB	Good+	Budget systems (8GB+)	Download
JoyCaption Beta One (Q3_K_M)	4.02GB	~5GB	Better	Balanced performance	Download
JoyCaption Beta One (Q3_K_L)	4.32GB	~6GB	Better+	Good quality/size ratio	Download
JoyCaption Beta One (IQ4_XS)	4.48GB	~6GB	Very Good	Recommended (8GB+)	Download
JoyCaption Beta One (Q4_K_S)	4.69GB	~6GB	Very Good	Quality focused	Download
JoyCaption Beta One (Q4_K_M)	4.92GB	~7GB	Very Good+	Balanced choice	Download
JoyCaption Beta One (Q5_K_S)	5.60GB	~7GB	Excellent	High quality (10GB+)	Download
JoyCaption Beta One (Q5_K_M)	5.73GB	~8GB	Excellent+	Premium quality	Download
JoyCaption Beta One (Q6_K)	6.60GB	~8GB	Near Original	Maximum quality (12GB+)	Download
JoyCaption Beta One (Q8_0)	8.54GB	~10GB	Original-	Full precision alternative	Download
JoyCaption Beta One (F16)	16.1GB	~18GB	Original	Full precision (24GB+)	Download

Alternative GGUF Sources

Model	Size	Source	Link
JoyCaption Beta One (Q4_K)	4.92GB	concedo	Download
JoyCaption Beta One (Q8_0)	8.54GB	concedo	Download
JoyCaption Beta One (F16)	16.1GB	concedo	Download

Note: GGUF models also require the vision projection model:

llama-joycaption-beta-one-llava-mmproj-model-f16.gguf

After downloading, place the model files in your ComfyUI/models/LLM/GGUF directory.

Basic Usage

Standard Nodes (HuggingFace Models)

Basic Node

Add the "JoyCaption" node from the 🧪AILab/📝JoyCaption category
Connect an image source to the node
Select the model file (defaults to llama-joycaption-beta-one-hf-llava)
Adjust the parameters as needed
Run the workflow

Advanced Node

Add the "JoyCaption (Advanced)" node from the 🧪AILab/📝JoyCaption category
Connect an image source to the node
Select the caption type
Adjust the parameters as needed
Run the workflow

GGUF Nodes (Recommended for Better Performance)

GGUF Basic Node

Add the "JoyCaption GGUF" node from the 🧪AILab/📝JoyCaption-GGUF category
Connect an image source to the node
Select the GGUF model (e.g., "JoyCaption Beta One (IQ4_XS)")
Choose processing mode (GPU/CPU)
Select caption style and length
Run the workflow

GGUF Advanced Node

Add the "JoyCaption GGUF (Advanced)" node from the 🧪AILab/📝JoyCaption-GGUF category
Connect an image source to the node
Configure all generation parameters (temperature, top_p, top_k, etc.)
Set custom prompts if needed
Run the workflow

Caption Tools

Image Batch Path 🖼️

This node allows you to load multiple images from a directory for batch processing:

Parameter	Description
image_dir	Directory path containing images
batch_size	Number of images to load (0 = all images)
start_from	Start from Nth image (1 = first image)
sort_method	Image loading order: sequential/reverse/random

Caption Saver 📝

This node saves generated captions to text files:

Parameter	Description
string	Caption text to save
image_path	Path to the image being captioned
image	(Optional) Image to save alongside the caption
custom_output_path	(Optional) Custom output directory path
custom_file_name	(Optional) Custom filename (without extension)
overwrite	If true, will overwrite existing files; if false, will add a number to make the filename unique

Parameters

Standard Nodes

Basic Node

Parameter	Description	Default	Range
Model	The JoyCaption model to use	`llama-joycaption-beta-one-hf-llava`	-
Memory Control	Memory optimization settings	Default	Default (fp32), Balanced (8-bit), Maximum Savings (4-bit)
Caption Type	Caption style selection	Descriptive	Descriptive, Descriptive (Casual), Straightforward, Tags, Technical, Artistic
Caption Length	Output length control	medium	any, very short, short, medium, long, very long

GGUF Nodes

GGUF Basic Node

Parameter	Description	Default	Range
Model	GGUF model to use	JoyCaption Beta One (IQ4_XS)	Q2_K, IQ4_XS, Q5_K_M, Q6_K
Processing Mode	Hardware acceleration	GPU	GPU, CPU
Caption Type	Caption style selection	Descriptive	Descriptive, Descriptive (Casual), Straightforward, Tags, Technical, Artistic
Caption Length	Output length control	any	any, very short, short, medium, long, very long
Memory Management	Model memory handling	Keep in Memory	Keep in Memory, Clear After Run, Global Cache

GGUF Advanced Node

Parameter	Description	Default	Range
Model	GGUF model to use	JoyCaption Beta One (IQ4_XS)	Q2_K, IQ4_XS, Q5_K_M, Q6_K
Processing Mode	Hardware acceleration	GPU	GPU, CPU
Max New Tokens	Maximum tokens to generate	512	1-2048
Temperature	Generation randomness	0.6	0.0-2.0
Top-p	Nucleus sampling	0.9	0.0-1.0
Top-k	Top-k sampling	0	0-100
Custom Prompt	Custom prompt template	""	Any text
Memory Management	Model memory handling	Keep in Memory	Keep in Memory, Clear After Run, Global Cache

Quantization Options

Standard Models (HuggingFace)

Mode	Precision	Memory Usage	Speed	Quality	Recommended GPU
Default	fp32	~16GB	1x	Best	24GB+
Default	bf16	~8GB	1.5x	Excellent	16GB+
Default	fp16	~8GB	2x	Very Good	16GB+
Balanced	8-bit	~4GB	2.5x	Good	12GB+
Maximum Savings	4-bit	~2GB	3x	Acceptable	8GB+

GGUF Models (Recommended)

Quantization	Model Size	Memory Usage	Speed	Quality	Recommended GPU
Q2_K	3.18GB	~4GB	4x	Good	6GB+
Q3_K_S	3.66GB	~5GB	3.5x	Good+	8GB+
Q3_K_M	4.02GB	~5GB	3.5x	Better	8GB+
Q3_K_L	4.32GB	~6GB	3x	Better+	8GB+
IQ4_XS	4.48GB	~6GB	3x	Very Good	8GB+
Q4_K_S	4.69GB	~6GB	3x	Very Good	8GB+
Q4_K_M	4.92GB	~7GB	2.5x	Very Good+	10GB+
Q5_K_S	5.60GB	~7GB	2.5x	Excellent	10GB+
Q5_K_M	5.73GB	~8GB	2.5x	Excellent+	12GB+
Q6_K	6.60GB	~8GB	2x	Near Original	12GB+
Q8_0	8.54GB	~10GB	1.8x	Original-	16GB+
F16	16.1GB	~18GB	1.5x	Original	24GB+

Note: GGUF models provide better performance and lower memory usage compared to standard models. IQ4_XS offers the best balance of quality and efficiency.

Advanced Node

Parameter	Description	Default	Range
Extra Options	Additional feature options	[]	Multiple options
Person Name	Name for person descriptions	""	Any text
Max New Tokens	Maximum tokens to generate	512	1-2048
Temperature	Generation temperature	0.6	0.0-2.0
Top-p	Sampling parameter	0.9	0.0-1.0
Top-k	Top-k sampling	0	0-100
Precision	Use fp32 for best quality, bf16 for balanced performance, fp16 for maximum speed. 8-bit and 4-bit quantization provide significant memory savings with minimal quality impact

Memory Management Options

Mode	Description	Recommended Use
Global Cache	Keeps model in memory for fastest processing	High VRAM GPUs (16GB+)
Keep in Memory	Maintains model in memory until workflow ends	Medium VRAM GPUs (8GB+)
Clear After Run	Clears model after each generation	Low VRAM GPUs (<8GB)

Setting Tips

Setting	Recommendation
Model Choice	Use GGUF models for better performance and lower memory usage. IQ4_XS offers the best balance of quality and efficiency
Memory Mode	Based on GPU VRAM: 24GB+ use Global Cache, 12GB+ use Keep in Memory, 8GB+ use Clear After Run
Processing Mode	Use GPU mode for faster processing, CPU mode for systems with limited VRAM
Input Resolution	The model works best with images of 512x512 or higher resolution
Memory Usage	If you encounter memory issues, try GGUF Q2_K model or use Clear After Run mode
Performance	For batch processing, consider using Global Cache mode with GGUF models if you have sufficient VRAM
Temperature	Lower values (0.6-0.7) for more stable results, higher values (0.8-1.0) for more diverse results
Top-k Parameter	Set to 0 to disable, or use values like 40-50 for more focused generation

About Model

This implementation uses LLaVA-based image captioning models, optimized to provide fast and accurate image descriptions.

Model features:

Free and open weights
Uncensored content coverage
Broad style diversity (digital art, photoreal, anime, etc.)
Fast processing with bfloat16 precision
High-quality caption generation
Memory-efficient operation
Consistent results across various image types
Support for multiple caption styles
Optimized for training diffusion models

The models are trained on diverse datasets, ensuring:

Balanced representation across different image types
High accuracy in various scenarios
Robust performance with complex scenes
Support for both SFW and NSFW content
Equal coverage of different styles and concepts

Roadmap

✅ Completed (v1.3.0)

✅ GGUF model support for better performance
✅ Support for 12 GGUF quantization formats (Q2_K to F16)
✅ Enhanced error handling and parameter validation
✅ Clean console output (no debug spam)
✅ Improved memory management options
✅ Batch processing support (Caption Tools)
✅ Performance optimizations for CPU processing (GGUF CPU mode)
✅ Enhanced batch processing features with memory management

🔄 Future Plans

More caption style options and custom prompts
Advanced fine-tuning support for custom models
Real-time processing optimizations
Multi-language caption support
Integration with more vision-language models

Credits

Original JoyCaption Model: fancyfeast - Creator of the base JoyCaption models
GGUF Quantized Models: mradermacher - Provider of comprehensive GGUF quantized models (Q2_K to F16)
Alternative GGUF Models: concedo - Provider of alternative GGUF models with vision projection
ComfyUI Integration: 1038lab - Developer of this ComfyUI custom node
LLaVA Framework: Microsoft Research - Base vision-language model architecture
llama-cpp-python: abetlen - Python bindings for llama.cpp

License

This repository's code is released under the GPL-3.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.github/workflows		.github/workflows
example_workflows		example_workflows
llama_cpp_install		llama_cpp_install
locales		locales
CaptionTools.py		CaptionTools.py
JC.py		JC.py
JC_GGUF.py		JC_GGUF.py
LICENSE		LICENSE
README.md		README.md
README_ZH.md		README_ZH.md
__init__.py		__init__.py
jc_data.json		jc_data.json
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
requirements_gguf.txt		requirements_gguf.txt
update.md		update.md

Uh oh!

License

1038lab/ComfyUI-JoyCaption

Folders and files

Latest commit

History

Repository files navigation

ComfyUI-JoyCaption

News & Updates

Features

Installation

Download Models

Standard Models (HuggingFace Format)

GGUF Models (Quantized, Recommended)

Alternative GGUF Sources

Basic Usage

Standard Nodes (HuggingFace Models)

Basic Node

Advanced Node

GGUF Nodes (Recommended for Better Performance)

GGUF Basic Node

GGUF Advanced Node

Caption Tools

Image Batch Path 🖼️

Caption Saver 📝

Parameters

Standard Nodes

Basic Node

GGUF Nodes

GGUF Basic Node

GGUF Advanced Node

Quantization Options

Standard Models (HuggingFace)

GGUF Models (Recommended)

Advanced Node

Memory Management Options

Setting Tips

About Model

Roadmap

✅ Completed (v1.3.0)

🔄 Future Plans

Credits

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Languages

Packages