ComfyUI-JoyCaption

JoyCaption is a ComfyUI custom node pack powered by LLaVA-based JoyCaption models for efficient, stylized image captioning. The Caption Tools nodes handle batch image loading and save the generated captions as separate text files.


News & Updates

v1.2.0


Features

Installation

  1. Clone this repository to your ComfyUI/custom_nodes directory:
    cd ComfyUI/custom_nodes
    git clone https://github.com/1038lab/ComfyUI-JoyCaption.git
    
  2. Install the required dependencies:
    cd ComfyUI/custom_nodes/ComfyUI-JoyCaption
    pip install -r requirements.txt
    
  3. For GGUF models (recommended), install llama-cpp-python with CUDA support:
    ```bash
    # Option 1: Automated installation (recommended)
    python llama_cpp_install/llama_cpp_install.py

    # Option 2: Manual installation
    pip install -r requirements_gguf.txt
    ```
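To confirm that step 3 worked, a quick check along these lines can help. This is a minimal sketch that is not part of the repository: the function name `check_llama_cpp` is hypothetical, it assumes only that llama-cpp-python imports as `llama_cpp`, and it treats `llama_supports_gpu_offload()` as optional because not every build may expose it.

```python
import importlib
import importlib.util


def check_llama_cpp() -> dict:
    """Report whether llama-cpp-python is importable and whether it can offload to a GPU."""
    status = {"installed": False, "gpu_offload": None}
    if importlib.util.find_spec("llama_cpp") is None:
        return status  # package not installed in this Python environment
    llama_cpp = importlib.import_module("llama_cpp")
    status["installed"] = True
    # Assumption: recent builds expose llama_supports_gpu_offload(); if this
    # build does not, gpu_offload stays None (unknown) instead of failing.
    probe = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if callable(probe):
        status["gpu_offload"] = bool(probe())
    return status


if __name__ == "__main__":
    print(check_llama_cpp())
```

Run it with the same Python interpreter that ComfyUI uses; `gpu_offload: False` after installation usually means a CPU-only wheel was installed.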

Installation Guides Available:

Download Models

The models will be automatically downloaded and renamed on first use, or you can manually download them:

Standard Models (HuggingFace Format)

| Model | Link | Memory Usage |
| ----- | ---- | ------------ |
| JoyCaption Beta One | Download | ~8-16GB |
| JoyCaption Alpha Two | Download | ~8-16GB |

GGUF Models

| Model | Size | Memory Usage | Quality | Recommended For | Link |
| ----- | ---- | ------------ | ------- | --------------- | ---- |
| JoyCaption Beta One (Q2_K) | 3.18GB | ~4GB | Good | Low VRAM (6GB+) | Download |
| JoyCaption Beta One (Q3_K_S) | 3.66GB | ~5GB | Good+ | Budget systems (8GB+) | Download |
| JoyCaption Beta One (Q3_K_M) | 4.02GB | ~5GB | Better | Balanced performance | Download |
| JoyCaption Beta One (Q3_K_L) | 4.32GB | ~6GB | Better+ | Good quality/size ratio | Download |
| JoyCaption Beta One (IQ4_XS) | 4.48GB | ~6GB | Very Good | Recommended (8GB+) | Download |
| JoyCaption Beta One (Q4_K_S) | 4.69GB | ~6GB | Very Good | Quality focused | Download |
| JoyCaption Beta One (Q4_K_M) | 4.92GB | ~7GB | Very Good+ | Balanced choice | Download |
| JoyCaption Beta One (Q5_K_S) | 5.60GB | ~7GB | Excellent | High quality (10GB+) | Download |
| JoyCaption Beta One (Q5_K_M) | 5.73GB | ~8GB | Excellent+ | Premium quality | Download |
| JoyCaption Beta One (Q6_K) | 6.60GB | ~8GB | Near Original | Maximum quality (12GB+) | Download |
| JoyCaption Beta One (Q8_0) | 8.54GB | ~10GB | Original- | Full precision alternative | Download |
| JoyCaption Beta One (F16) | 16.1GB | ~18GB | Original | Full precision (24GB+) | Download |

Alternative GGUF Sources

| Model | Size | Source | Link |
| ----- | ---- | ------ | ---- |
| JoyCaption Beta One (Q4_K) | 4.92GB | concedo | Download |
| JoyCaption Beta One (Q8_0) | 8.54GB | concedo | Download |
| JoyCaption Beta One (F16) | 16.1GB | concedo | Download |

Note: GGUF models also require the vision projection model:

After downloading, place the model files in your ComfyUI/models/LLM/GGUF directory.
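Because a GGUF language model is unusable without its vision projection file, it can be worth checking that both landed in the right folder. The sketch below is a hypothetical helper, not part of the node pack; it assumes projector files carry `mmproj` in their file name, a common GGUF naming convention that this repository may not enforce.

```python
from pathlib import Path


def find_gguf_files(comfy_root: str) -> tuple[list[str], list[str]]:
    """List .gguf files under ComfyUI/models/LLM/GGUF, split into models and projectors."""
    gguf_dir = Path(comfy_root) / "models" / "LLM" / "GGUF"
    if not gguf_dir.is_dir():
        return [], []
    names = sorted(p.name for p in gguf_dir.glob("*.gguf"))
    # Assumption: vision projection files are recognizable by "mmproj" in the name.
    projectors = [n for n in names if "mmproj" in n.lower()]
    models = [n for n in names if n not in projectors]
    return models, projectors


if __name__ == "__main__":
    models, projectors = find_gguf_files("ComfyUI")
    print("models:", models)
    print("projectors:", projectors)
    if models and not projectors:
        print("Warning: no vision projection (mmproj) file found.")
```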

Basic Usage

Standard Nodes (HuggingFace Models)

Basic Node

  1. Add the “JoyCaption” node from the 🧪AILab/📝JoyCaption category
  2. Connect an image source to the node
  3. Select the model file (defaults to llama-joycaption-beta-one-hf-llava)
  4. Adjust the parameters as needed
  5. Run the workflow

Advanced Node

  1. Add the “JoyCaption (Advanced)” node from the 🧪AILab/📝JoyCaption category
  2. Connect an image source to the node
  3. Select the caption type
  4. Adjust the parameters as needed
  5. Run the workflow

GGUF Basic Node

  1. Add the “JoyCaption GGUF” node from the 🧪AILab/📝JoyCaption-GGUF category
  2. Connect an image source to the node
  3. Select the GGUF model (e.g., “JoyCaption Beta One (IQ4_XS)”)
  4. Choose processing mode (GPU/CPU)
  5. Select caption style and length
  6. Run the workflow

GGUF Advanced Node

  1. Add the “JoyCaption GGUF (Advanced)” node from the 🧪AILab/📝JoyCaption-GGUF category
  2. Connect an image source to the node
  3. Configure all generation parameters (temperature, top_p, top_k, etc.)
  4. Set custom prompts if needed
  5. Run the workflow

Caption Tools

Image Batch Path 🖼️

This node allows you to load multiple images from a directory for batch processing:

| Parameter | Description |
| --------- | ----------- |
| image_dir | Directory path containing images |
| batch_size | Number of images to load (0 = all images) |
| start_from | Start from the Nth image (1 = first image) |
| sort_method | Image loading order: sequential/reverse/random |
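The interaction of `batch_size`, `start_from`, and `sort_method` can be sketched as a pure function over file names. This is an illustration of the parameter semantics, not the node's code: `select_batch` and its `seed` argument are hypothetical, and it assumes "sequential" means alphabetical file-name order and that `start_from` is applied after sorting.

```python
import random


def select_batch(filenames, batch_size=0, start_from=1, sort_method="sequential", seed=None):
    """Return the file names one batch would load, following the Image Batch Path parameters."""
    if start_from < 1:
        raise ValueError("start_from is 1-based; use 1 for the first image")
    if sort_method == "sequential":
        ordered = sorted(filenames)
    elif sort_method == "reverse":
        ordered = sorted(filenames, reverse=True)
    elif sort_method == "random":
        ordered = list(filenames)
        random.Random(seed).shuffle(ordered)  # seed only makes the example repeatable
    else:
        raise ValueError(f"unknown sort_method: {sort_method!r}")
    remaining = ordered[start_from - 1:]
    # batch_size == 0 means "load every remaining image".
    return remaining if batch_size == 0 else remaining[:batch_size]


if __name__ == "__main__":
    files = ["c.png", "a.png", "d.png", "b.png"]
    print(select_batch(files, batch_size=2, start_from=2))  # ['b.png', 'c.png']
```

Under these assumptions, stepping `start_from` by `batch_size` on each run walks through a large folder in fixed-size chunks.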

Caption Saver 📝

This node saves generated captions to text files:

| Parameter | Description |
| --------- | ----------- |
| string | Caption text to save |
| image_path | Path to the image being captioned |
| image | (Optional) Image to save alongside the caption |
| custom_output_path | (Optional) Custom output directory path |
| custom_file_name | (Optional) Custom file name (without extension) |
| overwrite | If true, overwrites existing files; if false, appends a number to make the file name unique |
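The file-naming rules above can be sketched as follows. This is an assumption-laden illustration rather than the node's implementation: `caption_path` and `save_caption` are hypothetical names, and it assumes captions default to the image's folder and that uniqueness uses a `_1`, `_2`, ... suffix, which may differ from the actual numbering scheme.

```python
from pathlib import Path


def caption_path(image_path, custom_output_path=None, custom_file_name=None, overwrite=False):
    """Choose where a caption .txt file would be written."""
    image = Path(image_path)
    # Assumption: without a custom path, the caption is written next to the image.
    out_dir = Path(custom_output_path) if custom_output_path else image.parent
    stem = custom_file_name or image.stem
    target = out_dir / f"{stem}.txt"
    if overwrite:
        return target
    # Assumption: a numeric suffix keeps the name unique when overwrite is false.
    counter = 1
    while target.exists():
        target = out_dir / f"{stem}_{counter}.txt"
        counter += 1
    return target


def save_caption(text, image_path, **options):
    """Write the caption text and return the path that was used."""
    target = caption_path(image_path, **options)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Saving twice for `cat.png` with `overwrite=False` would, under these assumptions, produce `cat.txt` and then `cat_1.txt`.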

Parameters

Standard Nodes

Basic Node

| Parameter | Description | Default | Range |
| --------- | ----------- | ------- | ----- |
| Model | The JoyCaption model to use | llama-joycaption-beta-one-hf-llava | - |
| Memory Control | Memory optimization settings | Default | Default (fp32), Balanced (8-bit), Maximum Savings (4-bit) |
| Caption Type | Caption style selection | Descriptive | Descriptive, Descriptive (Casual), Straightforward, Tags, Technical, Artistic |
| Caption Length | Output length control | medium | any, very short, short, medium, long, very long |

GGUF Nodes

GGUF Basic Node

| Parameter | Description | Default | Range |
| --------- | ----------- | ------- | ----- |
| Model | GGUF model to use | JoyCaption Beta One (IQ4_XS) | Q2_K, IQ4_XS, Q5_K_M, Q6_K |
| Processing Mode | Hardware acceleration | GPU | GPU, CPU |
| Caption Type | Caption style selection | Descriptive | Descriptive, Descriptive (Casual), Straightforward, Tags, Technical, Artistic |
| Caption Length | Output length control | any | any, very short, short, medium, long, very long |
| Memory Management | Model memory handling | Keep in Memory | Keep in Memory, Clear After Run, Global Cache |

GGUF Advanced Node

| Parameter | Description | Default | Range |
| --------- | ----------- | ------- | ----- |
| Model | GGUF model to use | JoyCaption Beta One (IQ4_XS) | Q2_K, IQ4_XS, Q5_K_M, Q6_K |
| Processing Mode | Hardware acceleration | GPU | GPU, CPU |
| Max New Tokens | Maximum tokens to generate | 512 | 1-2048 |
| Temperature | Generation randomness | 0.6 | 0.0-2.0 |
| Top-p | Nucleus sampling | 0.9 | 0.0-1.0 |
| Top-k | Top-k sampling | 0 | 0-100 |
| Custom Prompt | Custom prompt template | "" | Any text |
| Memory Management | Model memory handling | Keep in Memory | Keep in Memory, Clear After Run, Global Cache |

Quantization Options

Standard Models (HuggingFace)

| Mode | Precision | Memory Usage | Speed | Quality | Recommended GPU |
| ---- | --------- | ------------ | ----- | ------- | --------------- |
| Default | fp32 | ~16GB | 1x | Best | 24GB+ |
| Default | bf16 | ~8GB | 1.5x | Excellent | 16GB+ |
| Default | fp16 | ~8GB | 2x | Very Good | 16GB+ |
| Balanced | 8-bit | ~4GB | 2.5x | Good | 12GB+ |
| Maximum Savings | 4-bit | ~2GB | 3x | Acceptable | 8GB+ |

GGUF Models

| Quantization | Model Size | Memory Usage | Speed | Quality | Recommended GPU |
| ------------ | ---------- | ------------ | ----- | ------- | --------------- |
| Q2_K | 3.18GB | ~4GB | 4x | Good | 6GB+ |
| Q3_K_S | 3.66GB | ~5GB | 3.5x | Good+ | 8GB+ |
| Q3_K_M | 4.02GB | ~5GB | 3.5x | Better | 8GB+ |
| Q3_K_L | 4.32GB | ~6GB | 3x | Better+ | 8GB+ |
| IQ4_XS | 4.48GB | ~6GB | 3x | Very Good | 8GB+ |
| Q4_K_S | 4.69GB | ~6GB | 3x | Very Good | 8GB+ |
| Q4_K_M | 4.92GB | ~7GB | 2.5x | Very Good+ | 10GB+ |
| Q5_K_S | 5.60GB | ~7GB | 2.5x | Excellent | 10GB+ |
| Q5_K_M | 5.73GB | ~8GB | 2.5x | Excellent+ | 12GB+ |
| Q6_K | 6.60GB | ~8GB | 2x | Near Original | 12GB+ |
| Q8_0 | 8.54GB | ~10GB | 1.8x | Original- | 16GB+ |
| F16 | 16.1GB | ~18GB | 1.5x | Original | 24GB+ |

Note: GGUF models provide better performance and lower memory usage than the standard models. IQ4_XS offers the best balance of quality and efficiency.
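Since every GGUF file stores the same weights, the file sizes can be turned into a rough average of bits per weight, which explains why smaller files trade away quality. The sketch below treats the F16 file as exactly 16 bits per weight and ignores metadata overhead, so the results are estimates; `bits_per_weight` is a hypothetical helper and the sizes are copied from the GGUF table.

```python
# File sizes in GB, copied from the GGUF quantization table.
GGUF_SIZES_GB = {
    "Q2_K": 3.18, "Q3_K_S": 3.66, "Q3_K_M": 4.02, "Q3_K_L": 4.32,
    "IQ4_XS": 4.48, "Q4_K_S": 4.69, "Q4_K_M": 4.92, "Q5_K_S": 5.60,
    "Q5_K_M": 5.73, "Q6_K": 6.60, "Q8_0": 8.54, "F16": 16.1,
}


def bits_per_weight(quant: str) -> float:
    """Estimate average bits per weight, using the F16 file (16 bits/weight) as reference."""
    return GGUF_SIZES_GB[quant] * 16 / GGUF_SIZES_GB["F16"]


if __name__ == "__main__":
    for name in GGUF_SIZES_GB:
        print(f"{name:7s} ~{bits_per_weight(name):.2f} bits/weight")
```

For example, Q8_0 works out to about 8.54 × 16 / 16.1 ≈ 8.5 bits per weight and Q2_K to about 3.2, while IQ4_XS lands near 4.5.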

Advanced Node

| Parameter | Description | Default | Range |
| --------- | ----------- | ------- | ----- |
| Extra Options | Additional feature options | [] | Multiple options |
| Person Name | Name to use in person descriptions | "" | Any text |
| Max New Tokens | Maximum tokens to generate | 512 | 1-2048 |
| Temperature | Generation temperature | 0.6 | 0.0-2.0 |
| Top-p | Nucleus sampling | 0.9 | 0.0-1.0 |
| Top-k | Top-k sampling | 0 | 0-100 |
| Precision | Use fp32 for best quality, bf16 for balanced performance, fp16 for maximum speed. 8-bit and 4-bit quantization provide significant memory savings with minimal quality impact | | |

Memory Management Options

| Mode | Description | Recommended Use |
| ---- | ----------- | --------------- |
| Global Cache | Keeps the model in memory for the fastest processing | High-VRAM GPUs (16GB+) |
| Keep in Memory | Keeps the model in memory until the workflow ends | Medium-VRAM GPUs (8GB+) |
| Clear After Run | Clears the model after each generation | Low-VRAM GPUs (<8GB) |
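The VRAM guidance in these tables can be condensed into a small lookup. This is a hypothetical helper, not something the nodes provide: the memory-mode cut-offs come from the Memory Management Options table, and the quantization tiers from the "Recommended For" column of the GGUF model table. The Setting Tips table suggests stricter memory-mode thresholds (24GB+ for Global Cache), so treat these cut-offs as adjustable.

```python
def recommend_settings(vram_gb: float) -> dict:
    """Suggest a memory mode, GGUF quantization, and processing mode for a given VRAM size."""
    # Thresholds from the Memory Management Options table.
    if vram_gb >= 16:
        memory_mode = "Global Cache"
    elif vram_gb >= 8:
        memory_mode = "Keep in Memory"
    else:
        memory_mode = "Clear After Run"

    # Tiers from the GGUF table's "Recommended For" column, largest first.
    tiers = [(24, "F16"), (12, "Q6_K"), (10, "Q5_K_S"), (8, "IQ4_XS"), (6, "Q2_K")]
    quant = next((q for min_vram, q in tiers if vram_gb >= min_vram), None)

    return {
        "memory_mode": memory_mode,
        "quantization": quant or "Q2_K",
        # Below the smallest GPU tier, fall back to CPU mode as the Setting Tips suggest.
        "processing_mode": "GPU" if quant else "CPU",
    }


if __name__ == "__main__":
    for vram in (4, 8, 12, 24):
        print(vram, recommend_settings(vram))
```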

Setting Tips

| Setting | Recommendation |
| ------- | -------------- |
| Model Choice | Use GGUF models for better performance and lower memory usage. IQ4_XS offers the best balance of quality and efficiency |
| Memory Mode | Based on GPU VRAM: 24GB+ use Global Cache, 12GB+ use Keep in Memory, 8GB+ use Clear After Run |
| Processing Mode | Use GPU mode for faster processing, CPU mode for systems with limited VRAM |
| Input Resolution | The model works best with images of 512x512 or higher resolution |
| Memory Usage | If you encounter memory issues, try the GGUF Q2_K model or use Clear After Run mode |
| Performance | For batch processing, consider using Global Cache mode with GGUF models if you have sufficient VRAM |
| Temperature | Lower values (0.6-0.7) for more stable results, higher values (0.8-1.0) for more diverse results |
| Top-k Parameter | Set to 0 to disable, or use values like 40-50 for more focused generation |
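To see how Temperature, Top-k, and Top-p interact, here is a minimal sketch of one common ordering (temperature, then top-k, then top-p) applied to a toy `{token: logit}` map. It illustrates the general technique only; `sample_token` is hypothetical, and backends such as llama.cpp may apply these steps in a different order.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_p=0.9, top_k=0, rng=None):
    """Pick one token from a {token: logit} map using temperature, top-k and top-p."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding.
        return max(logits, key=logits.get)
    # Temperature scaling: values below 1 sharpen the distribution.
    ranked = sorted(((tok, lg / temperature) for tok, lg in logits.items()),
                    key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most likely tokens; 0 disables the filter.
    if top_k > 0:
        ranked = ranked[:top_k]
    # Softmax over the survivors (subtracting the max for numerical stability).
    peak = ranked[0][1]
    weights = [math.exp(lg - peak) for _, lg in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    kept_tokens, kept_probs, mass = [], [], 0.0
    for (tok, _), p in zip(ranked, probs):
        kept_tokens.append(tok)
        kept_probs.append(p)
        mass += p
        if mass >= top_p:
            break
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]
```

With `{"cat": 5.0, "dog": 1.0, "car": 0.0}`, `top_k=1` always yields `"cat"`, while `temperature=1.0, top_k=2, top_p=1.0` can return either `"cat"` or `"dog"` but never `"car"`.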

About Model

This implementation uses LLaVA-based image captioning models, optimized to provide fast and accurate image descriptions.

Model features:

The models are trained on diverse datasets, ensuring:

Roadmap

✅ Completed (v1.3.0)

🔄 Future Plans

Credits

License

This repository’s code is released under the GPL-3.0 License.