A Customized Version of the Original SUPIR Project


[Screenshots 1–3]


  • Removed the heavy LLaVA implementation.
  • Added safetensors support.
  • Updated dependencies.
  • Replaced SoftMax with SDPA for default attention.
  • Removed use_linear_control_scale (linear_s_stage2) and use_linear_cfg_scale (linear_CFG) arguments.
    • Whether linear scaling is applied is now determined by the start and end scale values: if the start and end values are equal, no scaling occurs.
  • Renamed arguments to make settings a bit more intuitive (more alignment with kijai's SUPIR ComfyUI custom nodes)
    • spt_linear_CFG -> cfg_scale_start
    • s_cfg -> cfg_scale_end
    • spt_linear_s_stage2 -> control_scale_start
    • s_stage2 -> control_scale_end
  • Added --skip_denoise_stage argument to bypass the artifact-removal preprocessing step that uses the specialized VAE denoise encoder. This step usually leaves the image slightly softened (before the sampling stage), since you do not want artifacts to be treated as detail to be enhanced. You might want to skip it if your image is already high quality.
  • Refactor: renamed the misspelled symbol upsacle in the original code to upscale.
  • Moved CLIP paths to a yaml config file.
  • Exposed sampler_tile_size and sampler_tile_stride to make them overridable when using TiledRestoreEDMSampler
  • SUPIR settings are saved into the PNGInfo metadata of the output image (a sketch for reading them back follows this list).
  • Parallel processing for Tiled VAE encoding/decoding.
  • Improved memory management. On each run, it clears unused GPU memory (VRAM), runs Python garbage collection, and releases unused RAM back to the system (Linux only).
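A minimal sketch of how the saved settings could be read back from an output PNG, assuming they are stored as standard PNG text chunks; the file name below is a placeholder, and the exact key names depend on what the script writes:

from PIL import Image

def read_supir_settings(path: str) -> dict:
    # Pillow exposes PNG tEXt/iTXt chunks via the .text mapping (falls back to .info).
    img = Image.open(path)
    return dict(getattr(img, "text", None) or img.info)

# Placeholder path; point this at an image produced in ./output
for key, value in read_supir_settings("output/result.png").items():
    print(f"{key}: {value}")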

Processing Times (seconds) with Models Preloaded
VRAM Usage : ~12GB
Note: Performance will vary depending on system specs beyond the GPU (CPU speed, memory bandwidth, etc.), so treat this only as a rough guide.

GPU Model 1024×1024 2048×2048 3072×3072
H100 15 s 95 s 243 s
RTX Pro 6000 10 s 71 s 190 s
RTX 5090 14 s 97 s 254 s
RTX 4090 18 s 133 s 329 s
RTX 3090 26 s 206 s 560 s

I’ve found a maximum output size between 2048×2048 and 4096×4096 to be the sweet spot for refinement work. 4096×4096 can yield smoother results (depending on the image), but it will be a LOT slower.

When working with large but imperfect images (for example, 45MP+ negative scans that are old or grainy), I split them into 2048×2048 tiles. This lets me refine each section independently while still preserving fine detail. The trade-off is that each tile requires its own prompt, along with some careful blending in Photoshop. Using overlaps between tiles makes this process easier. While it adds extra manual work, the payoff is much greater control. You can adjust prompts to suit the unique details of each region, whether that is faces, textures, text, or backgrounds, instead of relying on a single global prompt that may not work well for the entire image.
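A minimal sketch of this tiling approach using Pillow; the tile size, overlap, input path, and output naming here are illustrative, not part of this repository:

import os
from PIL import Image

def split_into_tiles(path, tile=2048, overlap=256):
    # Yield (x, y, crop) tuples covering the image; edge tiles may be smaller than `tile`.
    img = Image.open(path)
    w, h = img.size
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield x, y, img.crop((x, y, min(x + tile, w), min(y + tile, h)))

os.makedirs("tiles", exist_ok=True)
# Placeholder input; each tile can then be refined with its own prompt
# and blended back together (e.g. in Photoshop) using the overlap regions.
for x, y, crop in split_into_tiles("input/negative_scan.png"):
    crop.save(f"tiles/tile_{x}_{y}.png")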

Best for you to experiment, if you have the patience.


Installation

Prerequisites:

  • Python 3.12
  • Git

Clone repo

git clone https://github.com/yushan777/SUPIR-Demo.git
cd SUPIR-Demo

# For Linux only
chmod +x *.sh

Install Environment

# Linux
./install_linux_local.sh

# Linux (Vast.ai)
./install_vastai.sh

# Windows
install_win_local.bat

Download Models

You can download the models while the venv is being installed (in a separate terminal).

# Linux
./download_models.sh

# Windows
download_models.bat

Manually Downloading The Models


If you prefer to download the models manually, or in your own time, the links are below.
Additionally, if you already have these models, you can simply symlink them to the expected locations to save storage space.

SmolVLM-500M-Instruct

For captioning the input image in the Gradio demo.

SUPIR Models

Unless you have more than 24GB of VRAM, you should download the FP16 variants.

FP16 Versions

FP32 Versions

CLIP Models

SDXL Model

There are two SUPIR model variants: v0Q and v0F.

  • SUPIR-v0Q The v0Q model (Quality) is trained on a wide range of degradations, making it robust and effective across varied real-world scenarios. However, this broad generalization comes at a cost—when applied to images with only mild degradation, v0Q might overcompensate, hallucinate or alter details that are already mostly intact. This behavior stems from its training bias toward assuming significant visual damage.

  • SUPIR-v0F In contrast, the v0F model (Fidelity) is specifically trained on lighter degradation patterns. Its Stage1 encoder is tuned to better preserve fine details and structure, resulting in restorations that are more faithful to the input when the degradation is minimal. As a result, v0F is the preferred choice for high-fidelity restoration where subtle preservation is more critical than aggressive enhancement.

  1. If necessary, edit the custom paths for the checkpoints (see the example excerpt below). Otherwise leave these alone.
    * [options/SUPIR_v0.yaml] --> SDXL_CKPT, SUPIR_CKPT_Q, SUPIR_CKPT_F. 
    * [options/SUPIR_v0_tiled.yaml] --> SDXL_CKPT, SUPIR_CKPT_Q, SUPIR_CKPT_F. 
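A sketch of what these entries might look like inside options/SUPIR_v0.yaml; the paths below are placeholders, so substitute your own model locations:

SDXL_CKPT: /path/to/checkpoints/sd_xl_base_1.0.safetensors
SUPIR_CKPT_Q: /path/to/checkpoints/SUPIR-v0Q_fp16.safetensors
SUPIR_CKPT_F: /path/to/checkpoints/SUPIR-v0F_fp16.safetensors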
    

Gradio Demo

# Linux
source venv/bin/activate
python3 run_supir_gradio.py

# or you can start it with the bash script (contains the above two commands)
chmod +x launch_gradio.sh
./launch_gradio.sh

# =======================================
# Windows
venv\Scripts\activate.bat
python run_supir_gradio.py

Default Settings

Default settings can be set in the file defaults.json. If it doesn't exist, just copy defaults_example.json and rename it.

CLI Demo

# for cli test
python3 run_supir.py --img_path 'input/bottle.png' --save_dir ./output --SUPIR_sign Q --upscale 2 --use_tile_vae --loading_half_params

python3 run_supir.py \
--img_path 'input/woman-low-res-sq.jpg' \
--save_dir ./output \
--SUPIR_sign Q \
--upscale 2 \
--seed 1234567891 \
--img_caption 'A woman has dark brown eyes, dark curly hair wearing a dark scarf on her head. She is wearing a black shirt on with a pattern on it. The wall behind her is brown and green.' \
--edm_steps=50 \
--s_churn=5 \
--cfg_scale_start=2.0 \
--cfg_scale_end=4.0 \
--control_scale_start=0.9 \
--control_scale_end=0.9 \
--loading_half_params \
--use_tile_vae

Tested on Linux Mint, WSL, and Windows 11. It seems to run faster under Linux.


Processing Times / Memory Usage

Sampler: TiledRestoreEDMSampler
Tiled VAE: True
Number of Workers: 1
Linux, 64GB RAM

Upscale | 4090 Time | 4090 VRAM | 4080 Time | 4080 VRAM | 4070 Time | 4070 VRAM
2x | 111 secs | 14.0GB | 227 secs | 13.7GB | 301 secs | 11.7GB
3x | 315 secs | 14.1GB | 475 secs | 13.8GB | 652 secs | 11.7GB
4x | 606 secs | 14.6GB | 910 secs | 13.9GB | 1625 secs | 11.7GB
5x | 992 secs | 15.0GB | 1492 secs | 14.6GB | OOM | OOM

Arguments

Argument Description
img_path Path to the input image. (required)
save_dir Directory to save the output.
SUPIR_sign Model type. Options: ['F', 'Q']
Default: 'Q'
Q model (Quality) Trained on diverse, heavy degradations, making it robust for real-world damage. However, it may overcorrect or hallucinate when used on lightly degraded images due to its bias toward severe restoration.
F model (Fidelity) Optimized for mild degradations, preserving fine details and structure. Ideal for high-fidelity tasks where subtle restoration is preferred over aggressive enhancement.
skip_denoise_stage Skips the VAE denoiser stage. Default: 'False'
Bypasses the artifact-removal preprocessing step that uses the specialized VAE denoise encoder. This step usually leaves the image slightly softened (if you inspect it at this stage); this is to avoid SUPIR treating low-res/compression artifacts as detail to be enhanced.
You may wish to skip this step if:
- 1) You want to do your own pre-processing, OR
- 2) The input image is clean and free of low-res/compression artifacts or other degradations.
     - Can sometimes make close-ups of skin textures look a bit unnatural.
sampler_mode Sampler choice. Options: ['TiledRestoreEDMSampler', 'RestoreEDMSampler']
Default: 'TiledRestoreEDMSampler' (uses less VRAM)
seed Random seed for reproducibility. Default: 1234
Use Upscale to.. If on, the Upscale to width and Upscale to height values are used for upscaling. If off, the Upscale by factor is used.
Upscale to width Upscale input image width to specified dimension if Use Upscale to.. is on.
Minimum: 1024
Upscale to height Upscale input image height to specified dimension if Use Upscale to.. is on.
Minimum: 1024
Upscale by Upscale factor for the input image.
Default: 2
Upscaling of the input image is performed before the denoising and sampling stages.
Both dimensions are multiplied by the upscale value. If the smaller dimension is still < 1024px, the image is further enlarged to a minimum of 1024px (aspect ratio maintained).
*** Notes about upscaling:
The reason for the minimum of 1024 is to give SDXL a comfortable working resolution. Note that dimensions are snapped to the nearest multiple of 64 (see the sketch after this arguments list). The sweet spot seems to be between 2x and 4x for 1024x1024 inputs, or 4x and 8x for 512x512 inputs. Beyond that, the quality begins to collapse.
The higher the scale factor, the slower the process.
min_size Minimum output resolution. Default: 1024
num_samples Number of images to generate per input. Default: 1
img_caption Specific caption for the input image.
Default: ''
This caption is combined with a_prompt.
a_prompt Additional positive prompt (appended to input caption).
Default:
Cinematic, High Contrast, highly detailed, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations.
n_prompt Negative prompt.
Default:
painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth
edm_steps Number of diffusion steps. Default: 50
s_churn Controls how much extra randomness is added during sampling. This helps the model explore more options and avoid getting stuck on a limited result. Default: 5
0: No noise (deterministic)
1–5: Mild/moderate
6–10+: Strong
s_noise Scales s_churn noise strength. Default: 1.003
Slightly < 1: More stable
Slightly > 1: More variation
cfg_scale_start Prompt guidance strength start.
Default: 2.0
cfg_scale_end Prompt guidance strength end.
Default: 4
1.0: Weak (ignores prompt)
7.5: Strong (follows prompt closely)
If cfg_scale_start and cfg_scale_end have the same value, no scaling occurs. When these values differ, linear scheduling is applied from start to end. They can also be reversed for creative strategies.
CFG Sweep Enables a mode to test a range of CFG scale values. When checked, it will generate multiple images, each with a different CFG scale, starting from CFG Scale Start to CFG Scale End. The seed is fixed during the sweep to ensure comparability between images.
CFG Sweep Step The increment used to step from the start to the end CFG scale value during a sweep.
CFG Sweep Direction Defines how the start and end value pairs are varied during a sweep.
- Forward: The start value increases while the end value stays fixed. Example: 2/8 → 3/8 → 4/8 → 5/8 ...
- Backward: The end value decreases while the start value stays fixed. Example: 2/8 → 2/7 → 2/6 → 2/5 ...
Control Guidance Scale Guides how strongly the overall structure of the input image is preserved. The process moves from a start scale (at the beginning, with high noise) to an end scale (at the end, with low noise).
- Control Scale Start: Structural guidance strength at the beginning of the process. Lower values allow more creative freedom early on.
- Control Scale End: Structural guidance strength at the end of the process. Higher values ensure the final details conform closely to the original image.
- Example: start=0.0 / end=1.0 begins with high creativity (ignoring the original structure) and ends by strictly adhering to the original image's structure for the final result.
control_scale_start Structural guidance from input image, start strength. Default: 0.9
control_scale_end Structural guidance from input image, end strength. Default: 0.9
0.0: Disabled
0.1–0.5: Light
0.6–1.0: Balanced (default)
1.1–1.5+: Very strong
Same value = fixed. Different values = scheduled.
restoration_scale Early-stage restoration strength. Controls how strongly the model pulls the structure of the output image back toward the original image. Only applies during the early stages of sampling when the noise level is high.
Default: 0 (disabled).
color_fix_type Color adjustment method. Default: 'Wavelet'
Options: ['None', 'AdaIn', 'Wavelet']
loading_half_params Loads the SUPIR model weights in half precision (FP16).
Default: False
Reduces VRAM usage and increases speed at the cost of slight precision loss.
diff_dtype Precision to use for the diffusion model only.
Allows overriding default precision independently, unless loading_half_params is set.
Default: 'fp16'
Options: ['fp32', 'fp16', 'bf16']
ae_dtype Autoencoder precision.
Default: 'bf16'
Options: ['fp32', 'bf16']
use_tile_vae Enables tile-based encoding/decoding for memory efficiency with large images.
Default: False
encoder_tile_size Tile size when encoding (when use_tile_vae is enabled).
TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available.
Encoder:
- VRAM > 16GB: 3072
- VRAM > 12GB: 2048
- VRAM > 8GB: 1536
- VRAM <= 8GB: 960
- No GPU: 512
decoder_tile_size Tile size when decoding (when use_tile_vae is enabled).
TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available.
Decoder:
- VRAM > 30GB: 256
- VRAM > 16GB: 192
- VRAM > 12GB: 128
- VRAM > 8GB: 96
- VRAM <= 8GB: 64
- No GPU: 64
Number of Workers Number of parallel CPU processes for VAE encoding/decoding.
Improves speed on multi-core CPUs by efficiently preparing data for the GPU.
Default: 4
sampler_tile_size Tile size for TiledRestoreEDMSampler.
This is the size of each tile that the image is divided into during tiled sampling.
Example: tile_size of 128 → image is split into 128×128 pixel tiles.
sampler_tile_stride Tile stride for TiledRestoreEDMSampler.
Controls overlap between tiles during sampling.
Smaller tile_stride = more overlap, better blending, more compute.
Larger tile_stride = less overlap, faster, may cause seams.
Overlap = tile_size - tile_stride
Examples:
- tile_size = 128, stride = 64 → 64 px overlap.
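A minimal sketch of the upscale sizing rules described above (multiply both dimensions by the factor, enforce the 1024px minimum on the smaller side, snap to multiples of 64); this illustrates the described behaviour rather than the repository's exact code:

def target_size(width: int, height: int, upscale: float, min_size: int = 1024) -> tuple[int, int]:
    # Scale both dimensions by the upscale factor.
    w, h = width * upscale, height * upscale
    # If the smaller side is still below min_size, enlarge further (aspect ratio kept).
    smaller = min(w, h)
    if smaller < min_size:
        factor = min_size / smaller
        w, h = w * factor, h * factor
    # Snap both dimensions to the nearest multiple of 64.
    def snap(v: float) -> int:
        return max(64, int(round(v / 64)) * 64)
    return snap(w), snap(h)

# Example: 512x768 at 2x -> 1024x1536; already >= 1024, snaps cleanly to (1024, 1536)
print(target_size(512, 768, 2))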

Images from Pixabay
Original SUPIR Repository
Kijai's SUPIR Custom Nodes for ComfyUI
