A Customized Version of the Original SUPIR Project
- Removed the heavy LLaVA implementation.
- Added safetensors support.
- Updated dependencies.
- Replaced SoftMax with SDPA for default attention.
- Removed `use_linear_control_scale` (`linear_s_stage2`) and `use_linear_cfg_scale` (`linear_CFG`) arguments. Instead, the start and end scale values determine whether linear scaling has any effect.
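A rough sketch of the idea (illustrative code, not the project's actual implementation): equal start and end values yield a constant scale, so no flag is needed, while differing values produce a linear ramp across the sampling steps.

```python
def linear_schedule(start: float, end: float, steps: int) -> list[float]:
    """Illustrative per-step scale values; equal start/end means no scaling effect."""
    if steps <= 1 or start == end:
        return [start] * max(steps, 1)
    # Linear ramp from start to end across the sampling steps
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]

# Same start and end: constant scale, no scheduling in effect
print(linear_schedule(4.0, 4.0, 4))   # → [4.0, 4.0, 4.0, 4.0]
# Differing values: linear ramp from start to end
print(linear_schedule(2.0, 4.0, 3))   # → [2.0, 3.0, 4.0]
```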
- Renamed arguments to make settings a bit more intuitive (closer alignment with kijai's SUPIR ComfyUI custom nodes):
  - `spt_linear_CFG` → `cfg_scale_start`
  - `s_cfg` → `cfg_scale_end`
  - `spt_linear_s_stage2` → `control_scale_start`
  - `s_stage2` → `control_scale_end`
- Added `--skip_denoise_stage` argument to bypass the artifact-removal preprocessing step that uses the specialized VAE denoise encoder. This step usually leaves the image slightly softened (before the sampling stage), since you do not want artifacts to be treated as detail to be enhanced. You might want to skip it if your image is already high quality.
- Refactor: renamed symbol `upsacle` in the original code to `upscale`.
- Moved CLIP paths to a YAML config file.
- Exposed `sampler_tile_size` and `sampler_tile_stride` to make them overridable when using `TiledRestoreEDMSampler`.
- SUPIR settings saved into PNGInfo metadata.
- Parallel processing for Tiled VAE encoding/decoding
- Improved memory management. On each run, it clears unused GPU memory (VRAM), runs Python garbage collection, and releases unused RAM back to the system (Linux only).
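A minimal sketch of this kind of per-run cleanup (the function name and exact calls here are illustrative, not this repo's code):

```python
import ctypes
import gc
import platform

def release_memory() -> None:
    """Illustrative cleanup: GC, free cached VRAM, return heap pages to the OS."""
    gc.collect()  # collect Python garbage first so tensors are actually freed
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached, unused VRAM to the driver
    except ImportError:
        pass  # torch not installed; nothing GPU-side to clear
    if platform.system() == "Linux":
        try:
            # glibc's malloc_trim returns freed heap memory to the OS (Linux only)
            ctypes.CDLL("libc.so.6").malloc_trim(0)
        except (OSError, AttributeError):
            pass  # non-glibc libc; skip
```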
Processing Times (seconds) with Models Preloaded
VRAM usage: ~12GB
Note: Performance will vary depending on system specs beyond the GPU (CPU speed, memory bandwidth, etc.), so treat this only as a rough guide.
| GPU Model | 1024×1024 | 2048×2048 | 3072×3072 |
|---|---|---|---|
| H100 | 15 s | 95 s | 243 s |
| RTX Pro 6000 | 10 s | 71 s | 190 s |
| RTX 5090 | 14 s | 97 s | 254 s |
| RTX 4090 | 18 s | 133 s | 329 s |
| RTX 3090 | 26 s | 206 s | 560 s |
I’ve found a max upscale between 2048×2048 and 4096×4096 to be the sweet spot for refinement work. 4096×4096 can yield smoother results (depending on the image), but it will be a LOT slower.
When working with large but imperfect images (for example, 45MP+ negative scans that are old or grainy), I split them into 2048×2048 tiles. This lets me refine each section independently while still preserving fine detail. The trade-off is that each tile requires its own prompt, along with some careful blending in Photoshop. Using overlaps between tiles makes this process easier. While it adds extra manual work, the payoff is much greater control. You can adjust prompts to suit the unique details of each region, whether that is faces, textures, text, or backgrounds, instead of relying on a single global prompt that may not work well for the entire image.
It's best to experiment, if you have the patience.
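A hypothetical helper for planning such a tiling (illustrative, not part of this repo): it computes the top-left offsets along one axis so that fixed-size tiles cover the whole dimension with a chosen overlap, clamping the last tile to the edge.

```python
def tile_origins(length: int, tile: int = 2048, overlap: int = 256) -> list[int]:
    """Top-left offsets along one axis so tiles cover `length`, sharing `overlap` px."""
    if length <= tile:
        return [0]  # image fits in a single tile
    stride = tile - overlap
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # clamp a final tile to the edge
    return origins

# A 4500 px wide scan with 2048 px tiles and 256 px overlap:
print(tile_origins(4500))   # → [0, 1792, 2452]
```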
- Python 3.12
- Git
git clone https://github.com/yushan777/SUPIR-Demo.git
cd SUPIR-Demo
# Linux
chmod +x *.sh
./install_linux_local.sh
# Linux (Vast.ai)
./install_vastai.sh
# Windows
install_win_local.bat

You can download the models at the same time while the venv is being installed (in a separate terminal):
# Linux
./download_models.sh
# Windows
download_models.bat

ℹ️ See more information:
If you prefer to download the models manually, or in your own time, the links are below.
Additionally, if you already have these models, you can simply symlink them to these locations to save storage space.
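For example, the symlink idea can be sketched in a few lines of Python (the checkpoint path in the comment is illustrative):

```python
import os

def link_model(existing_path: str, target_path: str) -> None:
    """Symlink an already-downloaded checkpoint into the folder the repo expects."""
    os.makedirs(os.path.dirname(target_path), exist_ok=True)
    if not os.path.lexists(target_path):  # don't clobber an existing file/link
        os.symlink(os.path.abspath(existing_path), target_path)

# e.g. link a shared SDXL checkpoint (hypothetical path) into models/SDXL/
# link_model("/data/checkpoints/juggernaut_xl_v9.safetensors",
#            "models/SDXL/juggernaut_xl_v9.safetensors")
```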
Used for captioning the input image in the Gradio demo.
SmolVLM-500M-Instruct
Place all files into `models/SmolVLM-500M-Instruct`
Unless you have more than 24GB of VRAM, you should download the FP16 variants.

FP16 Versions
SUPIR-v0Q (FP16)
SUPIR-v0F (FP16)
Download and place the model files in the `models/SUPIR/` directory.
FP32 Versions
SUPIR-v0Q (FP32)
SUPIR-v0F (FP32)
Download and place the model files in the `models/SUPIR/` directory.
- CLIP Encoder-1
  Place in `models/CLIP1`
- CLIP Encoder-2
  Place in `models/CLIP2`
- Juggernaut-XL_v9_RunDiffusionPhoto_v2
  Place in `models/SDXL`
You can use your own preferred SDXL model. One that specialises in realism/photography will work better.
There are two SUPIR model variants: v0Q and v0F.
- SUPIR-v0Q: The v0Q model (Quality) is trained on a wide range of degradations, making it robust and effective across varied real-world scenarios. However, this broad generalization comes at a cost: when applied to images with only mild degradation, v0Q might overcompensate, hallucinate, or alter details that are already mostly intact. This behavior stems from its training bias toward assuming significant visual damage.
- SUPIR-v0F: In contrast, the v0F model (Fidelity) is specifically trained on lighter degradation patterns. Its Stage1 encoder is tuned to better preserve fine details and structure, resulting in restorations that are more faithful to the input when the degradation is minimal. As a result, v0F is the preferred choice for high-fidelity restoration where subtle preservation is more critical than aggressive enhancement.
- If necessary, edit the custom paths for checkpoints. Otherwise, leave these alone.
  - `options/SUPIR_v0.yaml` → `SDXL_CKPT`, `SUPIR_CKPT_Q`, `SUPIR_CKPT_F`
  - `options/SUPIR_v0_tiled.yaml` → `SDXL_CKPT`, `SUPIR_CKPT_Q`, `SUPIR_CKPT_F`
# Linux
source venv/bin/activate
python3 run_supir_gradio.py
# or you can start it with the bash script (contains the above two commands)
chmod +x launch_gradio.sh
./launch_gradio.sh
# =======================================
# Windows
venv\Scripts\activate.bat
python run_supir_gradio.py

Default settings can be set in the file defaults.json. If it doesn't exist, just copy and rename defaults_example.json.
# for cli test
python3 run_supir.py --img_path 'input/bottle.png' --save_dir ./output --SUPIR_sign Q --upscale 2 --use_tile_vae --loading_half_params
python3 run_supir.py \
--img_path 'input/woman-low-res-sq.jpg' \
--save_dir ./output \
--SUPIR_sign Q \
--upscale 2 \
--seed 1234567891 \
--img_caption 'A woman has dark brown eyes, dark curly hair wearing a dark scarf on her head. She is wearing a black shirt on with a pattern on it. The wall behind her is brown and green.' \
--edm_steps=50 \
--s_churn=5 \
--cfg_scale_start=2.0 \
--cfg_scale_end=4.0 \
--control_scale_start=0.9 \
--control_scale_end=0.9 \
--loading_half_params \
--use_tile_vae

Sampler: TiledRestoreEDMSampler
Tiled VAE: True
Number of Workers: 1
Linux, 64GB RAM
| Upscale | 4090 Time | 4090 VRAM | 4080 Time | 4080 VRAM | 4070 Time | 4070 VRAM |
|---|---|---|---|---|---|---|
| 2x | 111 secs | 14.0GB | 227 secs | 13.7GB | 301 secs | 11.7GB |
| 3x | 315 secs | 14.1GB | 475 secs | 13.8GB | 652 secs | 11.7GB |
| 4x | 606 secs | 14.6GB | 910 secs | 13.9GB | 1625 secs | 11.7GB |
| 5x | 992 secs | 15.0GB | 1492 secs | 14.6GB | OOM | OOM |
| Argument | Description |
|---|---|
| `img_path` | Path to the input image. (required) |
| `save_dir` | Directory to save the output. |
| `SUPIR_sign` | Model type. Options: `['F', 'Q']`<br>Default: `'Q'`<br>Q model (Quality): Trained on diverse, heavy degradations, making it robust for real-world damage. However, it may overcorrect or hallucinate when used on lightly degraded images due to its bias toward severe restoration.<br>F model (Fidelity): Optimized for mild degradations, preserving fine details and structure. Ideal for high-fidelity tasks where subtle restoration is preferred over aggressive enhancement. |
| `skip_denoise_stage` | Skips the VAE denoiser stage. Default: `False`<br>Bypasses the artifact-removal preprocessing step that uses the specialized VAE denoise encoder. This step usually leaves the image slightly softened (if you inspect it at this stage); this is to avoid SUPIR treating low-res/compression artifacts as detail to be enhanced. You may wish to skip this step if:<br>1) You want to do your own pre-processing, or<br>2) The input image is clean and free of low-res/compression artifacts or other degradations.<br>The denoise stage can sometimes make closeups of skin textures look a bit unnatural. |
| `sampler_mode` | Sampler choice. Options: `['TiledRestoreEDMSampler', 'RestoreEDMSampler']`<br>Default: `'TiledRestoreEDMSampler'` (uses less VRAM) |
| `seed` | Random seed for reproducibility. Default: `1234` |
| Use Upscale to.. | If on, the Upscale to width and Upscale to height values are used for upscaling. If off, the Upscale by factor is used. |
| Upscale to width | Upscale input image width to the specified dimension if Use Upscale to.. is on. Minimum: 1024 |
| Upscale to height | Upscale input image height to the specified dimension if Use Upscale to.. is on. Minimum: 1024 |
| Upscale by | Upscale factor for the input image. Default: `2`<br>Upscaling of the input image is performed before the denoising and sampling stages. Both dimensions are multiplied by the upscale value. If the smaller dimension is still < 1024px, the image is further enlarged to a minimum of 1024px (aspect ratio maintained). |
| *** | Notes about upscaling: the reason for the minimum of 1024 is to give SDXL a comfortable working resolution. Note that dimensions are snapped to the nearest multiple of 64. The sweet spot seems to be between 2x and 4x (1024×1024) or 4x and 8x (512×512). Beyond that, the quality begins to collapse. The higher the scale factor, the slower the process. |
| `min_size` | Minimum output resolution. Default: `1024` |
| `num_samples` | Number of images to generate per input. Default: `1` |
| `img_caption` | Specific caption for the input image. Default: `''`<br>This caption is combined with `a_prompt`. |
| `a_prompt` | Additional positive prompt (appended to the input caption). Default: `Cinematic, High Contrast, highly detailed, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations.` |
| `n_prompt` | Negative prompt. Default: `painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth` |
| `edm_steps` | Number of diffusion steps. Default: `50` |
| `s_churn` | Controls how much extra randomness is added during the process. This helps the model explore more options and avoid getting stuck on a limited result. Default: `5`<br>0: No noise (deterministic)<br>1–5: Mild/moderate<br>6–10+: Strong |
| `s_noise` | Scales `s_churn` noise strength. Default: `1.003`<br>Slightly < 1: More stable<br>Slightly > 1: More variation |
| `cfg_scale_start` | Prompt guidance strength at the start. Default: `2.0` |
| `cfg_scale_end` | Prompt guidance strength at the end. Default: `4`<br>1.0: Weak (ignores prompt)<br>7.5: Strong (follows prompt closely)<br>If `cfg_scale_start` and `cfg_scale_end` have the same value, no scaling occurs. When these values differ, linear scheduling is applied from start to end. They can also be reversed for creative strategies. |
| CFG Sweep | Enables a mode to test a range of CFG scale values. When checked, multiple images are generated, each with a different CFG scale, stepping from CFG Scale Start to CFG Scale End. The seed is fixed during the sweep to ensure comparability between images. |
| CFG Sweep Step | The increment used to step from the start to the end CFG scale value during a sweep. |
| CFG Sweep Direction | Defines how the start and end value pairs are varied during a sweep.<br>Forward: the start value increases while the end value stays fixed. Example: 2/8 → 3/8 → 4/8 → 5/8 ...<br>Backward: the end value decreases while the start value stays fixed. Example: 2/8 → 2/7 → 2/6 → 2/5 ... |
| Control Guidance Scale | Guides how strongly the overall structure of the input image is preserved. The process moves from a start scale (at the beginning, with high noise) to an end scale (at the end, with low noise).<br>Control Scale Start: structural guidance strength at the beginning of the process. Lower values allow more creative freedom early on.<br>Control Scale End: structural guidance strength at the end of the process. Higher values ensure the final details conform closely to the original image.<br>Example: start=0.0 / end=1.0 begins with high creativity (ignoring the original structure) and ends by strictly adhering to the original image's structure for the final result. |
| `control_scale_start` | Structural guidance from the input image, start strength. Default: `0.9` |
| `control_scale_end` | Structural guidance from the input image, end strength. Default: `0.9`<br>0.0: Disabled<br>0.1–0.5: Light<br>0.6–1.0: Balanced (default)<br>1.1–1.5+: Very strong<br>Same value = fixed. Different values = scheduled. |
| `restoration_scale` | Early-stage restoration strength. Controls how strongly the model pulls the structure of the output image back toward the original image. Only applies during the early stages of sampling when the noise level is high. Default: `0` (disabled). |
| `color_fix_type` | Color adjustment method. Default: `'Wavelet'`<br>Options: `['None', 'AdaIn', 'Wavelet']` |
| `loading_half_params` | Loads the SUPIR model weights in half precision (FP16). Default: `False`<br>Reduces VRAM usage and increases speed at the cost of slight precision loss. |
| `diff_dtype` | Precision for the diffusion model only. Allows overriding the default precision independently, unless `loading_half_params` is set.<br>Default: `'fp16'`<br>Options: `['fp32', 'fp16', 'bf16']` |
| `ae_dtype` | Autoencoder precision. Default: `'bf16'`<br>Options: `['fp32', 'bf16']` |
| `use_tile_vae` | Enables tile-based encoding/decoding for memory efficiency with large images. Default: `False` |
| `encoder_tile_size` | Tile size when encoding (when `use_tile_vae` is enabled). The TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available.<br>Encoder:<br>VRAM > 16GB: 3072<br>VRAM > 12GB: 2048<br>VRAM > 8GB: 1536<br>VRAM <= 8GB: 960<br>No GPU: 512 |
| `decoder_tile_size` | Tile size when decoding (when `use_tile_vae` is enabled). The TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available.<br>Decoder:<br>VRAM > 30GB: 256<br>VRAM > 16GB: 192<br>VRAM > 12GB: 128<br>VRAM > 8GB: 96<br>VRAM <= 8GB: 64<br>No GPU: 64 |
| Number of Workers | Number of parallel CPU processes for VAE encoding/decoding. Improves speed on multi-core CPUs by efficiently preparing data for the GPU. Default: `4` |
| `sampler_tile_size` | Tile size for `TiledRestoreEDMSampler`. This is the size of each tile that the image is divided into during tiled sampling. Example: tile_size of 128 → image is split into 128×128 pixel tiles. |
| `sampler_tile_stride` | Tile stride for `TiledRestoreEDMSampler`. Controls overlap between tiles during sampling.<br>Smaller tile_stride = more overlap, better blending, more compute.<br>Larger tile_stride = less overlap, faster, may cause seams.<br>Overlap = tile_size − tile_stride<br>Example: tile_size = 128, stride = 64 → 64 px overlap. |
Images from Pixabay
Original SUPIR Repository
Kijai's SUPIR Custom Nodes for ComfyUI




