Releases · hao-ai-lab/FastVideo

29 Aug 03:57

SolitaryThinker

v0.1.6

2dcc5ea

Release 0.1.6 Latest

What's Changed

[Chore] Include our demo in the readme. by @jzhang38 in #720
[Feat] Add Wan2.2 14B MoE by @JerryZhou54 in #688
[Misc] change installation logic of vsa by @jzhang38 in #721
[3/3][Preprocess] add preprocessing workflows by @Eigensystem in #645
[Fix] training pipeline pin_cpu_memory issue by @Eigensystem in #692
add cicd workflow for publishing VSA kernel by @Gary-ChenJL in #723
Remove all empty_cache by @Edenzzzz in #713
update version selection for VSA workflow by @Gary-ChenJL in #725
[bugfix] fix pyproject install and VSA precision test by @SolitaryThinker in #726
Fix LoRA load from training checkpoint by @Edenzzzz in #719
[bugfix] [distill] remove i2v validation schema import in distill by @SolitaryThinker in #728
[feature] add Gradio live serving demo code by @SolitaryThinker in #727
[bugfix] [dmd] Fix backward simulation and also naming in wan_i2v_dmd_pipeline by @SolitaryThinker in #731
Fix vsa backward gQ by @jzhang38 in #735
feat: preprocess validation dataset only when exist by @Eigensystem in #734
Update WeChat group link by @jzhang38 in #739
[Feat][Preprocessing] i2v preprocessing workflow by @Eigensystem in #737
[Docker] add 12.9 docker image and also fix py3.10 and py3.11 dockerfile by @SolitaryThinker in #749
[bugfix] Missing Docker file for cuda12.9 by @SolitaryThinker in #750
[bugfix] [dmd] Align backward simulation with dmd2 sample back by @nappengman in #744
[Fix] fix seed in dmd denoising loop by @jzhang38 in #736
[bugfix] Check that model_index.json module is in required_modules list before removing by @SolitaryThinker in #756
Optionally use unmerged weights for inference by @Edenzzzz in #745
[Feat][Preprocess] support merged dataset by @Eigensystem in #752
[Feat][Preprocess] support multi-gpus by @Eigensystem in #753
[misc] [docs] Various fixes for logging and docs by @SolitaryThinker in #758
[bugfix] Fix wrong HF model string for FastWan2.2 5B by @SolitaryThinker in #763
Update Community Link by @jzhang38 in #765
[Feat] Support Self-Forcing's Causal Inference for Wan2.1 T2V 1.3B by @JerryZhou54 in #766
[Feature] Add wan2.2 5b i2v by @JerryZhou54 in #760
[chore] Release 0.1.6 by @SolitaryThinker in #768

New Contributors

@Gary-ChenJL made their first contribution in #723
@nappengman made their first contribution in #744

Full Changelog: v0.1.5...v0.1.6

Contributors

SolitaryThinker, jzhang38, and 5 other contributors

06 Aug 20:09

SolitaryThinker

v0.1.5

0ceff11

Release 0.1.5

What's Changed

[CI] Add LoRA inference tests by @Edenzzzz in #546
[bugfix] [training] use separate generator for validation by @SolitaryThinker in #610
[chore] release 0.1.2 by @SolitaryThinker in #622
video gen working on apple silicon (addressed issues from prior pr) by @RandNMR73 in #595
[v0] Remove V0 code by @SolitaryThinker in #621
[LoRA] Support v1 LoRA training by @Edenzzzz in #576
Py/add triton block sparse by @jzhang38 in #593
Fix lora train steps by @Edenzzzz in #627
[bugfix] fa3 no longer returns lse by @SolitaryThinker in #631
[CI] Fix CI for pull request targets other than main by @kevin314 in #632
[bugfix] Fix preprocessing pipelines and nightly tests by @SolitaryThinker in #633
[Bugfix] Fix LoRA trainable params and training ckpt loading by @Edenzzzz in #630
[Docs] Docs update for Training and MPS by @SolitaryThinker in #641
[Feature] Add DMD inference pipeline by @BrianChen1129 in #637
[Feature] Multi-lora inference by @Edenzzzz in #640
[Feature] Remove V1 folder by @SolitaryThinker in #642
[1/3][Preprocess] refactor preprocessing configs by @Eigensystem in #638
[misc] Use FASTVIDEO_STAGE_LOGGING for perf timing of stage by @SolitaryThinker in #644
[Feature] Add prompt_txt support for CLI inference; Add DMD CLI inference by @BrianChen1129 in #646
[core] Add offloading for vae and image encoder and rename offloading args by @SolitaryThinker in #643
[CI] Add publish workflow for ComfyUI by @kevin314 in #647
[CI] Fix ComfyUI publisher ID by @kevin314 in #648
[bugfix] VideoGenerator improperly extracts output_video_name by @SolitaryThinker in #649
[Feature] Add DMD T2V training pipeline by @BrianChen1129 in #651
[Feature] Ignore [union-attr] and [override] mypy check and remove from training by @BrianChen1129 in #652
[Feature] Add Wan-14B-T2V-VSA CLI inference; add master port args by @BrianChen1129 in #653
[Feature][Distill]Add DMD+VSA joint training example by @BrianChen1129 in #654
[2/3][Preprocess] refactor pipeline registry & file structure by @Eigensystem in #639
[Feature][Distill]Add 14B 480p T2V distill example scripts by @BrianChen1129 in #655
[Feat] Support VSA with any resolution. by @jzhang38 in #650
[Bugfix]Fix DMD wan pipeline by @BrianChen1129 in #659
[Bugfix]Fix model inference checkpoint saving when enabling HSDP by @BrianChen1129 in #660
[Feature] Add DMD CI test by @BrianChen1129 in #661
[Feature]Add DMD distillation training resume checkpoint; Update DMD CI test by @BrianChen1129 in #662
[Feature] Add wan2.2 5B T2V by @SolitaryThinker in #658
[ComfyUI] Add init.py for node discovery by @kevin314 in #663
[BUG] Fix distillation + vsa by @jzhang38 in #665
[Feature]Add VSA slurm training example scripts by @BrianChen1129 in #666
[chore] Release 0.1.4 by @kevin314 in #667
[Bugfix][Training]Fix Wan2.2 training vae config issue by @BrianChen1129 in #668
[Feature] [Inference]Add ROCm platform support for single-gpu inference by @sopiko99 in #669
[Bugfix]Fix DMD pipeline registry by @BrianChen1129 in #670
Modify args to make sure the scripts are runnable on 4090 by @JerryZhou54 in #671
[Misc] Update examples/ and other misc by @jzhang38 in #672
[Feature]Add DMD visualization for debugging by @BrianChen1129 in #674
fix _normalize_dit_input by @MartinPernus in #681
[Bugfix] Fix multi-gpu training lr_scheduler by @BrianChen1129 in #682
[Misc] Fix training scripts by @Edenzzzz in #683
[Bugfix] Add i2v vae loading by @BrianChen1129 in #686
[Feature] Optionally enable torch compile by @Edenzzzz in #684
[Feature][Readme] Add VSA/DMD doc by @BrianChen1129 in #673
[Feature] Add Wan2.2-TI2V-5B Sparse Distill by @BrianChen1129 in #690
[config] Add config for FastWan2.2 ti2v 5B by @SolitaryThinker in #693
[Feature] Add Wan2.2 DMD example files; Update lr scheduler by @BrianChen1129 in #694
[Feature] Remove unused args by @BrianChen1129 in #695
[Docs] Update README and docs for FastWan by @SolitaryThinker in #698
[Feature] Update sparse distill readme and doc by @BrianChen1129 in #700
[misc] Readme fixes by @SolitaryThinker in #699
[Feature] Update readme by @BrianChen1129 in #702
[Docs] Fix README by @SolitaryThinker in #701
Update readme pre-release by @zhisbug in #704
[Feature] Update cites by @BrianChen1129 in #703
[Feature]Update Wan2.2+DMD doc example by @BrianChen1129 in #706
[misc] Remove allow_tf32 in scripts by @Edenzzzz in #705
Add WeChat group link by @jzhang38 in #707
[Bugfix] Fix neg_prompt bug when training from local cp by @BrianChen1129 in #708
Fix typo by @BrianChen1129 in #709
[Feature]Add Data-free distillation readme by @BrianChen1129 in #710
[chore] Release 0.1.5 by @SolitaryThinker in #717

New Contributors

@RandNMR73 made their first contribution in #595
@sopiko99 made their first contribution in #669
@MartinPernus made their first contribution in #681
@zhisbug made their first contribution in #704

Full Changelog: v0.1.2...v0.1.5

Contributors

zhisbug, SolitaryThinker, and 9 other contributors

15 Jul 19:26

SolitaryThinker

v0.1.2

7e5ebb4

v0.1.2

Last release before removal of v0 code.

What's Changed

Fix VAE precisions by @Edenzzzz in #588
[LoRA] Fix lora merge weights by @Edenzzzz in #579
Add ComfyUI custom node for inference by @kevin314 in #596
[Feature] Offload all text encoders by default by @Edenzzzz in #594
[Training] Use inference pipeline for training validation by @SolitaryThinker in #585
[chore] Upgrade min Python version from 3.8 to 3.10 by @SolitaryThinker in #597
[bugfix] [training] fix deadlock in latent datasets and init error in multi-node training by @SolitaryThinker in #598
[docs] Update slack invite by @SolitaryThinker in #601
[docs] update dev guide runpod image to py3.12 by @SolitaryThinker in #602
Remove all unnecessary torch.cuda.empty_cache by @Edenzzzz in #606
Set encoder TP size to 1 by default by @Edenzzzz in #569
[Feature][Training]Update example fine-tuning scripts to enable gradient checkpointing by @BrianChen1129 in #618

Full Changelog: v0.1.1...v0.1.2

Contributors

SolitaryThinker, kevin314, and 2 other contributors

01 Jul 06:44

SolitaryThinker

v0.1.1

3213317

v0.1.1

What's Changed

[Docs] Add CLI docs by @SolitaryThinker in #406
[Docs] Fix image by @SolitaryThinker in #407
[Teacache] allow None for forward_context batch when using teacache by @SolitaryThinker in #412
[V1] Remove vLLM dependency by @SolitaryThinker in #413
Fulfill worker response on interrupt by @kevin314 in #417
[bug] fix bs > 1 by @SolitaryThinker in #418
Fix version number by @Edenzzzz in #422
[Tests] don't run 3.10 and 3.11 for SSIM by @SolitaryThinker in #427
Use version.py by @Edenzzzz in #424
Unify env report script in issue template by @Edenzzzz in #423
Set device for encode by @kevin314 in #420
[Misc] Small fixes to Torch code by @applesaucethebun in #395
misc: Trigger transformers CI for layers and attention code change by @Edenzzzz in #434
[Training] [2/n] add bwd for all2all and all_gather by @SolitaryThinker in #439
[Training] [3/n] Add training args and dependencies by @SolitaryThinker in #440
[Training] [4/n] add training save checkpoint by @SolitaryThinker in #441
[Training] [1/n] Add latent datasets by @SolitaryThinker in #438
Update STA mask strategy downloading by @BrianChen1129 in #445
[Training] [5/n] Add single gpu training pipeline by @SolitaryThinker in #447
[Training] [0/n] Add preprocessing pipeline by @JerryZhou54 in #442
[Training] [6/n]Mixed precision training by @SolitaryThinker in #448
[Training] [7/n] gradient clipping by @SolitaryThinker in #449
[Training] [8/n] SP Training by @SolitaryThinker in #450
misc: add remote pdb for debugging workers by @Edenzzzz in #456
[Misc] Remove InferenceEngine by @Edenzzzz in #455
[Misc] disable cast_forward_inputs by @SolitaryThinker in #460
Bring back mask files under asset/ and update new Wan mask strategy file by @BrianChen1129 in #462
Fix WanVideo by @JerryZhou54 in #461
[Training] Add distributed checkpointing by @kevin314 in #458
Update v1 inference scripts by @JerryZhou54 in #467
[Training] Support Multi-Node training with FSDP + SP by @SolitaryThinker in #459
[misc] Polish V1 training code by @Edenzzzz in #469
[misc] Find unused port in distributed init by @Edenzzzz in #475
[LoRA] Support V1 LoRA inference by @Edenzzzz in #451
[bugfix] fix bz >1 for training by @SolitaryThinker in #477
[Issue template] Move env report to the end for readability by @Edenzzzz in #476
[Preprocess] I2V dataset by @BrianChen1129 in #473
[Distill] support distill for wan by @AliceChenyy in #444
[STA] Implement mask search and update mask strategy for V1's Wan2.1 by @KevinZeng08 in #415
[bugfix] [training] Add negative prompt to preprocessing and validation by @jzhang38 in #479
[bugfix] [misc] fix denoising stage init; rename distributed env function; fix logging. by @jzhang38 in #481
Add torch.compile for all small ops by @Edenzzzz in #432
Revert "Add torch.compile for all small ops" by @Edenzzzz in #484
[Bug] Fix multi gpus issues in v1 scripts by @BrianChen1129 in #489
[misc] Improve distributed related env variables and setup by @jzhang38 in #487
[bugfix][Cli Inference] Resolve runtime errors when running fastvideo generate by @JerryZhou54 in #493
Fix pre-commit CI by @Edenzzzz in #494
[bugfix][Cli Inference] Resolve runtime errors when running fastvideo generate by @JerryZhou54 in #495
[Feature] Adding VSA inference by @BrianChen1129 in #478
[misc] Add missing license headers by @SolitaryThinker in #499
[Feat][Dataloader] 1/n Refactor parquet map-style dataloader by @jzhang38 in #492
[Feature][VSA]Update STA publish workflow by @BrianChen1129 in #498
[misc] rename dp_size to hdsp_replicate_dim by @jzhang38 in #491
[CI] [Training] Initial e2e small training test by @SolitaryThinker in #504
[feat] Add parquet iterable dataset. by @jzhang38 in #506
[Refactor][Configurations] clean config organization by @Eigensystem in #505
fix logging by @jzhang38 in #509
[CI] Restrict training CI to v1 by @Edenzzzz in #508
[misc] Fix preprocessing and dataloader extra padding by @jzhang38 in #514
[Feature][Training]vsa for t2v training ready by @BrianChen1129 in #513
[Feature][Preprocess]Add Readme doc for preprocess by @BrianChen1129 in #518
[CI] [Training] drop negative prompt in validation dataset and CI test for preprocess + training overfit by @SolitaryThinker in #519
[Bugfix][Preprocess]fix mini dataset name by @BrianChen1129 in #520
[Refactor] Fix attn backend selection not correctly setting env variable by @Edenzzzz in #516
[misc] [ci] fix e2e preprocess+training data path by @SolitaryThinker in #521
[bugfix] [Training] use diffusers fp32layernorm for wan2.1 by @SolitaryThinker in #490
[CI] Update Docker image to flash-attn 2.8.0 / CUDA 12.8 by @kevin314 in #524
[CI] Add current PR test workflow to Buildkite/Modal by @kevin314 in #512
[CI][bugfix] Use new 3.12 docker image by @SolitaryThinker in #526
[Bugfix][Inference]Fix envs.attn_backend by @BrianChen1129 in #525
[Ci] add sta and vsa install to docker image by @SolitaryThinker in #528
[Feature][CI]Add STA-inference/VSA-training test by @BrianChen1129 in #527
[Bugfix][Readme]Fix readme website bugs and add VSA finetune docs by @BrianChen1129 in #531
[Refactor] Move dict_to_3d_list under utils by @Edenzzzz in #507
Specify cu128 Pytorch installation by @kevin314 in #530
[Feat] Add Stage input and output verification by @SolitaryThinker in #523
[misc] Remove gradient checking code by @SolitaryThinker in #532
[bugfix] Fix stage validator for multi text encoder models by @SolitaryThinker in #535
[bugfix] [VSA] Fix layernorm type for VSA Wan2.1 TransformerBlock by @SolitaryThinker in #534
[misc] [training] Reorganize training pipeline by @SolitaryThinker in #533
[chore] Bump torch to 2.7.1 to support Blackwell by @Edenzzzz in #483
[Training] Refactor and improve validation datasets by @SolitaryThinker in #539
[Feature][Training]Add diffusers format checkpoint saving for inference by @BrianChen1129 in #542
[Kernel] Remove all syncs from STA & VSA kernels by @Edenzzzz in #517
[CI] Fix CI checks by @Edenzzzz in #553
[Feature][Training] Add cfg rate for dataset loader by @BrianChen1129 in ht...

Contributors

SolitaryThinker, kevin314, and 9 other contributors

12 May 18:55

SolitaryThinker

v0.1.0

6eeb606

v0.1.0

What's Changed

[V1] Update README by @SolitaryThinker in #400
[CLI] Default to pipeline config by @kevin314 in #401
[V1] Docs Update by @SolitaryThinker in #402
[V1] Update where num_frame rounding is done by @SolitaryThinker in #403
Release 0.1.0 by @SolitaryThinker in #405

Full Changelog: v0.0.5...v0.1.0

Contributors

SolitaryThinker and kevin314

11 May 23:26

SolitaryThinker

v0.0.5

59ab481

v0.0.5 Pre-release

What's Changed

Sync main with yongqi-dev2 by @BrianChen1129 in #70
[cleanup] by @jzhang38 in #72
add web demo by @BrianChen1129 in #73
Cleanup by @jzhang38 in #75
Cleanup by @jzhang38 in #77
Hunyuanvideo by @jzhang38 in #78
add hunyuan adv by @jzhang38 in #79
update release readme by @foreverpiano in #81
Cleanup README. by @jzhang38 in #83
Clean up by @jzhang38 in #84
Rlsu lora readme by @jzhang38 in #85
Rlsu lora readme by @jzhang38 in #86
Add Replicate demo and API by @lucataco in #93
fix lora checkpoint saving issue by @BrianChen1129 in #97
[feat]:Single 4090 inference for fasthunyuan by @jzhang38 in #104
[Minor] Adding issue template. by @foreverpiano in #114
[feat]: Add format auto fixer to main branch by @rlsu9 in #124
[Fix] Save CK, Dataset bug fix by @jzhang38 in #125
[feat]: Add tests for FastVideo by @rlsu9 in #127
add parallel for vae decoding by @rucnyz in #134
Update README.md by @foreverpiano in #131
adding hunyuan hf (support lora finetuning); unified hunyuan hf inference with quantization by @BrianChen1129 in #135
Lora README update by @BrianChen1129 in #155
Create config.yml by @foreverpiano in #152
add sliding tile attn by @jzhang38 in #182
Add STA and teacache forward by @BrianChen1129 in #184
fix kernel issue by @BrianChen1129 in #185
Infer sta tea with torch.compile by @BrianChen1129 in #190
[feat]: fix readme demo and add video to readme by @rlsu9 in #191
update env by @jzhang38 in #194
Update Cite by @jzhang38 in #195
Update typo by @jzhang38 in #198
fix ori hunyuan inference issue by @BrianChen1129 in #199
Add StepVideo by @jzhang38 in #200
[feat] fix isort format by @rlsu9 in #203
[FIX] Make STA optional by @jzhang38 in #204
Update readme by @BrianChen1129 in #202
Update STA README.md by @jzhang38 in #206
Added multi-GPU support for Hunyuan STA by @BrianChen1129 in #211
fix train/distill issue by @BrianChen1129 in #215
update cfg bug? by @jzhang38 in #223
fix training mask strategy issue by @BrianChen1129 in #248
Establish cicd workflow to build and publish FastVideo and STA Kernel by @PorridgeSwim in #227
v1 by @jzhang38 in #270
Set up text encoder tests to work with pytest and Github Actions by @kevin314 in #302
[CI] Add test workflow improvements by @kevin314 in #311
refactor the env setup and install of fastvideo by @PorridgeSwim in #309
Add ssim test by @kevin314 in #314
Fix sdpa by @jzhang38 in #315
Add torch sdpa backend to ssim test by @SolitaryThinker in #316
[CI] Add manual triggers for PR workflow by @kevin314 in #320
[Docs] Initial Docs Build by @SolitaryThinker in #322
[CI] Use pre-commit to run linter by @SolitaryThinker in #321
[Docs] Fix doc lint by @SolitaryThinker in #325
[CI] Set allowedCudaVersions by @kevin314 in #329
[Docs] Add dev guide and doc building CI by @SolitaryThinker in #330
Port tests to v1 by @kevin314 in #333
V1 wan rebased by @SolitaryThinker in #335
[Model] Remove RMSNorm's forward_native hardcode from Wan by @SolitaryThinker in #339
[Docs] Initial examples setup and more docs by @SolitaryThinker in #332
[CI] Support custom Docker image by @kevin314 in #342
Add STA to V1 by @jzhang38 in #312
[STA] Sta release 0.0.3 by @SolitaryThinker in #344
[CI] Free up runner disk for sta-publish by @kevin314 in #345
[CI] Add manual trigger to sta-publish and fastvideo-publish by @kevin314 in #346
Pipeline config by @JerryZhou54 in #343
[CI] Docker image improvements by @kevin314 in #350
Default to using original WanVAE's encoding/decoding algorithm by @JerryZhou54 in #351
[CLI] Fix duplicate --num-gpus by @kevin314 in #352
add STA to Wan v1 by @BrianChen1129 in #349
[Docs] Fix developer guide images by @kevin314 in #353
[1/n] [v1] Add Worker abstractions for User API by @SolitaryThinker in #336
[sta] release 0.0.4 by @SolitaryThinker in #354
[V1] Worker cleanup; Logging clean up; enables isort again by @SolitaryThinker in #355
[V1] Process aware logging; improve logging msg by @SolitaryThinker in #356
[V1] Gradio demo with new API by @kevin314 in #357
chore: Release FastVideo 0.0.2 and update python requirements by @SolitaryThinker in #360
[V1] Worker improvements/cleanup by @kevin314 in #361
[Docs] Docs for design and adding new pipeline by @SolitaryThinker in #363
[Attn] Add SageAttention Backend by @SolitaryThinker in #366
Model config by @JerryZhou54 in #358
Update SSIM tests to use new API by @kevin314 in #369
Refactor encoder by @JerryZhou54 in #370
Fix model config for python 3.11+ by @SolitaryThinker in #373
release 0.0.3 by @SolitaryThinker in #374
change gradio example to use model configs by @SolitaryThinker in #375
Fix FSDP issues when using cpu_offload flag by @JerryZhou54 in #376
[CI] Add new images for different Python versions by @kevin314 in #377
[CI] Add write permissions to build-image workflow by @kevin314 in #379
[Lint] fix by @SolitaryThinker in #382
Add Teacache to V1 by @SolitaryThinker in #371
Cleanup Teacache params by @SolitaryThinker in #386
Small Fixes & Features by @JerryZhou54 in #378
[Docs] Update for V1 by @SolitaryThinker in #381
[misc] Improve worker cleanup by @SolitaryThinker in #387
Release 0.0.4 by @SolitaryThinker in #388
[Misc] Small Fixes & Features by @JerryZhou54 in #390
[Docs] Add collect_env.py and various docs update by @SolitaryThinker in #393
[CI] Use python 3.10/3.11 for SSIM test by @kevin314 in #392
[CLI] Update cli to support new api/mo...