Researchers! Please develop joint video deblurring and frame interpolation models, use the best available method for resolving the time-to-location ambiguity between two input frames (currently the one in BiM-VFI), and train at least one of your models with Style loss, also called Gram matrix loss (the best perceptual loss function):
Source: FILM - Loss Functions Ablation https://film-net.github.io/
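As a rough illustration of what a Gram matrix (style) loss measures, here is a minimal NumPy sketch. It is not FILM's implementation: the function names, the normalization by C·H·W, and the mean-squared comparison are assumptions made for this example, and in practice the feature maps would come from a pretrained network rather than from random arrays.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a (C, H, W) feature map.

    The division by C*H*W is an assumed normalization; papers differ here.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # one row per channel
    return flat @ flat.T / (c * h * w)  # (C, C) channel correlations


def style_loss(feats_pred, feats_target, weights=None) -> float:
    """Weighted sum over layers of the mean squared Gram matrix difference."""
    if weights is None:
        weights = [1.0] * len(feats_pred)
    total = 0.0
    for weight, fp, ft in zip(weights, feats_pred, feats_target):
        diff = gram_matrix(fp) - gram_matrix(ft)
        total += weight * float(np.mean(diff ** 2))
    return total


# Toy usage: random arrays stand in for feature maps of two images.
rng = np.random.default_rng(0)
pred_feats = [rng.standard_normal((4, 8, 8))]
target_feats = [rng.standard_normal((4, 8, 8))]
print(style_loss(pred_feats, target_feats))
```

Because the Gram matrix discards spatial positions and keeps only channel correlations, the loss compares texture statistics rather than pixel-aligned values.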
Each ranking includes only the best model for one method.
The rankings exclude all event-based and spike-guided models.
- 👑 RBI with real motion blur✔️: LPIPS😍 (no data)
  This will be the King of all rankings. We look forward to ambitious researchers.
- RBI with real motion blur✔️: PSNR😞>=28.5dB
- (to do)
- X-TEST (×8): LPIPS😍<=0.098
- SNU-FILM-arb Extreme (×16): LPIPS😍<=0.095
- SNU-FILM-arb Hard (×8): LPIPS😍<=0.048
- SNU-FILM-arb Medium (×4): LPIPS😍<=0.026
- SNU-FILM Extreme (×2): LPIPS😍<=0.1099
- SNU-FILM Hard (×2): LPIPS😍<=0.052
- SNU-FILM Medium (×2): LPIPS😍<=0.024
- Vimeo-90K triplet: LPIPS😍<=0.018
- Vimeo-90K triplet: LPIPS😍(SqueezeNet)<=0.014
- Vimeo-90K triplet: PSNR😞>=36dB
- Vimeo-90K septuplet: PSNR😞>=36dB
- Appendix 1: Runtime
- Appendix 2: Rules for qualifying models for the rankings (to do)
- Appendix 3: Metrics selection for the rankings
- Appendix 4: List of all research papers from the above rankings
📝 Note: Pre-BiT++ is pre-trained on Adobe240 and then fine-tuned on RBI.
RK | Model Links: Venue Repository | PSNR ↑ {Input fr.} Table 1&6 BiT |
---|---|---|
1 | Pre-BiT++ | 31.32 {3} |
2 | DeMFI-Netrb(5,3) | 29.03 {4} |
3 | PRF4 -Large | 28.55 {5} |
📝 Note: This ranking has the most up-to-date layout.
RK | Model Links: Venue Repository | LPIPS ↓ {Input fr.} Table 7 GIMM-VFI | LPIPS ↓ {Input fr.} Table 1 BiM-VFI |
---|---|---|---|
1 | GIMM-VFI-F-P | 0.030 {2} | - |
2 | BiM-VFI | - | 0.039 {2} |
3 | IA-Clearer [D,R]u IFRNet | - | 0.048 {2} |
📝 Note: This ranking has the most up-to-date layout.
RK | Model Links: Venue Repository | LPIPS ↓ {Input fr.} Table 7 GIMM-VFI | LPIPS ↓ {Input fr.} Table 1 BiM-VFI |
---|---|---|---|
1 | GIMM-VFI-R-P | 0.016 {2} | - |
2 | BiM-VFI | - | 0.023 {2} |
3 | IA-Clearer [D,R]u IFRNet | - | 0.026 {2} |
📝 Note: This ranking has the most up-to-date layout.
RK | Model | LPIPS ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
---|---|---|---|---|---|---|
1 | EAFI-𝓛ecp | 0.012 {2} | Vimeo-90K triplet | - | EAFI-𝓛ecp | - |
2 | UGFI 𝓛S | 0.0126 {2} | Vimeo-90K triplet | - | UGFI 𝓛S | - |
3 | SoftSplat - 𝓛F | 0.013 {2} | Vimeo-90K triplet | SoftSplat - 𝓛F | - | |
4 | FILM-𝓛S | 0.0132 {2} | Vimeo-90K triplet | FILM-𝓛S | - | |
5 | MoMo | 0.0136 {2} | Vimeo-90K triplet | MoMo | - | |
6 | EDSC_s-𝓛F | 0.016 {2} | Vimeo-90K triplet | EDSC_s-𝓛F | - | |
7 | CtxSyn - 𝓛F | 0.017 {2} | proprietary | - | CtxSyn - 𝓛F | - |
8 | PerVFI | 0.018 {2} | Vimeo-90K triplet | PerVFI | - | |
RK | Model | LPIPS ↓ | Originally announced | Official repository | Practical model | VapourSynth |
---|---|---|---|---|---|---|
1 | CDFI w/ adaP/U | 0.008 [1] | March 2021 [2] | - | - | |
2 | EDSC_s-𝓛F | 0.010 [2] | June 2020 [3] | EDSC_s-𝓛F | - | |
3 | DRVI | 0.013 [4] | August 2021 [4] | - | - | - |
RK | Model | PSNR ↑ {Input fr.} | Originally announced or Training dataset | Official repository | Practical model | VapourSynth |
---|---|---|---|---|---|---|
1 | MA-GCSPA-triplets | 36.85 {2} | Vimeo-90K triplet | - | - | |
2 | VFIformer + HRFFM ENH: | 36.69 {2} | Vimeo-90K triplet | ENH: - | - | - |
3 | LADDER-L | 36.65 {2} | Vimeo-90K triplet | - | - | - |
4-5 | EMA-VFI | 36.64 dB [5] | March 2023 [5] | - | - | |
4-5 | VFIMamba | 36.64 {2} | Vimeo-90K triplet & X-TRAIN | - | - | |
6 | IQ-VFI | 36.60 {2} | Vimeo-90K triplet | - | - | - |
7 | DQBC-Aug | 36.57 dB [6] | April 2023 [6] | - | - | |
8 | TTVFI | 36.54 dB [7] | July 2022 [7] | - | - | |
9 | AMT-G | 36.53 dB [8] | April 2023 [8] | - | - | |
10 | AdaFNIO | 36.50 dB [9] | November 2022 [9] | - | - | |
11 | FGDCN-L | 36.46 dB [10] | November 2022 [10] | - | - | |
12 | VFIFT | 36.43 {2} | Vimeo-90K triplet | - | - | - |
13 | UPR-Net LARGE | 36.42 dB [11] | November 2022 [11] | - | - | |
14 | EAFI-𝓛ecc | 36.38 dB [12] | July 2022 [12] | - | EAFI-𝓛ecp | - |
15 | H-VFI-Large | 36.37 dB [13] | November 2022 [13] | - | - | - |
16 | UGFI 𝓛1 | 36.34 {2} | Vimeo-90K triplet | - | UGFI 𝓛S | - |
17 | VFIT-B | 36.33 {2} | ? | - | - | |
18 | SoftSplat - 𝓛Lap with ensemble | 36.28 dB [14] | March 2020 [15] | SoftSplat - 𝓛F | - | |
19 | ProBoost-Net (448x256) | 36.23 {2} | ? | - | - | - |
20 | NCM-Large | 36.22 dB [16] | July 2022 [16] | - | - | - |
21-22 | IFRNet large | 36.20 dB [17] | May 2022 [17] | - | - | |
21-22 | RAFT-M2M++ ENH: | 36.20 {2} | Vimeo-90K triplet | - | - | |
23-24 | EBME-H* | 36.19 dB [18] | June 2022 [18] | - | - | |
23-24 | RIFE-Large | 36.19 {2} | Vimeo-90K triplet | Practical-RIFE 4.25 | | |
25 | ABME | 36.18 dB [19] | August 2021 [19] | - | - | |
26 | HiFI | 36.12 {2} | Pretraining: Raw videos; Training: Vimeo-90K triplet & X-TRAIN | - | - | - |
27 | TDPNetnv w/o MRTM | 36.069 {2} | Vimeo-90K triplet | - | TDPNet | - |
28 | FILM-𝓛1 | 36.06 {2} | Vimeo-90K triplet | FILM-𝓛S | - | |
RK | Model | PSNR ↑ {Input fr.} | Originally announced or Training dataset | Official repository | Practical model | VapourSynth |
---|---|---|---|---|---|---|
1 | Swin-VFI | 38.04 {6} | Vimeo-90K septuplet | - | - | - |
2 | JNMR | 37.19 dB [20] | June 2022 [20] | - | - | |
3 | VFIT-B | 36.96 {4} | Vimeo-90K septuplet | - | - | |
4 | VRT | 36.53 {4} | Vimeo-90K septuplet | - | - | |
5 | ST-MFNet | 36.507 dB [21] | November 2021 [22] | - | - | |
6 | EDENVFI PVT(15,15) | 36.387 dB [21] | July 2023 [21] | - | - | - |
7 | IFRNet | 36.37 {2} | Vimeo-90K septuplet | - | - | |
8 | RN-VFI | 36.33 {4} | Vimeo-90K septuplet | - | - | - |
9 | FLAVR | 36.3 {4} | Vimeo-90K septuplet | - | - | |
10 | DBVI | 36.17 dB [23] | October 2022 [23] | - | - | |
11 | EDC | 36.14 dB [20] | February 2022 [24] | - | - | |
Model Links: Venue Repository | Runtime(s) (×2) A100 1280×768 Table 7 BiM-VFI |
---|---|
BiM-VFI | 0.151 |
EMA-VFI | 0.104 |
GIMM-VFI-R | 0.494 |
UPR-Net LARGE | 0.053 |
Currently, the most commonly used metrics in existing work on video frame interpolation and video deblurring are PSNR, SSIM and LPIPS, in exactly that order.
The main purpose of my rankings is to find the best perceptually-oriented model for practical applications. Hence the primary metric in my rankings is the perceptual image quality metric most common in scientific papers: LPIPS.
At the time of writing, in October 2023, I have found only one other perceptual image quality metric used for VFI, DISTS, in a single paper, and one bespoke VFI metric, FloLPIPS [arXiv], also in a single paper. Unfortunately, neither paper evaluates the models that perform best on the LPIPS metric. If a researcher later evaluates the top LPIPS models using alternative, better perceptual metrics, I will of course be happy to add rankings based on those metrics.
I would like to use only one metric: LPIPS. Unfortunately, many of the best VFI and video deblurring methods are still evaluated only with PSNR, or with PSNR and SSIM. For this reason, I will additionally present rankings based on PSNR. They show the models that could, after perceptually-oriented training, be the best for practical applications, and they provide a source of knowledge for building even better practical models in the future.
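For readers less familiar with the ancillary metric, PSNR is derived from the mean squared error between the predicted and ground-truth frames. The following is a minimal NumPy sketch of the standard definition. The function name and the `max_value` parameter are chosen for this example, and individual papers may differ in evaluation protocol (color space, averaging), so scores computed this way need not match the tables above.

```python
import numpy as np


def psnr(pred, target, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy usage: images in [0, 1] that differ by a constant 0.1 everywhere,
# so MSE = 0.01 and PSNR is about 20 dB.
target = np.zeros((4, 4))
pred = target + 0.1
print(psnr(pred, target, max_value=1.0))
```

Higher is better (hence the ↑ in the table headers), but because the score depends only on pixel-wise error, it can disagree with perceptual metrics such as LPIPS.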
I have decided to completely abandon rankings based on the SSIM metric. Below are the main reasons for this decision, ordered from most to least important.
- The main reason is the following quote, which I found in a paper by researchers at Adobe Research (footnote 14). In the quote they refer to a paper by researchers at NVIDIA: [arXiv].
  > We limit the evaluation herein to the PSNR metric since SSIM [57] is subject to unexpected and unintuitive results [39].
- The second reason is that more and more papers report PSNR scores without SSIM (see, for example, footnote 21). If a model from such a paper appears in the PSNR-based ranking but not in the SSIM-based ranking, it may give the misleading impression that its SSIM score is too poor to pass the ranking eligibility threshold, when the paper simply contains no SSIM score.
- The third reason is that the SSIM scores of individual models are often very close to each other or identical. This is the case in the SNU-FILM Easy test, as shown in Table 3: [CVPR 2023], where as many as 6 models achieve the same score of 0.991 and as many as 5 models achieve the same score of 0.990. In the same test, PSNR makes it easier to determine the ranking order with the same number of significant digits.
- The fourth reason is that PSNR-based rankings are only ancillary, for models that do not have an LPIPS score. For this reason, SSIM rankings would not add value to my repository and would only reduce its readability.
- The fifth reason is that I want to encourage researchers who want to use only two metrics in their paper to use LPIPS and PSNR instead of PSNR and SSIM.
- The sixth reason is that the time saved by dropping the SSIM-based rankings will allow me to add new rankings based on other test data, which will be more useful and valuable.
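The tie problem mentioned in the third reason also shows up, to a lesser degree, in the PSNR rankings, where models with identical reported scores share a range label such as "4-5". The sketch below shows one way such labels could be derived. The helper `rank_labels` and its parameters are hypothetical and written only for this illustration; they do not describe how the rankings in this repository are actually produced.

```python
from itertools import groupby


def rank_labels(scores, higher_is_better=True, digits=2, start=1):
    """Return (label, name, score) rows; tied scores share a range label."""
    rounded = [(name, round(score, digits)) for name, score in scores.items()]
    rounded.sort(key=lambda item: item[1], reverse=higher_is_better)
    rows = []
    position = start
    # groupby works here because equal scores are adjacent after sorting.
    for score, group in groupby(rounded, key=lambda item: item[1]):
        members = sorted(name for name, _ in group)
        first, last = position, position + len(members) - 1
        label = str(first) if first == last else f"{first}-{last}"
        rows.extend((label, name, score) for name in members)
        position = last + 1
    return rows


# Four PSNR scores taken from the Vimeo-90K triplet ranking (places 3 to 6).
scores = {"LADDER-L": 36.65, "EMA-VFI": 36.64, "VFIMamba": 36.64, "IQ-VFI": 36.60}
for row in rank_labels(scores, start=3):
    print(row)
# ('3', 'LADDER-L', 36.65)
# ('4-5', 'EMA-VFI', 36.64)
# ('4-5', 'VFIMamba', 36.64)
# ('6', 'IQ-VFI', 36.6)
```

With SSIM reported to three decimals, a call like this would collapse many models into a single shared label, which is exactly why such a ranking carries little information.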
Footnotes
1. AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling [TIP 2022] [arXiv]
2. CDFI: Compression-Driven Network Design for Frame Interpolation [CVPR 2021] [arXiv]
3. Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution [TPAMI 2021] [arXiv]
4. DRVI: Dual Refinement for Video Interpolation [Access 2021]
5. Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation [CVPR 2023] [arXiv]
6. Video Frame Interpolation with Densely Queried Bilateral Correlation [IJCAI 2023] [arXiv]
7. TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [TIP 2023] [arXiv]
8. AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [CVPR 2023] [arXiv]
9. AdaFNIO: Adaptive Fourier Neural Interpolation Operator for video frame interpolation [arXiv]
10. Flow Guidance Deformable Compensation Network for Video Frame Interpolation [TMM 2023] [arXiv]
11. A Unified Pyramid Recurrent Network for Video Frame Interpolation [CVPR 2023] [arXiv]
12. Error-Aware Spatial Ensembles for Video Frame Interpolation [arXiv]
13. H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [arXiv]
14. Revisiting Adaptive Convolutions for Video Frame Interpolation [WACV 2021] [arXiv]
15. Softmax Splatting for Video Frame Interpolation [CVPR 2020] [arXiv]
16. Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [MM 2022] [arXiv]
17. IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation [CVPR 2022] [arXiv]
18. Enhanced Bi-directional Motion Estimation for Video Frame Interpolation [WACV 2023] [arXiv]
19. Asymmetric Bilateral Motion Estimation for Video Frame Interpolation [ICCV 2021] [arXiv]
20. JNMR: Joint Non-linear Motion Regression for Video Frame Interpolation [TIP 2023] [arXiv]
21. Efficient Convolution and Transformer-Based Network for Video Frame Interpolation [ICIP 2023] [arXiv]
22. ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation [CVPR 2022] [arXiv]
23. Deep Bayesian Video Frame Interpolation [ECCV 2022]
24. Enhancing Deformable Convolution based Video Frame Interpolation with Coarse-to-fine 3D CNN [ICIP 2022] [arXiv]