
Commit 17d1774: "Code release" (0 parents)

44 files changed: 9,145 additions and 0 deletions

LICENSE

Lines changed: 674 additions & 0 deletions (large diff not rendered)

README.md

Lines changed: 225 additions & 0 deletions
# Robust Video Matting (RVM)

![Teaser](/documentation/image/teaser.gif)

<p align="center">English | <a href="README_zh_Hans.md">中文</a></p>

Official repository for the paper [Robust High-Resolution Video Matting with Temporal Guidance](https://peterl1n.github.io/RobustVideoMatting/). RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting on any video in real time without additional inputs. It achieves **4K 76FPS** and **HD 104FPS** on an Nvidia GTX 1080 Ti GPU. The project was developed at [ByteDance Inc.](https://www.bytedance.com/)

<br>
## News

* [Aug 25 2021] Source code and pretrained models are published.
* [Jul 27 2021] Paper is accepted by WACV 2022.

<br>
## Showreel
Watch the showreel video ([YouTube](https://youtu.be/Jvzltozpbpk), [Bilibili](https://www.bilibili.com/video/BV1Z3411B7g7/)) to see the model's performance.

<p align="center">
    <a href="https://youtu.be/Jvzltozpbpk">
        <img src="documentation/image/showreel.gif">
    </a>
</p>

All footage in the video is available in [Google Drive](https://drive.google.com/drive/folders/1VFnWwuu-YXDKG-N6vcjK_nL7YZMFapMU?usp=sharing) and [Baidu Pan](https://pan.baidu.com/s/1igMteDwN5rO1Sn7YIhBlvQ) (code: tb3w).

<br>
## Demo
* [Webcam Demo](https://peterl1n.github.io/RobustVideoMatting/#/demo): Run the model live in your browser. Visualize recurrent states.
* [Colab Demo](https://colab.research.google.com/drive/10z-pNKRnVNsp0Lq9tH1J_XPZ7CBC_uHm?usp=sharing): Test our model on your own videos with a free GPU.

<br>

## Download

We recommend MobileNetV3 models for most use cases. ResNet50 models are larger variants with small performance improvements. Our model is available on various inference frameworks. See [inference documentation](documentation/inference.md) for more instructions.

<table>
    <thead>
        <tr>
            <td>Framework</td>
            <td>Download</td>
            <td>Notes</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>PyTorch</td>
            <td>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3.pth">rvm_mobilenetv3.pth</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50.pth">rvm_resnet50.pth</a>
            </td>
            <td>
                Official weights for PyTorch. <a href="documentation/inference.md#pytorch">Doc</a>
            </td>
        </tr>
        <tr>
            <td>TorchHub</td>
            <td>
                Nothing to Download.
            </td>
            <td>
                Easiest way to use our model in your PyTorch project. <a href="documentation/inference.md#torchhub">Doc</a>
            </td>
        </tr>
        <tr>
            <td>TorchScript</td>
            <td>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp32.torchscript">rvm_mobilenetv3_fp32.torchscript</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp16.torchscript">rvm_mobilenetv3_fp16.torchscript</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp32.torchscript">rvm_resnet50_fp32.torchscript</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp16.torchscript">rvm_resnet50_fp16.torchscript</a>
            </td>
            <td>
                For inference on mobile, consider exporting int8 quantized models yourself. <a href="documentation/inference.md#torchscript">Doc</a>
            </td>
        </tr>
        <tr>
            <td>ONNX</td>
            <td>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp32.onnx">rvm_mobilenetv3_fp32.onnx</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp16.onnx">rvm_mobilenetv3_fp16.onnx</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp32.onnx">rvm_resnet50_fp32.onnx</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp16.onnx">rvm_resnet50_fp16.onnx</a>
            </td>
            <td>
                Tested on ONNX Runtime with CPU and CUDA backends. Provided models use opset 12. A usage sketch is shown below the table. <a href="documentation/inference.md#onnx">Doc</a>, <a href="https://github.com/PeterL1n/RobustVideoMatting/tree/onnx">Exporter</a>.
            </td>
        </tr>
        <tr>
            <td>TensorFlow</td>
            <td>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_tf.zip">rvm_mobilenetv3_tf.zip</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_tf.zip">rvm_resnet50_tf.zip</a>
            </td>
            <td>
                TensorFlow 2 SavedModel. <a href="documentation/inference.md#tensorflow">Doc</a>
            </td>
        </tr>
        <tr>
            <td>TensorFlow.js</td>
            <td>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_tfjs_int8.zip">rvm_mobilenetv3_tfjs_int8.zip</a>
            </td>
            <td>
                Run the model on the web. <a href="https://peterl1n.github.io/RobustVideoMatting/#/demo">Demo</a>, <a href="https://github.com/PeterL1n/RobustVideoMatting/tree/tfjs">Starter Code</a>
            </td>
        </tr>
        <tr>
            <td>CoreML</td>
            <td>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel">rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel">rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel">rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel</a><br>
                <a href="https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel">rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel</a>
            </td>
            <td>
                CoreML does not support dynamic resolution; you can export other resolutions yourself. Models require iOS 13+. <code>s</code> denotes <code>downsample_ratio</code>. <a href="documentation/inference.md#coreml">Doc</a>, <a href="https://github.com/PeterL1n/RobustVideoMatting/tree/coreml">Exporter</a>
            </td>
        </tr>
    </tbody>
</table>

All models are available in [Google Drive](https://drive.google.com/drive/folders/1pBsG-SCTatv-95SnEuxmnvvlRx208VKj?usp=sharing) and [Baidu Pan](https://pan.baidu.com/s/1puPSxQqgBFOVpW4W7AolkA) (code: gym7).
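
For the ONNX models above, here is a minimal ONNX Runtime sketch. The input and output names (`src`, `r1i`-`r4i`, `downsample_ratio`, producing `fgr`, `pha`, `r1o`-`r4o`) are assumptions mirroring the PyTorch signature, not confirmed by this README; verify them with `get_inputs()`/`get_outputs()` before relying on them.

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('rvm_mobilenetv3_fp32.onnx',
                            providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
# Inspect the model's actual input/output names; the names used below are assumptions.
print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])

rec = [np.zeros([1, 1, 1, 1], dtype=np.float32)] * 4       # initial recurrent states
src = np.random.rand(1, 3, 1080, 1920).astype(np.float32)  # stand-in for a real RGB frame in [0, 1]
fgr, pha, *rec = sess.run(None, {
    'src': src, 'r1i': rec[0], 'r2i': rec[1], 'r3i': rec[2], 'r4i': rec[3],
    'downsample_ratio': np.array([0.25], dtype=np.float32),
})
```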

<br>

## PyTorch Example

1. Install dependencies:
```sh
pip install -r requirements_inference.txt
```

2. Load the model:

```python
import torch
from model import MattingNetwork

model = MattingNetwork('mobilenetv3').eval().cuda()  # or "resnet50"
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))
```

3. To convert videos, we provide a simple conversion API:

```python
from inference import convert_video

convert_video(
    model,                            # The model, can be on any device (cpu or cuda).
    input_source='input.mp4',         # A video file or an image sequence directory.
    output_type='video',              # Choose "video" or "png_sequence".
    output_composition='output.mp4',  # File path if video; directory path if png sequence.
    output_video_mbps=4,              # Output video bitrate in Mbps. Not needed for png sequence.
    downsample_ratio=None,            # A hyperparameter to adjust, or None for auto.
    seq_chunk=12,                     # Process n frames at once for better parallelism.
)
```
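
The same call can also consume an image sequence directory and emit PNG frames instead of an encoded video, using only the arguments documented above (the directory names here are placeholders):

```python
# Variant: read frames from a directory and write the composite as a PNG sequence.
convert_video(
    model,
    input_source='input_frames',         # placeholder: an image sequence directory
    output_type='png_sequence',          # write individual PNG frames
    output_composition='output_frames',  # placeholder: output directory for the PNGs
    downsample_ratio=None,               # auto
    seq_chunk=12,
)
```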

4. Or write your own inference code:
```python
import torch
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter

reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)

bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background.
rec = [None] * 4                                        # Initial recurrent states.
downsample_ratio = 0.25                                 # Adjust based on your video.

with torch.no_grad():
    for src in DataLoader(reader):                      # RGB tensor normalized to 0 ~ 1.
        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Cycle the recurrent states.
        com = fgr * pha + bgr * (1 - pha)               # Composite to green background.
        writer.write(com)                               # Write frame.
```
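
To sanity-check the loop above without any video files, you can feed a single random frame through the model. This is just an illustrative sketch using the same call signature as the loop:

```python
# Smoke test with a random HD frame (values in [0, 1]); no video I/O involved.
src = torch.rand(1, 3, 1080, 1920).cuda()
rec = [None] * 4                             # fresh recurrent states
with torch.no_grad():
    fgr, pha, *rec = model(src, *rec, 0.25)  # same positional call as above
print(fgr.shape, pha.shape)                  # foreground (3-channel) and alpha (1-channel) at input resolution
```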

5. The models and converter API are also available through TorchHub.

```python
# Load the model.
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3")  # or "resnet50"

# Converter API.
convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")
```

Please see the [inference documentation](documentation/inference.md) for details on the `downsample_ratio` hyperparameter, more converter arguments, and more advanced usage.
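
As a rough rule of thumb (an assumption for illustration, not necessarily the exact auto rule the converter uses), choosing `downsample_ratio` so that the longer side of the downsampled frame lands around 512 px works for typical human footage:

```python
def pick_downsample_ratio(height, width, target=512):
    """Heuristic: downsample so the longer side is about `target` px; never upsample."""
    return min(target / max(height, width), 1.0)

print(pick_downsample_ratio(1080, 1920))  # ~0.27, close to the 0.25 used for HD in the Speed section
print(pick_downsample_ratio(2160, 3840))  # ~0.13, close to the 0.125 used for 4K in the Speed section
```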

<br>

## Training and Evaluation

Please refer to the [training documentation](documentation/training.md) to train and evaluate your own model.

<br>

## Speed

Speed is measured with `inference_speed_test.py` for reference.

| GPU            | dType | HD (1920x1080) | 4K (3840x2160) |
| -------------- | ----- | -------------- | -------------- |
| RTX 3090       | FP16  | 172 FPS        | 154 FPS        |
| RTX 2060 Super | FP16  | 134 FPS        | 108 FPS        |
| GTX 1080 Ti    | FP32  | 104 FPS        | 74 FPS         |

* Note 1: HD uses `downsample_ratio=0.25`, 4K uses `downsample_ratio=0.125`. All tests use batch size 1 and frame chunk 1.
* Note 2: GPUs before the Turing architecture do not support FP16 inference, so the GTX 1080 Ti uses FP32.
* Note 3: We only measure tensor throughput. The video conversion script provided in this repo is expected to be much slower, because it does not utilize hardware video encoding/decoding and does not perform tensor transfers on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, please refer to [PyNvCodec](https://github.com/NVIDIA/VideoProcessingFramework).
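
For orientation, a minimal throughput measurement in the spirit of these notes (an illustrative sketch, not the repo's `inference_speed_test.py`) could look like this:

```python
import time
import torch
from model import MattingNetwork

# Measure raw tensor throughput at HD with FP16 (requires a Turing or newer GPU).
model = MattingNetwork('mobilenetv3').eval().cuda().half()
src = torch.rand(1, 3, 1080, 1920, device='cuda', dtype=torch.half)
rec, downsample_ratio = [None] * 4, 0.25

with torch.no_grad():
    for _ in range(10):                                   # warm-up iterations
        _, _, *rec = model(src, *rec, downsample_ratio)
    torch.cuda.synchronize()
    start, n = time.time(), 100
    for _ in range(n):
        _, _, *rec = model(src, *rec, downsample_ratio)
    torch.cuda.synchronize()
    print(f'{n / (time.time() - start):.1f} FPS')
```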

<br>

## Project Members
* [Shanchuan Lin](https://www.linkedin.com/in/shanchuanlin/)
* [Linjie Yang](https://sites.google.com/site/linjieyang89/)
* [Imran Saleemi](https://www.linkedin.com/in/imran-saleemi/)
* [Soumyadip Sengupta](https://homes.cs.washington.edu/~soumya91/)
