|
4 | 4 |
|
5 | 5 | ### Supported models
|
6 | 6 |
|
7 |
| -Supported: TensorFlow, TFLite, PyTorch (ONNX), MXNet, Caffe, Darknet |
| 7 | +Supported: TensorFlow, TFLite, PyTorch (ONNX), MXNet, Caffe, Darknet, OneFlow, PaddlePaddle |
8 | 8 |
|
9 | 9 | Planned: TensorFlow2 (MLIR), PyTorch (TorchScript)
|
10 | 10 |
|
|
14 | 14 |
|
15 | 15 | ## How do I get a model's intermediate results?
|
16 | 16 |
|
17 |
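| -When creating the cmake project, add the `-DTENGINE_DEBUG_DATA=ON` variable; the input and output tensor data of every layer is then saved to `./output`. Using mobilenet_v1.tmfile as an example: |
| 17 | +### Usage |
| 18 | + |
| 19 | +- Before running the program, set the environment variable with `export TG_DEBUG_DATA=1` to enable the precision Profiler; |
| 20 | +- Remove the environment variable with `unset TG_DEBUG_DATA` to disable the precision Profiler. |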
| 21 | + |
| 22 | +### Log information |
| 23 | + |
| 24 | +After the data is exported, an `./output` folder is created in the directory the program was run from. |
18 | 25 |
|
19 | 26 | ```bash
|
20 | 27 | $ ./tm_classification -m models/squeezenet.tmfile -i images/cat.jpg
|
@@ -93,53 +100,51 @@ pool6_out_blob_data.txt
|
93 | 100 |
|
94 | 101 | ## How do I get the time taken by each layer of a model?
|
95 | 102 |
|
96 |
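| -When creating the cmake project, add the `-DTENGINE_DEBUG_TIME=ON` variable; the time taken by each layer is then printed while the network model runs. Using mobilenet_v1.tmfile as an example: |
| 103 | +### Usage |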
97 | 104 |
|
98 |
| -```bash |
99 |
| -$ ./tm_classification -m models/squeezenet.tmfile -i images/cat.jpg |
100 |
| -Convolution 13.29 ms conv1-conv1/bn-conv1/scale-relu1 |
101 |
| -Convolution 16.78 ms conv2_1/dw-conv2_1/dw/bn-conv2_1/dw/scale-relu2_1/dw |
102 |
| -Convolution 32.15 ms conv2_1/sep-conv2_1/sep/bn-conv2_1/sep/scale-relu2_1/sep |
103 |
| -Convolution 7.85 ms conv2_2/dw-conv2_2/dw/bn-conv2_2/dw/scale-relu2_2/dw |
104 |
| -Convolution 26.75 ms conv2_2/sep-conv2_2/sep/bn-conv2_2/sep/scale-relu2_2/sep |
105 |
| -Convolution 15.02 ms conv3_1/dw-conv3_1/dw/bn-conv3_1/dw/scale-relu3_1/dw |
106 |
| -Convolution 59.70 ms conv3_1/sep-conv3_1/sep/bn-conv3_1/sep/scale-relu3_1/sep |
107 |
| -Convolution 3.48 ms conv3_2/dw-conv3_2/dw/bn-conv3_2/dw/scale-relu3_2/dw |
108 |
| -Convolution 28.39 ms conv3_2/sep-conv3_2/sep/bn-conv3_2/sep/scale-relu3_2/sep |
109 |
| -Convolution 8.57 ms conv4_1/dw-conv4_1/dw/bn-conv4_1/dw/scale-relu4_1/dw |
110 |
| -Convolution 61.16 ms conv4_1/sep-conv4_1/sep/bn-conv4_1/sep/scale-relu4_1/sep |
111 |
| -Convolution 2.21 ms conv4_2/dw-conv4_2/dw/bn-conv4_2/dw/scale-relu4_2/dw |
112 |
| -Convolution 31.55 ms conv4_2/sep-conv4_2/sep/bn-conv4_2/sep/scale-relu4_2/sep |
113 |
| -Convolution 4.19 ms conv5_1/dw-conv5_1/dw/bn-conv5_1/dw/scale-relu5_1/dw |
114 |
| -Convolution 63.83 ms conv5_1/sep-conv5_1/sep/bn-conv5_1/sep/scale-relu5_1/sep |
115 |
| -Convolution 3.96 ms conv5_2/dw-conv5_2/dw/bn-conv5_2/dw/scale-relu5_2/dw |
116 |
| -Convolution 65.00 ms conv5_2/sep-conv5_2/sep/bn-conv5_2/sep/scale-relu5_2/sep |
117 |
| -Convolution 4.95 ms conv5_3/dw-conv5_3/dw/bn-conv5_3/dw/scale-relu5_3/dw |
118 |
| -Convolution 65.26 ms conv5_3/sep-conv5_3/sep/bn-conv5_3/sep/scale-relu5_3/sep |
119 |
| -Convolution 4.02 ms conv5_4/dw-conv5_4/dw/bn-conv5_4/dw/scale-relu5_4/dw |
120 |
| -Convolution 64.07 ms conv5_4/sep-conv5_4/sep/bn-conv5_4/sep/scale-relu5_4/sep |
121 |
| -Convolution 4.21 ms conv5_5/dw-conv5_5/dw/bn-conv5_5/dw/scale-relu5_5/dw |
122 |
| -Convolution 69.94 ms conv5_5/sep-conv5_5/sep/bn-conv5_5/sep/scale-relu5_5/sep |
123 |
| -Convolution 1.89 ms conv5_6/dw-conv5_6/dw/bn-conv5_6/dw/scale-relu5_6/dw |
124 |
| -Convolution 31.51 ms conv5_6/sep-conv5_6/sep/bn-conv5_6/sep/scale-relu5_6/sep |
125 |
| -Convolution 3.37 ms conv6/dw-conv6/dw/bn-conv6/dw/scale-relu6/dw |
126 |
| -Convolution 63.23 ms conv6/sep-conv6/sep/bn-conv6/sep/scale-relu6/sep |
127 |
| -Pooling 0.23 ms pool6 |
128 |
| -Convolution 1.53 ms fc7 |
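| 105 | +- Before running the program, set the environment variable with `export TG_DEBUG_TIME=1` to enable the performance Profiler; |
| 106 | +- Remove the environment variable with `unset TG_DEBUG_TIME` to disable the performance Profiler. |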
| 107 | + |
| 108 | +### Log information |
129 | 109 |
|
130 |
| -model file : models/squeezenet.tmfile |
131 |
| -image file : images/cat.jpg |
132 |
| -label_file : (null) |
133 |
| -img_h, img_w, scale[3], mean[3] : 227 227 , 1.000 1.000 1.000, 104.0 116.7 122.7 |
134 |
| -Repeat 1 times, thread 1, avg time 480.62 ms, max_time 480.62 ms, min_time 480.62 ms |
135 |
| --------------------------------------- |
136 |
| -0.273199, 281 |
137 |
| -0.267552, 282 |
138 |
| -0.181004, 278 |
139 |
| -0.081799, 285 |
140 |
| -0.072407, 151 |
141 |
| --------------------------------------- |
142 | 110 | ```
|
| 111 | + 0 [ 7.48% : 0.7 ms] Convolution idx: 5 shape: {1 3 100 100} -> {1 8 50 50} int8 K: 3x3 | S: 2x2 | P: 0 1 0 1 MFLOPS: 1.08 Rate:1519 |
| 112 | + 1 [ 6.66% : 0.6 ms] Convolution idx: 8 shape: {1 8 50 50} -> {1 8 50 50} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW( 8) MFLOPS: 0.36 Rate:569 |
| 113 | + 2 [ 9.54% : 0.9 ms] Convolution idx: 11 shape: {1 8 50 50} -> {1 16 50 50} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 0.64 Rate:706 |
| 114 | + 3 [ 3.99% : 0.4 ms] Convolution idx: 14 shape: {1 16 50 50} -> {1 16 25 25} int8 K: 3x3 | S: 2x2 | P: 0 1 0 1 DW( 16) MFLOPS: 0.18 Rate:475 |
| 115 | + 4 [ 6.77% : 0.6 ms] Convolution idx: 17 shape: {1 16 25 25} -> {1 32 25 25} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 0.64 Rate:995 |
| 116 | + 5 [ 6.90% : 0.7 ms] Convolution idx: 20 shape: {1 32 25 25} -> {1 32 25 25} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW( 32) MFLOPS: 0.36 Rate:549 |
| 117 | + 6 [ 4.20% : 0.4 ms] Convolution idx: 23 shape: {1 32 25 25} -> {1 32 25 25} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.28 Rate:3207 |
| 118 | + 7 [ 1.42% : 0.1 ms] Convolution idx: 26 shape: {1 32 25 25} -> {1 32 13 13} int8 K: 3x3 | S: 2x2 | P: 1 1 1 1 DW( 32) MFLOPS: 0.10 Rate:721 |
| 119 | + 8 [ 2.36% : 0.2 ms] Convolution idx: 29 shape: {1 32 13 13} -> {1 64 13 13} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 0.69 Rate:3092 |
| 120 | + 9 [ 3.43% : 0.3 ms] Convolution idx: 32 shape: {1 64 13 13} -> {1 64 13 13} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW( 64) MFLOPS: 0.19 Rate:597 |
| 121 | +10 [ 3.98% : 0.4 ms] Convolution idx: 35 shape: {1 64 13 13} -> {1 64 13 13} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.38 Rate:3663 |
| 122 | +11 [ 0.80% : 0.1 ms] Convolution idx: 38 shape: {1 64 13 13} -> {1 64 7 7} int8 K: 3x3 | S: 2x2 | P: 1 1 1 1 DW( 64) MFLOPS: 0.06 Rate:741 |
| 123 | +12 [ 2.24% : 0.2 ms] Convolution idx: 41 shape: {1 64 7 7} -> {1 128 7 7} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 0.80 Rate:3771 |
| 124 | +13 [ 1.59% : 0.2 ms] Convolution idx: 44 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW(128) MFLOPS: 0.11 Rate:747 |
| 125 | +14 [ 4.21% : 0.4 ms] Convolution idx: 47 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.61 Rate:4015 |
| 126 | +15 [ 1.53% : 0.1 ms] Convolution idx: 50 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW(128) MFLOPS: 0.11 Rate:778 |
| 127 | +16 [ 4.41% : 0.4 ms] Convolution idx: 53 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.61 Rate:3833 |
| 128 | +17 [ 1.66% : 0.2 ms] Convolution idx: 56 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW(128) MFLOPS: 0.11 Rate:715 |
| 129 | +18 [ 4.16% : 0.4 ms] Convolution idx: 59 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.61 Rate:4065 |
| 130 | +19 [ 1.52% : 0.1 ms] Convolution idx: 62 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW(128) MFLOPS: 0.11 Rate:784 |
| 131 | +20 [ 4.46% : 0.4 ms] Convolution idx: 65 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.61 Rate:3786 |
| 132 | +21 [ 1.59% : 0.2 ms] Convolution idx: 68 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW(128) MFLOPS: 0.11 Rate:748 |
| 133 | +22 [ 4.37% : 0.4 ms] Convolution idx: 71 shape: {1 128 7 7} -> {1 128 7 7} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.61 Rate:3869 |
| 134 | +23 [ 0.54% : 0.1 ms] Convolution idx: 74 shape: {1 128 7 7} -> {1 128 4 4} int8 K: 3x3 | S: 2x2 | P: 1 1 1 1 DW(128) MFLOPS: 0.04 Rate:722 |
| 135 | +24 [ 2.88% : 0.3 ms] Convolution idx: 77 shape: {1 128 4 4} -> {1 256 4 4} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 1.05 Rate:3825 |
| 136 | +25 [ 1.02% : 0.1 ms] Convolution idx: 80 shape: {1 256 4 4} -> {1 256 4 4} int8 K: 3x3 | S: 1x1 | P: 1 1 1 1 DW(256) MFLOPS: 0.07 Rate:761 |
| 137 | +26 [ 5.54% : 0.5 ms] Convolution idx: 81 shape: {1 256 4 4} -> {1 256 4 4} int8 K: 1x1 | S: 1x1 | P: 0 0 0 0 MFLOPS: 2.10 Rate:3986 |
| 138 | +27 [ 0.11% : 0.0 ms] Pooling idx: 84 shape: {1 256 4 4} -> {1 256 1 1} int8 K: 4x4 | S: 1x1 | P: 0 0 0 0 Avg |
| 139 | +28 [ 0.27% : 0.0 ms] FullyConnected idx: 85 shape: {1 256 1 1} -> {1 131 1 1} int8 |
| 140 | +29 [ 0.40% : 0.0 ms] Softmax idx: 86 shape: {1 131 1 1} -> {1 131 1 1} int8 |
| 141 | +``` |
| 142 | + |
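The log above has one line per layer, so the total time per operator type can be tallied from a saved copy. The following is a minimal sketch, assuming the output was captured to a file or pipe in exactly the layout shown (`<n> [ <pct>% : <ms> ms] <OpType> idx: ...`). The function name `sum_time_by_op` is hypothetical and not part of Tengine.

```shell
# Sum per-layer times by operator type from a captured profiler log.
# Assumption: each layer line looks like
#   " 0 [ 7.48% : 0.7 ms] Convolution idx: 5 ..."
# Lines that do not contain ": <number> ms]" are ignored.
sum_time_by_op() {
    awk '
        match($0, /: *[0-9.]+ ms\]/) {
            # Pull the millisecond value out of ": 0.7 ms]".
            ms = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9.]/, "", ms)
            # The operator type is the first word after the closing bracket.
            rest = substr($0, RSTART + RLENGTH)
            split(rest, parts, " ")
            total[parts[1]] += ms
        }
        END {
            for (op in total) printf "%s %.1f\n", op, total[op]
        }
    ' "$@"
}
```

For example, `sum_time_by_op profile.log`, where `profile.log` is a hypothetical file holding the captured output. The order of the result lines is unspecified, so pipe through `sort` if a stable order is needed.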
| 143 | +## How do I disable the optimized operators? |
143 | 144 |
|
144 |
| -## Todo...... |
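| 145 | +The Naive Profiler turns off the optimized CPU operators so that the backend computes only with the reference ops implemented in naive C. Use it as a baseline when comparing and analyzing the results of the optimized operators. |
| 146 | + |
| 147 | +### Usage |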
145 | 148 |
|
| 149 | +- Before running the program, set the environment variable with `export TG_DEBUG_REF=1` to enable the Naive Profiler; |
| 150 | +- Remove the environment variable with `unset TG_DEBUG_REF` to disable the Naive Profiler. |
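The three switches above (`TG_DEBUG_DATA`, `TG_DEBUG_TIME`, `TG_DEBUG_REF`) are ordinary environment variables, so they can also be set for a single command instead of being exported and later unset. The following is a minimal sketch using the standard `env` utility. The helper name `with_tg_flag` is hypothetical and not part of Tengine.

```shell
# Run one command with a single debug variable set to 1 for that command only.
# Usage: with_tg_flag VAR_NAME command [args...]
# The calling shell's environment is left untouched, so no `unset` is needed.
with_tg_flag() {
    var_name="$1"
    shift
    env "${var_name}=1" "$@"
}
```

For example, `with_tg_flag TG_DEBUG_REF ./tm_classification -m models/squeezenet.tmfile -i images/cat.jpg` runs the classification example from earlier with the reference ops for that one invocation only.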