Commit ace6833

Miscellaneous fixes for MobileNet

1 parent: 099c1a5

File tree

7 files changed: +50 −42 lines changed

src/convnets/convmixer.jl

2 additions & 2 deletions

```diff
@@ -9,7 +9,7 @@ Creates a ConvMixer model.
 
 - `planes`: number of planes in the output of each block
 - `depth`: number of layers
-- `inchannels`: number of channels in the input
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `kernel_size`: kernel size of the convolutional layers
 - `patch_size`: size of the patches
 - `activation`: activation function used after the convolutional layers
@@ -45,7 +45,7 @@ Creates a ConvMixer model.
 # Arguments
 
 - `mode`: the mode of the model, either `:base`, `:small` or `:large`
-- `inchannels`: number of channels in the input
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `activation`: activation function used after the convolutional layers
 - `nclasses`: number of classes in the output
 """
```

src/convnets/convnext.jl

3 additions & 3 deletions

```diff
@@ -33,8 +33,8 @@ Creates the layers for a ConvNeXt model.
 - `depths`: list with configuration for depth of each block
 - `planes`: list with configuration for number of output channels in each block
 - `drop_path_rate`: Stochastic depth rate.
-- `λ`: Initial value for [`LayerScale`](#)
-([reference](https://arxiv.org/abs/2103.17239))
+- `λ`: Initial value for [`LayerScale`](#)
+    ([reference](https://arxiv.org/abs/2103.17239))
 - `nclasses`: number of output classes
 """
 function convnext(depths, planes; inchannels = 3, drop_path_rate = 0.0, λ = 1.0f-6,
@@ -92,7 +92,7 @@ Creates a ConvNeXt model.
 
 # Arguments:
 
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `drop_path_rate`: Stochastic depth rate.
 - `λ`: Init value for [LayerScale](https://arxiv.org/abs/2103.17239)
 - `nclasses`: number of output classes
```
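The `λ` keyword documented above seeds LayerScale's per-channel scale. As a rough illustration of the idea from the referenced paper (a hypothetical numpy sketch, not Metalhead's Julia `LayerScale` layer), LayerScale multiplies each channel of a residual branch by a learnable vector initialized to a small constant such as `1.0f-6`:

```python
import numpy as np

def layer_scale_init(planes, lam=1e-6):
    """Per-channel scale vector, initialized to the constant `lam`."""
    return np.full(planes, lam, dtype=np.float32)

def apply_layer_scale(x, gamma):
    """Scale a (batch, planes) residual branch channel-wise."""
    return x * gamma  # broadcasts gamma over the batch axis

x = np.ones((2, 4), dtype=np.float32)
gamma = layer_scale_init(4, lam=1e-6)
out = apply_layer_scale(x, gamma)
print(out[0, 0])  # each activation starts scaled down by 1e-6
```

Starting near zero lets deep residual branches begin close to the identity, which is why a tiny default like `1.0f-6` is used.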

src/convnets/inception.jl

7 additions & 7 deletions

```diff
@@ -326,7 +326,7 @@ Creates an Inceptionv4 model.
 # Arguments
 
 - `pretrain`: set to `true` to load the pre-trained weights for ImageNet
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 
@@ -426,7 +426,7 @@ Creates an InceptionResNetv2 model.
 
 # Arguments
 
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 """
@@ -459,12 +459,12 @@ Creates an InceptionResNetv2 model.
 # Arguments
 
 - `pretrain`: set to `true` to load the pre-trained weights for ImageNet
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 
 !!! warning
-    
+
     `InceptionResNetv2` does not currently support pretrained weights.
 """
 struct InceptionResNetv2
@@ -496,7 +496,7 @@ Create an Xception block.
 
 # Arguments
 
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `outchannels`: number of output channels.
 - `nrepeats`: number of repeats of depthwise separable convolution layers.
 - `stride`: stride by which to downsample the input.
@@ -540,7 +540,7 @@ Creates an Xception model.
 
 # Arguments
 
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 """
@@ -571,7 +571,7 @@ Creates an Xception model.
 # Arguments
 
 - `pretrain`: set to `true` to load the pre-trained weights for ImageNet.
-- `inchannels`: number of input channels.
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `dropout`: rate of dropout in classifier head.
 - `nclasses`: the number of output classes.
 
```

src/convnets/mobilenet.jl

35 additions & 27 deletions

```diff
@@ -4,8 +4,7 @@
     mobilenetv1(width_mult, config;
                 activation = relu,
                 inchannels = 3,
-                nclasses = 1000,
-                fcsize = 1024)
+                nclasses = 1000)
 
 Create a MobileNetv1 model ([reference](https://arxiv.org/abs/1704.04861v1)).
 
@@ -21,23 +20,24 @@ Create a MobileNetv1 model ([reference](https://arxiv.org/abs/1704.04861v1)).
   + `s`: The stride of the convolutional kernel
   + `r`: The number of time this configuration block is repeated
 - `activate`: The activation function to use throughout the network
-- `inchannels`: The number of input feature maps``
+- `inchannels`: The number of input channels. The default value is 3.
 - `fcsize`: The intermediate fully-connected size between the convolution and final layers
 - `nclasses`: The number of output classes
 """
 function mobilenetv1(width_mult, config;
                      activation = relu,
                      inchannels = 3,
-                     nclasses = 1000,
-                     fcsize = 1024)
+                     fcsize = 1024,
+                     nclasses = 1000)
     layers = []
     for (dw, outch, stride, nrepeats) in config
         outch = Int(outch * width_mult)
         for _ in 1:nrepeats
             layer = dw ?
                     depthwise_sep_conv_bn((3, 3), inchannels, outch, activation;
                                           stride = stride, pad = 1, bias = false) :
-                    conv_bn((3, 3), inchannels, outch, activation; stride = stride, pad = 1)
+                    conv_bn((3, 3), inchannels, outch, activation; stride = stride, pad = 1,
+                            bias = false)
             append!(layers, layer)
             inchannels = outch
         end
@@ -51,7 +51,7 @@ function mobilenetv1(width_mult, config;
 end
 
 const mobilenetv1_configs = [
-  # dw, c, s, r
+    # dw, c, s, r
     (false, 32, 2, 1),
     (true, 64, 1, 1),
     (true, 128, 2, 1),
@@ -65,7 +65,7 @@ const mobilenetv1_configs = [
 ]
 
 """
-    MobileNetv1(width_mult = 1; pretrain = false, nclasses = 1000)
+    MobileNetv1(width_mult = 1; inchannels = 3, pretrain = false, nclasses = 1000)
 
 Create a MobileNetv1 model with the baseline configuration
 ([reference](https://arxiv.org/abs/1704.04861v1)).
@@ -76,6 +76,7 @@ Set `pretrain` to `true` to load the pretrained weights for ImageNet.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1.0 being the default in the paper;
   this is usually a value between 0.1 and 1.4)
+- `inchannels`: The number of input channels. The default value is 3.
 - `pretrain`: Whether to load the pre-trained weights for ImageNet
 - `nclasses`: The number of output classes
 
@@ -85,10 +86,10 @@ struct MobileNetv1
     layers::Any
 end
 
-function MobileNetv1(width_mult::Number = 1; pretrain = false, nclasses = 1000)
-    layers = mobilenetv1(width_mult, mobilenetv1_configs; nclasses = nclasses)
+function MobileNetv1(width_mult::Number = 1; inchannels = 3, pretrain = false,
+                     nclasses = 1000)
+    layers = mobilenetv1(width_mult, mobilenetv1_configs; inchannels, nclasses)
    pretrain && loadpretrain!(layers, string("MobileNetv1"))
-
    return MobileNetv1(layers)
 end
 
```
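One of the substantive fixes above: the fallback `conv_bn` branch now also passes `bias = false`, matching the depthwise branch. A convolution that feeds straight into batchnorm gains nothing from a bias term, because batchnorm subtracts the per-channel mean and cancels any constant offset. A small numpy sketch of that cancellation (illustrative only, not the Flux/Metalhead code):

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    """Per-channel batchnorm over the batch axis (no learned affine, for clarity)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
pre = rng.normal(size=(8, 16))   # pre-activation conv outputs, (batch, channels)
bias = rng.normal(size=16)       # a hypothetical per-channel conv bias

# The bias shifts both x and its mean, so it drops out of (x - mu):
print(np.allclose(batchnorm(pre), batchnorm(pre + bias)))  # True
```

Dropping the bias therefore saves parameters with no change in the function the conv + BN pair can represent.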
```diff
@@ -102,7 +103,7 @@ classifier(m::MobileNetv1) = m.layers[2]
 # MobileNetv2
 
 """
-    mobilenetv2(width_mult, configs; max_width = 1280, nclasses = 1000)
+    mobilenetv2(width_mult, configs; inchannels = 3, max_width = 1280, nclasses = 1000)
 
 Create a MobileNetv2 model.
 ([reference](https://arxiv.org/abs/1801.04381)).
@@ -119,14 +120,15 @@ Create a MobileNetv2 model.
   + `n`: The number of times a block is repeated
   + `s`: The stride of the convolutional kernel
   + `a`: The activation function used in the bottleneck layer
+- `inchannels`: The number of input channels. The default value is 3.
 - `max_width`: The maximum number of feature maps in any layer of the network
 - `nclasses`: The number of output classes
 """
-function mobilenetv2(width_mult, configs; max_width = 1280, nclasses = 1000)
+function mobilenetv2(width_mult, configs; inchannels = 3, max_width = 1280, nclasses = 1000)
     # building first layer
     inplanes = _round_channels(32 * width_mult, width_mult == 0.1 ? 4 : 8)
     layers = []
-    append!(layers, conv_bn((3, 3), 3, inplanes; stride = 2))
+    append!(layers, conv_bn((3, 3), inchannels, inplanes; pad = 1, stride = 2))
     # building inverted residual blocks
     for (t, c, n, s, a) in configs
         outplanes = _round_channels(c * width_mult, width_mult == 0.1 ? 4 : 8)
@@ -165,7 +167,7 @@ struct MobileNetv2
 end
 
 """
-    MobileNetv2(width_mult = 1.0; pretrain = false, nclasses = 1000)
+    MobileNetv2(width_mult = 1.0; inchannels = 3, pretrain = false, nclasses = 1000)
 
 Create a MobileNetv2 model with the specified configuration.
 ([reference](https://arxiv.org/abs/1801.04381)).
@@ -176,13 +178,15 @@ Set `pretrain` to `true` to load the pretrained weights for ImageNet.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1.0 being the default in the paper;
   this is usually a value between 0.1 and 1.4)
+- `inchannels`: The number of input channels. The default value is 3.
 - `pretrain`: Whether to load the pre-trained weights for ImageNet
 - `nclasses`: The number of output classes
 
 See also [`Metalhead.mobilenetv2`](#).
 """
-function MobileNetv2(width_mult::Number = 1; pretrain = false, nclasses = 1000)
-    layers = mobilenetv2(width_mult, mobilenetv2_configs; nclasses = nclasses)
+function MobileNetv2(width_mult::Number = 1; inchannels = 3, pretrain = false,
+                     nclasses = 1000)
+    layers = mobilenetv2(width_mult, mobilenetv2_configs; inchannels, nclasses)
     pretrain && loadpretrain!(layers, string("MobileNetv2"))
     return MobileNetv2(layers)
 end
```
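The stem above computes `inplanes = _round_channels(32 * width_mult, width_mult == 0.1 ? 4 : 8)`: widths are scaled by the multiplier and then snapped to a divisor (8, or 4 for the 0.1 multiplier). `_round_channels` itself is not shown in this diff; the sketch below assumes it follows the standard MobileNet "make divisible" rule, which rounds to the nearest multiple while never dropping more than 10% below the scaled target:

```python
def round_channels(channels, divisor, min_value=None):
    """Assumed behaviour of `_round_channels`: round to the nearest
    multiple of `divisor`, staying within 10% of the requested width."""
    if min_value is None:
        min_value = divisor
    new_ch = max(min_value, int(channels + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * channels:  # never round down by more than 10%
        new_ch += divisor
    return new_ch

print(round_channels(32 * 1.0, 8))   # 32
print(round_channels(32 * 0.75, 8))  # 24
print(round_channels(32 * 0.1, 4))   # 4  -- the looser divisor keeps tiny widths alive
```

The special-cased divisor of 4 for `width_mult == 0.1` exists because rounding 3.2 channels to a multiple of 8 would distort the width far more than the multiplier intends.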
```diff
@@ -197,7 +201,7 @@ classifier(m::MobileNetv2) = m.layers[2]
 # MobileNetv3
 
 """
-    mobilenetv3(width_mult, configs; max_width = 1024, nclasses = 1000)
+    mobilenetv3(width_mult, configs; inchannels = 3, max_width = 1024, nclasses = 1000)
 
 Create a MobileNetv3 model.
 ([reference](https://arxiv.org/abs/1905.02244)).
@@ -216,14 +220,17 @@ Create a MobileNetv3 model.
   + `r::Integer` - The reduction factor (`>= 1` or `nothing` to skip) for squeeze and excite layers
   + `s::Integer` - The stride of the convolutional kernel
   + `a` - The activation function used in the bottleneck (typically `hardswish` or `relu`)
+- `inchannels`: The number of input channels. The default value is 3.
 - `max_width`: The maximum number of feature maps in any layer of the network
 - `nclasses`: the number of output classes
 """
-function mobilenetv3(width_mult, configs; max_width = 1024, nclasses = 1000)
+function mobilenetv3(width_mult, configs; inchannels = 3, max_width = 1024, nclasses = 1000)
     # building first layer
     inplanes = _round_channels(16 * width_mult, 8)
     layers = []
-    append!(layers, conv_bn((3, 3), 3, inplanes, hardswish; stride = 2))
+    append!(layers,
+            conv_bn((3, 3), inchannels, inplanes, hardswish; pad = 1, stride = 2,
+                    bias = false))
     explanes = 0
     # building inverted residual blocks
     for (k, t, c, r, a, s) in configs
@@ -249,7 +256,7 @@ end
 
 # Configurations for small and large mode for MobileNetv3
 mobilenetv3_configs = Dict(:small => [
-  # k, t, c, SE, a, s
+    # k, t, c, SE, a, s
     (3, 1, 16, 4, relu, 2),
     (3, 4.5, 24, nothing, relu, 2),
     (3, 3.67, 24, nothing, relu, 1),
@@ -263,7 +270,7 @@ mobilenetv3_configs = Dict(:small => [
     (5, 6, 96, 4, hardswish, 1),
 ],
 :large => [
-  # k, t, c, SE, a, s
+    # k, t, c, SE, a, s
     (3, 1, 16, nothing, relu, 1),
     (3, 4, 24, nothing, relu, 2),
     (3, 3, 24, nothing, relu, 1),
@@ -287,7 +294,7 @@ struct MobileNetv3
 end
 
 """
-    MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; pretrain = false, nclasses = 1000)
+    MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; inchannels = 3, pretrain = false, nclasses = 1000)
 
 Create a MobileNetv3 model with the specified configuration.
 ([reference](https://arxiv.org/abs/1905.02244)).
@@ -299,17 +306,18 @@ Set `pretrain = true` to load the model with pre-trained weights for ImageNet.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1.0 being the default in the paper;
   this is usually a value between 0.1 and 1.4)
+- `inchannels`: The number of channels in the input. The default value is 3.
 - `pretrain`: whether to load the pre-trained weights for ImageNet
 - `nclasses`: the number of output classes
 
 See also [`Metalhead.mobilenetv3`](#).
 """
-function MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; pretrain = false,
-                     nclasses = 1000)
+function MobileNetv3(mode::Symbol = :small, width_mult::Number = 1; inchannels = 3,
+                     pretrain = false, nclasses = 1000)
     @assert mode in [:large, :small] "`mode` has to be either :large or :small"
     max_width = (mode == :large) ? 1280 : 1024
-    layers = mobilenetv3(width_mult, mobilenetv3_configs[mode]; max_width = max_width,
-                         nclasses = nclasses)
+    layers = mobilenetv3(width_mult, mobilenetv3_configs[mode]; inchannels, max_width,
+                         nclasses)
     pretrain && loadpretrain!(layers, string("MobileNetv3", mode))
    return MobileNetv3(layers)
 end
```
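Besides threading `inchannels` through, the MobileNet stems above gained `pad = 1`. That is a genuine shape fix: a 3×3 stride-2 convolution only halves the spatial size when padded by 1. Using the usual output-size formula, floor((n + 2p − k) / s) + 1:

```python
def conv_out(n, k, s, p):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# 224-pixel input, 3x3 kernel, stride 2:
print(conv_out(224, 3, 2, 0))  # 111 -- unpadded, off by one
print(conv_out(224, 3, 2, 1))  # 112 -- pad = 1 gives the expected half size
```

Without the padding, every downstream feature map would be one pixel smaller than the reference architecture expects.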

src/convnets/resnext.jl

1 addition & 1 deletion

```diff
@@ -112,7 +112,7 @@ Create a ResNeXt model with specified configuration. Currently supported values
 Set `pretrain = true` to load the model with pre-trained weights for ImageNet.
 
 !!! warning
-    
+
     `ResNeXt` does not currently support pretrained weights.
 
 See also [`Metalhead.resnext`](#).
```

src/layers/embeddings.jl

1 addition & 1 deletion

```diff
@@ -11,7 +11,7 @@ patches.
 # Arguments:
 
 - `imsize`: the size of the input image
-- `inchannels`: the number of channels in the input image
+- `inchannels`: the number of channels in the input. The default value is 3.
 - `patch_size`: the size of the patches
 - `embedplanes`: the number of channels in the embedding
 - `norm_layer`: the normalization layer - by default the identity function but otherwise takes a
```
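The patch embedding documented here splits the input into non-overlapping `patch_size` patches and projects each one to `embedplanes` channels, so the sequence length follows directly from `imsize`. A shape-only sketch (hypothetical helper names, not the Metalhead API):

```python
def patch_embed_shape(imsize, patch_size, embedplanes):
    """Patch count and embedded sequence shape for non-overlapping patches."""
    h, w = imsize
    ph, pw = patch_size
    assert h % ph == 0 and w % pw == 0, "image must divide evenly into patches"
    npatches = (h // ph) * (w // pw)
    return npatches, (npatches, embedplanes)

npatches, shape = patch_embed_shape((224, 224), (16, 16), 768)
print(npatches)  # 196, the familiar ViT-Base sequence length (14 x 14 patches)
```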

src/vit-based/vit.jl

1 addition & 1 deletion

```diff
@@ -80,7 +80,7 @@ Creates a Vision Transformer (ViT) model.
 # Arguments
 
 - `mode`: the model configuration, one of
-`[:tiny, :small, :base, :large, :huge, :giant, :gigantic]`
+    `[:tiny, :small, :base, :large, :huge, :giant, :gigantic]`
 - `imsize`: image size
 - `inchannels`: number of input channels
 - `patch_size`: size of the patches
```
