
Commit 7ba93ae

Merge branch 'huggingface:main' into cvt
2 parents e69b906 + 84cb225 commit 7ba93ae

File tree: 10 files changed (+246, -73 lines)


README.md

Lines changed: 7 additions & 0 deletions

@@ -26,6 +26,12 @@
 * The Hugging Face Hub (https://huggingface.co/timm) is now the primary source for `timm` weights. Model cards include link to papers, original source, license.
 * Previous 0.6.x can be cloned from [0.6.x](https://github.com/rwightman/pytorch-image-models/tree/0.6.x) branch or installed via pip with version.

+### May 14, 2024
+* Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
+* Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
+* Add `normalize=` flag for transforms, return non-normalized torch.Tensor with original dtype (for `chug`)
+* Version 1.0.3 release
+
 ### May 11, 2024
 * `Searching for Better ViT Baselines (For the GPU Poor)` weights and vit variants released. Exploring model shapes between Tiny and Base.

@@ -42,6 +48,7 @@
 | [vit_medium_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_reg4_gap_256.sbb_in1k) | 83.47 | 96.622 | 38.88 | 256 |
 | [vit_medium_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_medium_patch16_reg1_gap_256.sbb_in1k) | 83.462 | 96.548 | 38.88 | 256 |
 | [vit_little_patch16_reg4_gap_256.sbb_in1k](https://huggingface.co/timm/vit_little_patch16_reg4_gap_256.sbb_in1k) | 82.514 | 96.262 | 22.52 | 256 |
+| [vit_wee_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_wee_patch16_reg1_gap_256.sbb_in1k) | 80.256 | 95.360 | 13.42 | 256 |
 | [vit_pwee_patch16_reg1_gap_256.sbb_in1k](https://huggingface.co/timm/vit_pwee_patch16_reg1_gap_256.sbb_in1k) | 80.072 | 95.136 | 15.25 | 256 |
 | [vit_mediumd_patch16_reg4_gap_256.sbb_in12k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb_in12k) | N/A | N/A | 64.11 | 256 |
 | [vit_betwixt_patch16_reg4_gap_256.sbb_in12k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb_in12k) | N/A | N/A | 60.4 | 256 |
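
A quick sketch of the `normalize=` flag mentioned in the May 14 notes; this assumes the flag is exposed through `timm.data.create_transform`, and the model name is only an example:

```python
import timm
from timm.data import resolve_data_config, create_transform

model = timm.create_model('vit_base_patch16_siglip_224', pretrained=False)
cfg = resolve_data_config({}, model=model)

# Assumption: normalize=False skips the mean/std Normalize step so the
# transform returns an unnormalized torch.Tensor (handy for pipelines such
# as `chug` that apply their own normalization later).
transform = create_transform(**cfg, normalize=False)
```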

hfdocs/source/feature_extraction.mdx

Lines changed: 3 additions & 3 deletions

@@ -192,9 +192,9 @@ There are two additional creation arguments impacting the output features.

 #### Output index selection

-The `out_indices` argument is supported by all models, but not all models have the same index to feature stride mapping. Look at the code or check feature_info to compare. The out indices generally correspond to the `C(i+1)th` feature level (a `2^(i+1)` reduction). For most convnet models, index 0 is the stride 2 features, and index 4 is stride 32. For many ViT or ViT-Conv hybrids there may be many to all features maps of the same shape, or a combination of hierarchical and non-hieararchical feature maps. It is best to look at the `feature_info` attribute to see the number of features, their corresponding channel count and reduction level.
+The `out_indices` argument is supported by all models, but not all models have the same index to feature stride mapping. Look at the code or check feature_info to compare. The out indices generally correspond to the `C(i+1)th` feature level (a `2^(i+1)` reduction). For most convnet models, index 0 is the stride 2 features, and index 4 is stride 32. For many ViT or ViT-Conv hybrids there may be many to all features maps of the same shape, or a combination of hierarchical and non-hierarchical feature maps. It is best to look at the `feature_info` attribute to see the number of features, their corresponding channel count and reduction level.

-`out_indices` supports negative indexing, this makes it easy to get the last, penunltimate, etc feature map. `out_indices=(-2,)` would return the penultimate feature map for any model.
+`out_indices` supports negative indexing, this makes it easy to get the last, penultimate, etc feature map. `out_indices=(-2,)` would return the penultimate feature map for any model.

 #### Output stride (feature map dilation)
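
A small sketch of the negative-index behaviour described above; the model name is arbitrary, and `feature_info` is the attribute the text recommends inspecting:

```python
import torch
import timm

# features_only wraps the model for feature extraction; out_indices=(-2,)
# selects the penultimate feature map regardless of how many levels exist.
model = timm.create_model('resnet50', pretrained=False, features_only=True, out_indices=(-2,))
print(model.feature_info.channels())   # channel count of the selected map
print(model.feature_info.reduction())  # its reduction (stride) factor

feats = model(torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])        # a list containing one tensor
```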

@@ -228,7 +228,7 @@ Accompanying the `forward_intermediates` function is a `prune_intermediate_layer

 An `indices` argument is used for both `forward_intermediates()` and `prune_intermediate_layers()` to select the features to return or layers to remove. As with the `out_indices` for `features_only` API, `indices` is model specific and selects which intermediates are returned.

-In non-hierarchical block based models such as ViT the indices correspond to the blocks, in models with hierarchical stages they usually correspond to the output of the stem + each hierarhical stage. Both positive (from the start), and negative (relative to the end) indexing works, and `None` is used to return all intermediates.
+In non-hierarchical block based models such as ViT the indices correspond to the blocks, in models with hierarchical stages they usually correspond to the output of the stem + each hierarchical stage. Both positive (from the start), and negative (relative to the end) indexing works, and `None` is used to return all intermediates.

 The `prune_intermediate_layers()` call returns an indices variable, as negative indices must be converted to absolute (positive) indices when the model is trimmed.
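
A rough companion sketch for the `indices` argument on `forward_intermediates()` / `prune_intermediate_layers()`; the ViT variant and the specific indices are arbitrary choices:

```python
import torch
import timm

model = timm.create_model('vit_base_patch16_224', pretrained=False)
x = torch.randn(1, 3, 224, 224)

# Negative indices count back from the last block; indices=None would
# return every intermediate.
final, intermediates = model.forward_intermediates(x, indices=(-2, -1))
print(final.shape, [t.shape for t in intermediates])

# Pruning trims the unused trailing blocks and returns the kept indices
# converted to absolute (positive) values.
kept = model.prune_intermediate_layers(indices=(-2, -1), prune_head=True)
print(kept)
```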

hfdocs/source/installation.mdx

Lines changed: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ You should install `timm` in a [virtual environment](https://docs.python.org/3/l
 # Deactivate the virtual environment
 source .env/bin/deactivate
 ```
-`
+
 Once you've created your virtual environment, you can install `timm` in it.

 ## Using pip

timm/layers/patch_dropout.py

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@

 class PatchDropout(nn.Module):
     """
-    https://arxiv.org/abs/2212.00794
+    https://arxiv.org/abs/2212.00794 and https://arxiv.org/pdf/2208.07220
     """
     return_indices: torch.jit.Final[bool]
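
For orientation, a minimal usage sketch of the class this docstring belongs to; the token counts and drop probability here are made up:

```python
import torch
from timm.layers import PatchDropout

drop = PatchDropout(prob=0.25, num_prefix_tokens=1)  # keep the class token
drop.train()  # token dropout is only active in training mode

tokens = torch.randn(2, 197, 768)  # (batch, 1 cls + 196 patch tokens, dim)
out = drop(tokens)
print(out.shape)  # fewer patch tokens than went in
```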

timm/models/_builder.py

Lines changed: 8 additions & 2 deletions

@@ -10,7 +10,8 @@
 from timm.models._features import FeatureListNet, FeatureDictNet, FeatureHookNet, FeatureGetterNet
 from timm.models._features_fx import FeatureGraphNet
 from timm.models._helpers import load_state_dict
-from timm.models._hub import has_hf_hub, download_cached_file, check_cached_file, load_state_dict_from_hf
+from timm.models._hub import has_hf_hub, download_cached_file, check_cached_file, load_state_dict_from_hf,\
+    load_custom_from_hf
 from timm.models._manipulate import adapt_input_conv
 from timm.models._pretrained import PretrainedCfg
 from timm.models._prune import adapt_model_from_file

@@ -185,7 +186,12 @@ def load_pretrained(
     elif load_from == 'hf-hub':
         _logger.info(f'Loading pretrained weights from Hugging Face hub ({pretrained_loc})')
         if isinstance(pretrained_loc, (list, tuple)):
-            state_dict = load_state_dict_from_hf(*pretrained_loc)
+            custom_load = pretrained_cfg.get('custom_load', False)
+            if isinstance(custom_load, str) and custom_load == 'hf':
+                load_custom_from_hf(*pretrained_loc, model)
+                return
+            else:
+                state_dict = load_state_dict_from_hf(*pretrained_loc)
         else:
             state_dict = load_state_dict_from_hf(pretrained_loc)
     else:
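
In short, the branch added above checks the pretrained config for `custom_load == 'hf'` and, when set, hands loading over to the model's own loader instead of fetching a plain state dict. A hypothetical config illustrating the dispatch (the hub id and filename are placeholders, not real artifacts):

```python
# Hypothetical pretrained_cfg entry; only the 'custom_load' key drives the new branch.
pretrained_cfg = {
    'hf_hub_id': 'some-org/some-jax-export',  # placeholder hub repo
    'hf_hub_filename': 'model.npz',           # placeholder non-PyTorch checkpoint
    'custom_load': 'hf',
}

custom_load = pretrained_cfg.get('custom_load', False)
if isinstance(custom_load, str) and custom_load == 'hf':
    print('-> load_custom_from_hf(hub_id, filename, model), i.e. model.load_pretrained(file)')
else:
    print('-> load_state_dict_from_hf(...), then the state_dict is loaded as usual')
```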

timm/models/_hub.py

Lines changed: 7 additions & 0 deletions

@@ -190,6 +190,13 @@ def load_state_dict_from_hf(model_id: str, filename: str = HF_WEIGHTS_NAME):
     return torch.load(cached_file, map_location='cpu')


+def load_custom_from_hf(model_id: str, filename: str, model: torch.nn.Module):
+    assert has_hf_hub(True)
+    hf_model_id, hf_revision = hf_split(model_id)
+    cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
+    return model.load_pretrained(cached_file)
+
+
 def save_config_for_hf(
         model,
         config_path: str,
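
A hedged usage sketch for the new helper: it assumes the target model implements its own `load_pretrained()` (timm's VisionTransformer does, for non-PyTorch checkpoints), and the hub id / filename are placeholders rather than real artifacts:

```python
import timm
from timm.models._hub import load_custom_from_hf

model = timm.create_model('vit_so400m_patch14_siglip_224', pretrained=False)

# Downloads <filename> from the hub repo and hands the local path to
# model.load_pretrained(); nothing goes through torch.load here.
load_custom_from_hf('some-org/some-paligemma-export', 'model.npz', model)
```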

timm/models/mobilenetv3.py

Lines changed: 0 additions & 37 deletions

@@ -622,43 +622,6 @@ def _gen_lcnet(variant: str, channel_multiplier: float = 1.0, pretrained: bool =
     return model


-def _gen_lcnet(variant: str, channel_multiplier: float = 1.0, pretrained: bool = False, **kwargs):
-    """ LCNet
-    Essentially a MobileNet-V3 crossed with a MobileNet-V1
-
-    Paper: `PP-LCNet: A Lightweight CPU Convolutional Neural Network` - https://arxiv.org/abs/2109.15099
-
-    Args:
-        channel_multiplier: multiplier to number of channels per layer.
-    """
-    arch_def = [
-        # stage 0, 112x112 in
-        ['dsa_r1_k3_s1_c32'],
-        # stage 1, 112x112 in
-        ['dsa_r2_k3_s2_c64'],
-        # stage 2, 56x56 in
-        ['dsa_r2_k3_s2_c128'],
-        # stage 3, 28x28 in
-        ['dsa_r1_k3_s2_c256', 'dsa_r1_k5_s1_c256'],
-        # stage 4, 14x14in
-        ['dsa_r4_k5_s1_c256'],
-        # stage 5, 14x14in
-        ['dsa_r2_k5_s2_c512_se0.25'],
-        # 7x7
-    ]
-    model_kwargs = dict(
-        block_args=decode_arch_def(arch_def),
-        stem_size=16,
-        round_chs_fn=partial(round_channels, multiplier=channel_multiplier),
-        norm_layer=partial(nn.BatchNorm2d, **resolve_bn_args(kwargs)),
-        act_layer=resolve_act_layer(kwargs, 'hard_swish'),
-        se_layer=partial(SqueezeExcite, gate_layer='hard_sigmoid', force_act_layer=nn.ReLU),
-        num_features=1280,
-        **kwargs,
-    )
-    model = _create_mnv3(variant, pretrained, **model_kwargs)
-    return model
-

 def _cfg(url: str = '', **kwargs):
     return {
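
The hunk above drops an accidental duplicate of `_gen_lcnet`; the surviving definition still backs the LCNet variants, e.g. (model name taken from the existing `lcnet_*` family):

```python
import timm

# Builds through the remaining _gen_lcnet definition.
model = timm.create_model('lcnet_100', pretrained=False)
print(sum(p.numel() for p in model.parameters()))
```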
