
CLIP: Input image size (490x490) doesn't match model (336x336). #6


Closed
Lv996331209 opened this issue Feb 7, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@Lv996331209

Hi,
There is an error when I run quickstart.py; the traceback is as follows:
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 122, in encode_img
img_embeds, atts_img, img_target = self.img2emb(image)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 126, in img2emb
img_embeds = self.vision_proj(self.vit(image.to(self.device)))
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/build_mlp.py", line 133, in forward
image_forward_outs = self.vision_tower(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1171, in forward
return self.vision_model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1094, in forward
hidden_states = self.embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 244, in forward
raise ValueError(
ValueError: Input image size (490*490) doesn't match model (336*336).

It seems the img_size is 490 (from config.json) but the input size of the CLIP model is 336 (clip-vit-large-patch14-336). Where did I go wrong? Thank you so much!
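For context, the check that raises this error compares the number of patches the input would produce against the number the model's position-embedding table was built for. A pure-arithmetic sketch of that mismatch (not the actual transformers code):

```python
# CLIP's position-embedding table is sized for (image_size / patch_size)^2
# patches, so a 490x490 input cannot reuse a 336-pixel model's table.

def num_patches(image_size: int, patch_size: int) -> int:
    """Patches per image for a square ViT input (floor division, as in ViT)."""
    return (image_size // patch_size) ** 2

model_patches = num_patches(336, 14)   # 24 * 24 = 576
input_patches = num_patches(490, 14)   # 35 * 35 = 1225
assert model_patches != input_patches  # -> the ValueError in modeling_clip.py
```

Newer transformers versions can bridge this gap by interpolating the position embeddings, which is why the package version matters here.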

@Lv996331209
Author

I read issue #4 and noticed the same error.

Both the model and the config were downloaded from HF, but the error still appears.

@Coobiw
Collaborator

Coobiw commented Feb 7, 2025

Hi, @Lv996331209. 490 is indeed the correct resolution used in ChartMoE. You could check the versions of core Python packages like transformers against requirements.txt. If you can give me more details to reproduce your bug, that would be great~ I've never encountered this problem myself. But #4 has been solved, so you could also discuss it with its author.

Thanks for your attention! If any question remains (e.g., the version of some Python package?), I'm very willing to communicate with you~!

@Coobiw
Collaborator

Coobiw commented Feb 7, 2025

@Lv996331209 Hi, can you provide the versions of the packages mentioned in requirements.txt? I'll try to reproduce this bug~ Thanks!

@Coobiw
Collaborator

Coobiw commented Feb 9, 2025

@Lv996331209 Hi, you can try using the same version of transformers as in requirements.txt. I think that will fix it~
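One way to catch this early is to compare the installed transformers version against the pinned one before running inference. A small stdlib-only helper sketch (hypothetical, not part of ChartMoE; the `"x.y.z"` placeholder stands for the version actually listed in requirements.txt):

```python
# Check whether an installed package matches a pinned version string.
from importlib.metadata import version, PackageNotFoundError


def matches_pin(package: str, pinned: str) -> bool:
    """Return True iff `package` is installed at exactly version `pinned`."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        # Not installed at all also counts as a mismatch.
        return False


# Replace "x.y.z" with the transformers version from requirements.txt.
if not matches_pin("transformers", "x.y.z"):
    print("transformers is missing or not at the pinned version")
```

Alternatively, `pip install -r requirements.txt` restores every pinned dependency at once.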

@Lv996331209
Author

@Coobiw Hi, thanks for your help! It was indeed the wrong version of the transformers package, due to an unintentional upgrade. After I reverted to the version in requirements.txt, the error was resolved.

Thank you so much for your suggestion! BTW, looking forward to your dataset. :)

@Coobiw
Collaborator

Coobiw commented Feb 10, 2025

Thanks for your kind words. I will add this problem to the FAQ. The full training pipeline code and dataset are coming! I'm organizing these contents now.

@Coobiw Coobiw closed this as completed Feb 10, 2025
@Coobiw Coobiw changed the title Infer error about clip CLIP: Input image size (490x490) doesn't match model (336x336). Feb 10, 2025
@Coobiw Coobiw added the bug Something isn't working label Feb 10, 2025
@Coobiw Coobiw pinned this issue Feb 10, 2025