CLIP: Input image size (490x490) doesn't match model (336x336). #6
Comments
I read issue #4 and noticed the same error. Both the model and the config were downloaded from HF, but this error still appears.
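(For context, quickstart.py loads ChartMoE through transformers' remote-code path, as the transformers_modules entries in the traceback further down show. A rough, hypothetical loading sketch looks like the following; the repo id and dtype are placeholders, not taken from this issue or the official quickstart:)

```python
# Hypothetical loading sketch -- NOT ChartMoE's quickstart.py.
# The repo id below is a placeholder; the remote-code path is inferred from the
# transformers_modules entries in the traceback later in this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "IDEA-FinAI/chartmoe"  # placeholder: use the official HF repo id or a local snapshot
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,        # pulls modeling_chartmoe.py / build_mlp.py from the model repo
    torch_dtype=torch.bfloat16,    # placeholder dtype
).eval()
```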
Hi, @Lv996331209. 490 is indeed the correct resolution we used in ChartMoE. You can probably check the versions of core Python packages such as transformers. Thanks for your attention! If any question remains (like the version of some Python package), I'm very willing to communicate with you!
@Lv996331209 Hi, can you provide the versions of the packages mentioned in requirements.txt?
@Lv996331209 Hi, you can try using the same version of transformers as pinned in requirements.txt.
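For example, a quick way to compare the installed transformers against the pin (the version string below is a placeholder; copy the exact one from requirements.txt):

```python
# Minimal version-check sketch; "4.33.2" is a placeholder pin, not necessarily
# the value in ChartMoE's requirements.txt.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
required = Version("4.33.2")  # placeholder: copy the exact pin from requirements.txt

if installed != required:
    print(f"transformers {installed} installed, but {required} is pinned -- "
          f"reinstall with: pip install transformers=={required}")
else:
    print(f"transformers {installed} matches the pin.")
```

Running pip install -r requirements.txt restores all pinned versions in one step.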
@Coobiw Hi, thanks for your help! It was the wrong version of the transformers package, due to an unintentional upgrade on my part. After I reverted to the version pinned in requirements.txt, the error was resolved. Thank you so much for your suggestion! BTW, looking forward to your dataset. :)
Thanks for your kind words. I will add this problem to the FAQs. The whole training pipeline code and dataset are coming! I am organizing these contents now.
Hi,
There is an error when I run quickstart.py, as follows:
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 122, in encode_img
img_embeds, atts_img, img_target = self.img2emb(image)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 126, in img2emb
img_embeds = self.vision_proj(self.vit(image.to(self.device)))
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/build_mlp.py", line 133, in forward
image_forward_outs = self.vision_tower(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1171, in forward
return self.vision_model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1094, in forward
hidden_states = self.embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 244, in forward
raise ValueError(
ValueError: Input image size (490*490) doesn't match model (336*336).
It seems that the img_size is 490 (from config.json) but the input size of CLIP is 336 (clip-vit-large-patch14-336). What did I do wrong? Thank you so much!
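For background, a ViT such as clip-vit-large-patch14-336 can in principle accept 490x490 inputs once its positional embeddings are interpolated from the 24x24 patch grid it was trained on to the 35x35 grid that 490x490 yields at patch size 14. The ValueError above is the guard that fires when no such resizing path is taken for the incoming resolution. A minimal, self-contained sketch of that interpolation, purely illustrative and not ChartMoE's or transformers' actual code:

```python
# Illustrative sketch: resizing a CLIP/ViT positional embedding from a 336x336
# training resolution (24x24 patches at patch size 14) to 490x490 (35x35 patches).
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """pos_embed: (1, 1 + old_grid**2, dim) -- CLS token followed by patch tokens."""
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = patch_pos.shape[-1]
    # (1, N, dim) -> (1, dim, old_grid, old_grid) so we can interpolate spatially
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_tok, patch_pos], dim=1)

old_pe = torch.randn(1, 1 + 24 * 24, 1024)                    # 336 / 14 = 24 patches per side
new_pe = resize_pos_embed(old_pe, old_grid=24, new_grid=35)   # 490 / 14 = 35 patches per side
print(new_pe.shape)                                           # torch.Size([1, 1226, 1024])
```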