
CLIP: Input image size (490x490) doesn't match model (336x336). #6


Closed
Lv996331209 opened this issue Feb 7, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@Lv996331209

Hi,
There is an error when I run quickstart.py; the traceback is as follows:
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 122, in encode_img
img_embeds, atts_img, img_target = self.img2emb(image)
File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chartmoe.py", line 126, in img2emb
img_embeds = self.vision_proj(self.vit(image.to(self.device)))
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/model/build_mlp.py", line 133, in forward
image_forward_outs = self.vision_tower(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1171, in forward
return self.vision_model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1094, in forward
hidden_states = self.embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 244, in forward
raise ValueError(
ValueError: Input image size (490*490) doesn't match model (336*336).

It seems the img_size is 490 (from config.json) but the input size of the CLIP model is 336 (clip-vit-large-patch14-336). Where did I go wrong? Thank you so much!
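For context, the check that raises this error compares the number of patches the input would produce against the number the model's position-embedding table was built for. A pure-arithmetic sketch of that mismatch (not the actual transformers code):

```python
# CLIP's position-embedding table is sized for (image_size / patch_size)^2
# patches, so a 490x490 input cannot reuse a 336-pixel model's table.

def num_patches(image_size: int, patch_size: int) -> int:
    """Patches per image for a square ViT input (floor division, as in ViT)."""
    return (image_size // patch_size) ** 2

model_patches = num_patches(336, 14)   # 24 * 24 = 576
input_patches = num_patches(490, 14)   # 35 * 35 = 1225
assert model_patches != input_patches  # -> the ValueError in modeling_clip.py
```

Newer transformers versions can bridge this gap by interpolating the position embeddings, which is why the package version matters here.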

@Lv996331209
Author

I read issue #4 and noticed the same error.

Both the model and the config were downloaded from HF, but the error still appears.

@Coobiw
Collaborator

Coobiw commented Feb 7, 2025

Hi, @Lv996331209. 490 is indeed the correct resolution used in ChartMoE. You could check the versions of core Python packages like transformers against requirements.txt. If you can give me more details to reproduce your bug, that would be great~ I've never encountered this problem myself. But #4 has been solved, so you could also discuss it with its author.

Thanks for your attention! If any question remains (e.g., the version of some Python package?), I'm very willing to communicate with you~!

@Coobiw
Collaborator

Coobiw commented Feb 7, 2025

@Lv996331209 Hi, can you provide the versions of the packages mentioned in requirements.txt? I'll try to reproduce this bug~ Thanks!

@Coobiw
Collaborator

Coobiw commented Feb 9, 2025

@Lv996331209 Hi, you can try using the same version of transformers as in requirements.txt. I think that will fix it~
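One way to catch this early is to compare the installed transformers version against the pinned one before running inference. A small stdlib-only helper sketch (hypothetical, not part of ChartMoE; the `"x.y.z"` placeholder stands for the version actually listed in requirements.txt):

```python
# Check whether an installed package matches a pinned version string.
from importlib.metadata import version, PackageNotFoundError


def matches_pin(package: str, pinned: str) -> bool:
    """Return True iff `package` is installed at exactly version `pinned`."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        # Not installed at all also counts as a mismatch.
        return False


# Replace "x.y.z" with the transformers version from requirements.txt.
if not matches_pin("transformers", "x.y.z"):
    print("transformers is missing or not at the pinned version")
```

Alternatively, `pip install -r requirements.txt` restores every pinned dependency at once.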

@Lv996331209
Author

@Coobiw Hi, thanks for your help! It was indeed the wrong version of the transformers package, due to an unintentional upgrade. After I reverted to the version in requirements.txt, the error was resolved.

Thank you so much for your suggestion! BTW, looking forward to your dataset. :)

@Coobiw
Collaborator

Coobiw commented Feb 10, 2025

Thanks for your kind words. I will add this problem to the FAQ. The full training pipeline code and dataset are coming! I'm organizing these contents now.

@Coobiw Coobiw closed this as completed Feb 10, 2025
@Coobiw Coobiw changed the title Infer error about clip CLIP: Input image size (490x490) doesn't match model (336x336). Feb 10, 2025
@Coobiw Coobiw added the bug Something isn't working label Feb 10, 2025
@Coobiw Coobiw pinned this issue Feb 10, 2025