Description
Hello! This is very good work and a comprehensive analysis of how to design a hybrid visual encoder; it gave me a lot of inspiration. After reading the paper, I have some questions about the pre-alignment training:

1. What is the difference between the pre-alignment training in stage one and the pre-training in stage two? Both stages unfreeze the visual encoders and the projector, and both are supervised with the standard autoregressive loss, i.e., next-token prediction.

2. For pre-alignment training, why introduce an additional, smaller LLM (Vicuna-7B in practice) for alignment instead of using the model's original LLM? Since both are 7B models, the purpose of this step is not very clear to me.

Looking forward to your reply!
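To make question 1 concrete, here is a minimal sketch of my reading of the recipe (the module names and stage split are my assumptions from the paper, not taken from the code release): from the optimizer's point of view, both stages seem to train the same set of modules.

```python
# Sketch of my understanding (assumptions, not the authors' code):
# which modules receive gradients in each training stage.

MODULES = ["visual_encoder", "projector", "llm"]

def trainable_modules(stage: str) -> set:
    """Return the set of modules unfrozen in a given stage."""
    if stage == "stage1_pre_alignment":
        # Pre-alignment: visual encoder + projector trained against a
        # separate smaller LLM (Vicuna-7B, as I read the paper).
        return {"visual_encoder", "projector"}
    if stage == "stage2_pretraining":
        # Pre-training: the same modules are unfrozen, which is
        # exactly why the two stages look identical to me.
        return {"visual_encoder", "projector"}
    raise ValueError(f"unknown stage: {stage}")

# If this is right, the two stages differ only in which LLM sits
# behind the projector, not in what is trained.
same = trainable_modules("stage1_pre_alignment") == trainable_modules("stage2_pretraining")
print(same)
```

If my reading is wrong and some module is frozen differently in one of the stages, that would already answer part of the question.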
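For clarity, by "next-token-prediction supervision" in both stages I mean the usual average negative log-likelihood of the target tokens; a plain-Python sketch (my formulation, just to fix notation):

```python
import math

def next_token_loss(logits, targets):
    """Average negative log-likelihood of the target next tokens.

    logits:  list of per-position score lists (one row of vocabulary
             scores per predicted position)
    targets: list of target token ids, one per position
    """
    total = 0.0
    for scores, t in zip(logits, targets):
        # log-sum-exp with max subtraction for numerical stability
        z = max(scores)
        log_norm = z + math.log(sum(math.exp(s - z) for s in scores))
        total += log_norm - scores[t]
    return total / len(targets)

# Uniform scores over a 2-token vocabulary give a loss of log(2).
print(next_token_loss([[0.0, 0.0]], [0]))
```

My understanding is that this same objective supervises both stage one and stage two, which is what makes the distinction between them unclear to me.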