UPN in three-stage training process #18

Open
gongchenting opened this issue Apr 16, 2025 · 4 comments

Comments

@gongchenting

gongchenting commented Apr 16, 2025

In the three-stage training process of ChatRex, are the parameters of UPN updated? Additionally, how should the GT boxes be mixed with the UPN boxes? Are the UPN boxes filtered first, as in the testing examples, before being mixed with the GT boxes?

@Mountchicken
Collaborator

Hi @gongchenting
UPN is not trained together with the MLLM; it is trained separately and is only used to provide proposal boxes. The GT boxes are directly mixed with the UPN-generated boxes, and we remove any UPN boxes whose IoU with GT boxes is greater than 0.9.
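
For readers who want to reproduce the IoU-based removal described above, here is a minimal sketch. It is not taken from the ChatRex codebase: the function names `iou` and `drop_gt_duplicates` are hypothetical, and the `(x1, y1, x2, y2)` corner format for boxes is an assumption.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2). The box format is an assumption."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_gt_duplicates(upn_boxes, gt_boxes, iou_thresh=0.9):
    """Hypothetical helper: keep only UPN boxes whose IoU with every
    GT box is at most iou_thresh (0.9 is the value stated in the reply)."""
    return [
        u for u in upn_boxes
        if all(iou(u, g) <= iou_thresh for g in gt_boxes)
    ]
```

A UPN box that nearly coincides with a GT box is dropped so the same object is not represented twice; boxes with lower overlap are kept unchanged.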

@gongchenting
Author

Thank you for your reply. Here are a few more questions I would like to confirm, and I look forward to your further response.

  1. After removing the UPN boxes that have an IoU greater than 0.9 with the GT boxes, will the remaining UPN boxes undergo a second filtering step based on the 0.3 threshold? Or will they be selected by score until the total number of boxes reaches 100?

  2. When constructing the training inputs, will the combined GT and UPN boxes be shuffled before the corresponding object tokens are generated?

@Mountchicken
Collaborator

  1. The process is as follows: given an image, we first pass it through the UPN and select boxes with a score greater than 0.3. These selected UPN boxes are then combined with the ground truth (GT) boxes. If the total number of boxes after merging exceeds 100, we remove some of the UPN boxes so that the final total (UPN + GT) is 100. If the total number of boxes is less than or equal to 100, we make no changes.

  2. After merging the UPN boxes with the GT boxes, we randomly shuffle the combined boxes and update the object indices accordingly.
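
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not ChatRex's actual code: the function name `build_box_inputs` and its parameters are hypothetical, and the thread does not say which UPN boxes are removed when the total exceeds 100, so dropping the lowest-scoring ones is a guess. The IoU-based removal mentioned in the earlier reply is left out because its position in this sequence is not stated.

```python
import random


def build_box_inputs(upn_boxes, upn_scores, gt_boxes,
                     score_thresh=0.3, max_boxes=100, rng=None):
    """Hypothetical sketch: merge UPN proposals with GT boxes, cap the
    total, shuffle, and report where each GT box ended up."""
    rng = rng or random.Random()

    # Step 1a: keep UPN proposals with a score greater than the threshold.
    kept = [(s, b) for b, s in zip(upn_boxes, upn_scores) if s > score_thresh]

    # Step 1b: if UPN + GT would exceed max_boxes, drop UPN boxes only.
    # ASSUMPTION: the lowest-scoring UPN boxes are the ones dropped.
    # If GT alone exceeds max_boxes, all GT boxes are kept here; the
    # thread does not cover that case.
    budget = max(0, max_boxes - len(gt_boxes))
    kept.sort(key=lambda sb: sb[0], reverse=True)
    upn_kept = [b for _, b in kept[:budget]]

    # Step 2: shuffle the merged list.
    merged = list(gt_boxes) + upn_kept
    order = list(range(len(merged)))
    rng.shuffle(order)
    shuffled = [merged[i] for i in order]

    # Update object indices: gt_to_new[k] is the new position of GT box k.
    new_pos = {old: new for new, old in enumerate(order)}
    gt_to_new = [new_pos[k] for k in range(len(gt_boxes))]
    return shuffled, gt_to_new
```

The returned `gt_to_new` mapping is one way to "update the object indices": any annotation that referred to GT box `k` can be rewritten to refer to position `gt_to_new[k]` in the shuffled list.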

@gongchenting
Author

Thank you very much! It was very helpful for me to understand ChatRex.
