T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

This repository contains an unofficial implementation of T-Rex2. Currently, only the visual encoder has been implemented.

Deepwiki docs: https://deepwiki.com/newocean-group/T-Rex2.

📖 Model Architecture:

📃 Datasets used for training the model:

Objects365
OpenImagesV7
CrowdHuman
HierText
LVIS

Due to hardware limitations, I train the model without text prompts and with a batch size of 1, using the following training process:

🖼️ Visual Results:

Note: This model has been trained for approximately 2.7M steps (batch size = 1) and is still training.

⚙️ Installation

To use the model, follow these steps:

Clone the repository:

git clone https://github.com/newocean-group/T-Rex2.git

Download and install CUDA toolkit:

Make sure you have the correct version installed. For example, I installed CUDA 11.8.
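Before compiling the operators, it can help to confirm which CUDA versions your environment actually sees. The following is a minimal sketch, not part of this repository: the helper name `cuda_report` is hypothetical, and it assumes that `nvcc` (if installed) is on your `PATH` and that PyTorch may or may not be installed yet. Missing pieces are reported as `None` rather than raising an error.

```python
import importlib.util
import re
import shutil
import subprocess


def cuda_report():
    """Collect the CUDA versions seen by nvcc and by PyTorch (None if unavailable)."""
    report = {"nvcc": None, "torch_cuda": None, "gpu_visible": False}

    # CUDA toolkit version, parsed from `nvcc --version` when nvcc is on PATH.
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        match = re.search(r"release (\d+\.\d+)", out)
        if match:
            report["nvcc"] = match.group(1)

    # CUDA version PyTorch was built against, when PyTorch is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
        report["gpu_visible"] = torch.cuda.is_available()

    return report


if __name__ == "__main__":
    print(cuda_report())
```

If the `nvcc` and `torch_cuda` entries disagree (for example 12.x versus 11.8), the operator build in the next step is likely to fail or warn, so it is worth aligning them first.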

Compile the CUDA operators:
```
cd ops
python setup.py install
```
Install other dependencies:
```
pip install -r requirements.txt
```
Log in to your Hugging Face account on your device so that the model weights can be downloaded automatically:
```
huggingface-cli login
# Enter your token when prompted
```

🔍 Demo

The repository includes a notebook, demo.ipynb, that shows how to use the model.

A second notebook, cls_embeddings.ipynb, illustrates the process of learning class embeddings for the model.

Note: You may need to adjust the threshold value to achieve the best results.
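To make the effect of the threshold concrete, here is a minimal sketch of score-based filtering. It is an illustration only and not this repository's API: the function name `filter_by_score`, the `(x1, y1, x2, y2)` box format, and the example scores are all assumptions.

```python
def filter_by_score(boxes, scores, threshold=0.3):
    """Keep only the detections whose score is at least `threshold`."""
    kept = [(b, s) for b, s in zip(boxes, scores) if s >= threshold]
    return [b for b, _ in kept], [s for _, s in kept]


# Hypothetical detections: boxes as (x1, y1, x2, y2) with one score each.
boxes = [(10, 20, 110, 220), (30, 40, 90, 100), (5, 5, 50, 50)]
scores = [0.92, 0.41, 0.12]

# Sweep a few thresholds to see how many detections survive each one.
for t in (0.1, 0.3, 0.5):
    kept_boxes, kept_scores = filter_by_score(boxes, scores, t)
    print(f"threshold={t}: {len(kept_boxes)} detections kept")
```

A lower threshold keeps more boxes, including more false positives, while a higher one keeps fewer but more confident boxes. Sweeping a few values on your own images is a simple way to pick one.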

💡 Conclusion

This model has been implemented based on my current knowledge and can be further improved with future research.

Additionally, the model can be modified for instance segmentation based on the approach described in this paper. The modified model architecture would resemble the following:
