This repository contains an unofficial implementation of T-Rex2. Currently, only the visual encoder has been implemented.
Deepwiki docs: https://deepwiki.com/newocean-group/T-Rex2.
Datasets:
- Objects365
- OpenImagesV7
- CrowdHuman
- HierText
- LVIS
Due to hardware limitations, I train the model without text prompts and with a batch size of 1, using the following training process:
Note: The model has been trained for approximately 2.7M steps (batch size = 1), and training is still ongoing.
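For a rough sense of scale, a step count at batch size 1 can be converted into the number of samples seen, and from there into the equivalent number of steps at a larger batch size. This is plain arithmetic on sample counts only. It does not mean a run with a larger batch would behave the same, since learning-rate schedules and gradient noise differ. The function name `equivalent_steps` is hypothetical.

```python
def equivalent_steps(total_steps: int, batch_size: int, target_batch_size: int) -> float:
    """Convert a step count at one batch size into the step count that would
    process the same number of training samples at another batch size."""
    samples_seen = total_steps * batch_size
    return samples_seen / target_batch_size


if __name__ == "__main__":
    # Approximate figure quoted in the note (batch size = 1).
    steps_bs1 = 2_700_000
    for target in (8, 16, 32):
        # Same number of samples, spread over fewer, larger steps.
        print(f"batch size {target}: ~{equivalent_steps(steps_bs1, 1, target):,.0f} steps")
```

For example, 2.7M samples at batch size 16 correspond to 168,750 optimizer steps.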
To use the model, follow these steps:
- Clone the repository:
git clone https://github.com/newocean-group/T-Rex2.git
cd T-Rex2
- Download and install the CUDA toolkit. Make sure the version is compatible with your PyTorch installation; for example, I installed CUDA 11.8.
- Compile the CUDA operators:
cd ops
python setup.py install
- Install the other dependencies:
pip install -r requirements.txt
- Log in to your Hugging Face account on your device so that the model weights can be downloaded automatically:
huggingface-cli login
Then enter your access token when prompted.
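Before running the notebooks, it can help to confirm that the CUDA toolkit version is the one you intended (for example 11.8) and that a Hugging Face token is available. The sketch below is a hypothetical pre-flight check, not part of the repository. It parses the `release X.Y` field printed by `nvcc --version`, and it assumes a recent `huggingface_hub` that honors the `HF_TOKEN` environment variable and stores the login token at `$HF_HOME/token` (with `HF_HOME` defaulting to `~/.cache/huggingface`).

```python
import os
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the toolkit release (e.g. "11.8") from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run nvcc if it is on PATH and return its release string, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


def hf_token_available() -> bool:
    """Assumption: HF_TOKEN is honored, and `huggingface-cli login` stores the
    token at $HF_HOME/token (HF_HOME defaults to ~/.cache/huggingface)."""
    if os.environ.get("HF_TOKEN"):
        return True
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    return (hf_home / "token").is_file()


if __name__ == "__main__":
    expected = "11.8"  # example version from the steps above; adjust to your setup
    found = installed_cuda_release()
    print(f"nvcc release: {found or 'not found'} (expected {expected})")
    print(f"Hugging Face token found: {hf_token_available()}")
```

A missing `nvcc` on `PATH` usually means the toolkit's `bin` directory has not been added to your environment, which would also prevent the CUDA operators from compiling.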
The repository includes a Jupyter notebook (.ipynb) that shows how to use the model.
Additionally, a second notebook illustrates the process of learning class embeddings for the model.
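As a point of reference, a common baseline for turning several visual prompts of the same class into one class embedding is to average their per-prompt embeddings. This is an assumption for illustration and not necessarily what the notebook does; the notebook may optimize the embedding instead. The function `mean_embedding` and the example vectors are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence

Vector = List[float]


def mean_embedding(prompt_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average several per-prompt embeddings (all of the same length)
    into a single class embedding."""
    if not prompt_embeddings:
        raise ValueError("need at least one prompt embedding")
    dim = len(prompt_embeddings[0])
    if any(len(e) != dim for e in prompt_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(prompt_embeddings)
    # Element-wise mean over the prompts.
    return [sum(e[i] for e in prompt_embeddings) / n for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical 4-dimensional embeddings from three visual prompts of one class.
    prompts = [
        [0.2, 0.4, 0.0, 1.0],
        [0.4, 0.2, 0.2, 0.8],
        [0.3, 0.3, 0.1, 0.9],
    ]
    print(mean_embedding(prompts))
```

Averaging is cheap and needs no training, but it weights every prompt equally, so a single unrepresentative example can pull the class embedding off target.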
Note: You may need to adjust the threshold value to achieve the best results.
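The sketch below assumes the threshold is a confidence cutoff applied to predicted boxes, which is how detection thresholds are typically used; check the notebook for the exact variable. The `Detection` class, the `(x1, y1, x2, y2)` box format, and `filter_by_threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Box format (x1, y1, x2, y2) is an assumption for illustration.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    score: float  # model confidence in [0, 1]


def filter_by_threshold(detections: Sequence[Detection], threshold: float) -> List[Detection]:
    """Keep detections whose confidence score is at least `threshold`."""
    return [d for d in detections if d.score >= threshold]


if __name__ == "__main__":
    dets = [
        Detection((10.0, 10.0, 50.0, 50.0), 0.82),
        Detection((60.0, 20.0, 90.0, 70.0), 0.41),
        Detection((5.0, 80.0, 30.0, 120.0), 0.18),
    ]
    # Sweep a few thresholds to see how many boxes survive each one.
    for t in (0.1, 0.3, 0.5):
        kept = filter_by_threshold(dets, t)
        print(f"threshold={t}: {len(kept)} detection(s) kept")
```

A lower threshold keeps more boxes (higher recall, more false positives), while a higher one keeps fewer (higher precision, more missed objects), so sweeping a few values on your own images is a quick way to pick one.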
This model has been implemented based on my current knowledge and can be further improved with future research.
Additionally, the model can be modified for instance segmentation based on the approach described in this paper. The modified model architecture would resemble the following: