BradyFU · tingyu215 · Nov 27, 2024
diff --git a/README.md b/README.md
@@ -640,3 +640,4 @@ This is the first work to correct hallucination in multimodal large language mod
 | **Fruit-ATVC** | [Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation](https://arxiv.org/pdf/2303.05983.pdf) | [Link](https://drive.google.com/drive/folders/1Saaia2rRRb1nz5sKdmpzYdS4jHiMDaP0) | A manually pictured multimodal fine-tuning dataset for learning to reject instructions |
 | **InfoSeek** | [Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?](https://arxiv.org/pdf/2302.11713.pdf) | [Link](https://open-vision-language.github.io/infoseek/) | A VQA dataset that focuses on asking information-seeking questions |
 | **OVEN** | [Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities](https://arxiv.org/pdf/2302.11154.pdf) | [Link](https://open-vision-language.github.io/oven/) | A dataset that focuses on recognizing the Visual Entity on the Wikipedia, from images in the wild |
+| **TS-LLaVA** | [TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models](https://arxiv.org/pdf/2411.11066) | [Link](https://github.com/tingyu215/TS-LLaVA) | A training-free baseline for video LLMs |