VisText-Mosquito: A Multimodal Dataset for Mosquito Breeding Site Detection, Surface Segmentation, and Reasoning
VisText-Mosquito is a multimodal dataset designed to support mosquito breeding site detection, water surface segmentation, and natural language reasoning generation for explainable AI applications. It consists of three core components:
- Breeding Place Detection: This part includes 1,828 images with 3,752 annotations across five classes: Coconut-Exocarp, Vase, Tire, Drain-Inlet, and Bottle. The images were collected from diverse urban, semi-urban, and rural environments in Bangladesh under daylight conditions to ensure visual consistency. Detection performance was validated using state-of-the-art object detection models, including YOLOv5s, YOLOv8n, and YOLOv9s, with YOLOv9s achieving the highest mAP@50.
- Water Surface Segmentation: This component contains 142 images with 253 annotations across two classes: Vase with Water and Tire with Water. YOLOv8x-Seg and YOLOv11n-Seg models were used to validate segmentation performance in detecting water surfaces within the identified containers.
- Textual Reasoning Generation: Each image is linked with a natural language reasoning statement that explains the presence or absence of breeding risk. A fine-tuned BLIP model was used to generate these explanations, achieving strong performance on BLEU, BERTScore, and ROUGE-L metrics.
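The mAP@50 metric mentioned for the detection component counts a predicted box as correct when it has the right class and overlaps a ground-truth box with an intersection-over-union (IoU) of at least 0.5. The sketch below illustrates only that matching criterion, not the full precision-recall averaging behind mAP. The function names and the example boxes are hypothetical and are not taken from the dataset or its notebooks.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, pred_class, gt_box, gt_class, threshold=0.5):
    """True if the prediction has the right class and IoU >= threshold."""
    return pred_class == gt_class and box_iou(pred_box, gt_box) >= threshold


# Hypothetical example: a predicted "Tire" box shifted by 10 px from the ground truth.
gt = (10, 10, 110, 110)
pred = (20, 20, 120, 120)
print(round(box_iou(pred, gt), 3))
print(is_match_at_50(pred, "Tire", gt, "Tire"))
```

In this example the overlap is 90 x 90 pixels against a union of 11,900 square pixels, so the IoU is about 0.68 and the prediction counts as a match at the 0.5 threshold.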
The VisText-Mosquito dataset offers a novel multimodal benchmark for training and evaluating AI models that combine detection, segmentation, and interpretability. It serves as a valuable resource for researchers and public health professionals aiming to develop explainable, scalable mosquito control solutions.
The notebook yolov5s_yolov8n_yolov9s_1.ipynb trains the YOLOv5s, YOLOv8n, and YOLOv9s models for mosquito breeding place detection. The notebook Yolov8x-seg.ipynb trains the YOLOv8x-Seg model for water surface segmentation.
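YOLO training pipelines usually read a small dataset configuration file that lists the image folders and class names. The sketch below builds such a file for the five detection classes in the layout used by Ultralytics-style tools. The root folder, the split folder names, and the class index order are assumptions made for illustration; check them against the actual dataset files and notebooks before use.

```python
# Class names come from the dataset description; the index order is an assumption.
CLASS_NAMES = ["Coconut-Exocarp", "Vase", "Tire", "Drain-Inlet", "Bottle"]


def build_dataset_yaml(root, train_dir="images/train", val_dir="images/val",
                       names=CLASS_NAMES):
    """Return the text of a YOLO-style dataset config (paths are hypothetical)."""
    lines = [
        f"path: {root}",       # dataset root folder
        f"train: {train_dir}",  # training images, relative to the root
        f"val: {val_dir}",      # validation images, relative to the root
        "names:",
    ]
    # One "index: name" entry per class.
    lines += [f"  {index}: {name}" for index, name in enumerate(names)]
    return "\n".join(lines) + "\n"


yaml_text = build_dataset_yaml("datasets/vistext-mosquito")
print(yaml_text)
```

If the notebooks rely on the Ultralytics package, a file with this content would typically be saved to disk and passed to training through the `data` argument.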
Trained weights are provided for the object detection models (YOLOv5s, YOLOv8n, and YOLOv9s) and for the segmentation model (YOLOv8x-Seg).
If you use the dataset for your research, please cite it as follows:
@article{islam2025vistext,
  title={VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning},
  author={Islam, Md Adnanul and Sayeedi, Md Faiyaz Abdullah and Shuvo, Md Asaduzzaman and Rahman, Muhammad Ziaur and Bappy, Shahanur Rahman and Rahman, Raiyan and Shatabda, Swakkhar},
  journal={arXiv preprint arXiv:2506.14629},
  year={2025}
}
For inquiries or feedback, contact us at msayeedi212049@bscse.uiu.ac.bd or mislam221096@bscse.uiu.ac.bd.