VisText-Mosquito: A Multimodal Dataset for Mosquito Breeding Site Detection, Surface Segmentation, and Reasoning

Dataset Overview

VisText-Mosquito is a multimodal dataset designed to support detection of mosquito breeding sites, segmentation of water surfaces, and generation of natural language reasoning for explainable AI applications. It consists of three core components:

Breeding Place Detection: This part includes 1,828 images with 3,752 annotations across five classes: Coconut-Exocarp, Vase, Tire, Drain-Inlet, and Bottle. The images were collected from diverse urban, semi-urban, and rural environments in Bangladesh under daylight conditions to ensure visual consistency. Detection performance was validated using state-of-the-art object detection models, including YOLOv5s, YOLOv8n, and YOLOv9s, with YOLOv9s achieving the highest mAP@50.
Water Surface Segmentation: This component contains 142 images with 253 annotations across two classes: Vase with Water and Tire with Water. YOLOv8x-Seg and YOLOv11n-Seg models were used to validate segmentation performance in detecting water surfaces within the identified containers.
Textual Reasoning Generation: Each image is linked with a natural language reasoning statement that explains the presence or absence of breeding risk. A fine-tuned BLIP model was used to generate these explanations, achieving strong performance on BLEU, BERTScore, and ROUGE-L metrics.
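
For readers unfamiliar with the mAP@50 metric mentioned above, the following is a minimal sketch of the matching rule behind it: a predicted box counts as a true positive when it overlaps a ground-truth box of the same class with an intersection-over-union (IoU) of at least 0.5. The function names and the tuple layouts are assumptions made for this illustration; this is not the evaluation code used for the dataset, and a full mAP computation additionally averages precision over recall levels and classes.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(predictions, ground_truths, iou_threshold=0.5):
    """Greedy matching at a fixed IoU threshold.

    predictions:   list of (class_name, confidence, box)  (hypothetical layout)
    ground_truths: list of (class_name, box)              (hypothetical layout)
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    # Higher-confidence predictions get first pick of ground-truth boxes.
    for cls, _conf, box in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, (gt_cls, gt_box) in enumerate(ground_truths):
            if idx in matched or gt_cls != cls:
                continue
            iou = box_iou(box, gt_box)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    return true_positives


if __name__ == "__main__":
    preds = [("Tire", 0.9, (0, 0, 10, 10)), ("Bottle", 0.7, (50, 50, 60, 60))]
    gts = [("Tire", (0, 0, 10, 10)), ("Bottle", (50, 50, 58, 60))]
    print(count_true_positives(preds, gts))
```

Each ground-truth box can be matched only once, so a duplicate detection of the same object is counted as a false positive rather than a second true positive.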

The VisText-Mosquito dataset offers a novel multimodal benchmark for training and evaluating AI models that combine detection, segmentation, and interpretability. It serves as a valuable resource for researchers and public health professionals aiming to develop explainable, scalable mosquito control solutions.

Code

The notebook yolov5s_yolov8n_yolov9s_1.ipynb trains the YOLOv5s, YOLOv8n, and YOLOv9s models for mosquito breeding place detection. The notebook Yolov8x-seg.ipynb trains the YOLOv8x-Seg model for water surface segmentation.
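
YOLO training pipelines typically read a small YAML file that lists the image folders and class names. The sketch below generates such a file for the five detection classes. The directory layout (`datasets/vistext`, `images/train`, `images/val`), the helper name `build_data_yaml`, and the class index order are all assumptions for illustration; check the notebooks and the label files for the actual paths and index mapping.

```python
from pathlib import Path

# Class names from the dataset description; the index order is an assumption.
CLASS_NAMES = ["Coconut-Exocarp", "Vase", "Tire", "Drain-Inlet", "Bottle"]


def build_data_yaml(root, train="images/train", val="images/val", names=CLASS_NAMES):
    """Return the text of a YOLO-style dataset config (hypothetical paths)."""
    lines = [
        f"path: {root}",    # dataset root directory
        f"train: {train}",  # training images, relative to the root
        f"val: {val}",      # validation images, relative to the root
        "names:",
    ]
    # One "index: name" entry per class.
    lines += [f"  {index}: {name}" for index, name in enumerate(names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config_text = build_data_yaml("datasets/vistext")
    Path("data.yaml").write_text(config_text, encoding="utf-8")
    print(config_text)
```

With the Ultralytics package installed, a file like this can be passed to training as `YOLO("yolov8n.pt").train(data="data.yaml", epochs=50, imgsz=640)`; the epoch count and image size here are placeholder values, not the settings used in the notebooks.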

Model Weights

Trained weights are provided for the object detection models (YOLOv5s, YOLOv8n, and YOLOv9s) and for the segmentation model (YOLOv8x-Seg).

Cite

If you use the dataset for your research, please cite it as follows:

@article{islam2025vistext,
  title={VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning},
  author={Islam, Md Adnanul and Sayeedi, Md Faiyaz Abdullah and Shuvo, Md Asaduzzaman and Rahman, Muhammad Ziaur and Bappy, Shahanur Rahman and Rahman, Raiyan and Shatabda, Swakkhar},
  journal={arXiv preprint arXiv:2506.14629},
  year={2025}
}

Contact

For inquiries or feedback, contact us at msayeedi212049@bscse.uiu.ac.bd or mislam221096@bscse.uiu.ac.bd.
