This is a simple but powerful semantic search engine built using SentenceTransformers and Gradio. It allows users to ask questions and receive the top 3 most relevant answers from a predefined Q&A dataset using embedding-based similarity.
- Uses the intfloat/e5-small-v2 transformer model
- Supports domain-specific question-answer search
- Logs every user query and matched results to CSV
- Saves sentence embeddings to avoid reprocessing
- Simple Gradio-based web interface
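
The query-logging feature above could look roughly like the following. This is a minimal sketch using only the standard library, not the actual contents of logger.py: the function name `log_query`, the `LOG_FIELDS` column layout, and the choice to write one row per matched result are all assumptions made for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout; the real logger.py may use different fields.
LOG_FIELDS = ["timestamp", "query", "rank", "matched_question", "answer", "score"]


def log_query(query, matches, log_path="logs/query_log.csv"):
    """Append one CSV row per matched result.

    `matches` is a list of (question, answer, score) tuples, best match first.
    """
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Write the header only when the file is new or empty.
    write_header = not path.exists() or path.stat().st_size == 0
    timestamp = datetime.now(timezone.utc).isoformat()

    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(LOG_FIELDS)
        if not matches:
            # Assumption: queries with no match are still worth recording.
            writer.writerow([timestamp, query, "", "", "", ""])
        for rank, (question, answer, score) in enumerate(matches, start=1):
            writer.writerow([timestamp, query, rank, question, answer, f"{score:.4f}"])
```

Keeping one row per result makes the log easy to filter by rank or score later, at the cost of repeating the query text on each row.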
semantic-search-app/
├── app.py                      # Main entry point (Gradio UI)
├── model_utils.py              # Model and embedding handling
├── search_engine.py            # Core semantic search logic
├── logger.py                   # Query logging utility
├── data/
│   └── qa_dataset.csv          # Input dataset (questions and answers)
├── embeddings/
│   └── corpus_embeddings.pt    # Cached embeddings (auto-created)
├── logs/
│   └── query_log.csv           # Logs of user queries
├── requirements.txt            # Required Python libraries
├── .gitignore                  # Ignored folders/files
└── README.md                   # This file
Place your Q&A dataset in data/qa_dataset.csv with the following columns:

| Question | Answer |
|---|---|
| How to return? | Visit our returns page |
| ... | ... |
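
Before starting the app, it can help to confirm that the dataset has the expected shape. The snippet below is a small sketch using only the standard library; `load_qa_dataset` and `REQUIRED_COLUMNS` are hypothetical names, and skipping rows with an empty question or answer is an assumption rather than documented behaviour of this project.

```python
import csv

REQUIRED_COLUMNS = ("Question", "Answer")


def load_qa_dataset(path="data/qa_dataset.csv"):
    """Read the CSV and return a list of (question, answer) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"Dataset is missing columns: {missing}")

        pairs = []
        for row in reader:
            question = (row["Question"] or "").strip()
            answer = (row["Answer"] or "").strip()
            # Assumption: incomplete rows are skipped rather than treated as errors.
            if question and answer:
                pairs.append((question, answer))
    return pairs
```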
git clone https://github.com/rehan-shafi/semantic-search-app.git
cd semantic-search-app
pip install -r requirements.txt
Place your dataset in: data/qa_dataset.csv
python app.py

Open the URL that Gradio prints in the terminal to interact with the semantic Q&A system.
- Model: intfloat/e5-small-v2
- Format: Adds `query:` and `passage:` prefixes as per model guidelines
- Similarity: Uses cosine similarity on sentence embeddings
- Results: Shows top 3 answers above a similarity threshold (default 0.5)
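
The ranking step described above can be illustrated without loading the model. The following is a sketch in plain Python, not the code in search_engine.py: `add_prefix`, `cosine_similarity`, and `top_matches` are hypothetical helper names, vectors are ordinary lists of floats, and treating a score exactly equal to the threshold as a match is an assumption.

```python
import math


def add_prefix(text, kind):
    """E5 models expect 'query: ' or 'passage: ' in front of the input text."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return f"{kind}: {text}"


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def top_matches(query_vec, corpus_vecs, top_k=3, threshold=0.5):
    """Return up to `top_k` (index, score) pairs, highest score first.

    Assumption: scores equal to the threshold are kept.
    """
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

In the app itself the vectors would presumably come from the SentenceTransformers model, with `add_prefix(question, "passage")` applied to dataset entries and `add_prefix(user_input, "query")` applied to the user's question before encoding. The returned indices can then be used to look up the matching question and answer rows.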
- User Query: How can I return a damaged product
- Top Match:
  - Q: How do I return my product?
  - A: Please visit the returns page and fill out the return form.
  - Score: 0.78
This project is released under the MIT License.
- Model: SentenceTransformers
- UI: Gradio
- Author: Rehan Shafi