This project explores Dynamic Bloom Filters (DBFs) and the fine-tuning of SentenceTransformer models for optimizing semantic search in large-scale document retrieval systems. It focuses on improving scalability, precision, memory efficiency, and query performance in dynamic and real-time workloads.
- Dynamic Bloom Filters: Handle dynamic workloads by resizing as the volume of stored data changes.
- Semantic Search: Fine-tunes the all-MiniLM-L6-v2 SentenceTransformer model for high-precision semantic search.
- Interactive Querying: Provides real-time query capability with ranked retrieval based on cosine similarity.
- Performance Evaluation: Uses metrics like cache hit rate, false positive rate, Precision@k, Recall@k, and F1-score.
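
To make the resizing idea behind Dynamic Bloom Filters concrete, here is a minimal sketch in pure Python. It is not the implementation used in this repository: the class names `BloomFilter` and `DynamicBloomFilter`, the `slice_capacity` and `error_rate` parameters, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration. The sketch grows by appending a new fixed-size filter whenever the active one reaches its capacity, which is one common way to build a dynamic filter.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter; one slice of the dynamic filter below (illustrative)."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.count = 0
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class DynamicBloomFilter:
    """Grows by appending a new slice once the active slice is full (hypothetical design)."""

    def __init__(self, slice_capacity=1000, error_rate=0.01):
        self.slice_capacity = slice_capacity
        self.error_rate = error_rate
        self.slices = [BloomFilter(slice_capacity, error_rate)]

    def add(self, item):
        # Open a fresh slice instead of overfilling the current one.
        if self.slices[-1].count >= self.slice_capacity:
            self.slices.append(BloomFilter(self.slice_capacity, self.error_rate))
        self.slices[-1].add(item)

    def __contains__(self, item):
        # An item may live in any slice, so every slice must be checked.
        return any(item in s for s in self.slices)


dbf = DynamicBloomFilter(slice_capacity=1000, error_rate=0.01)
for i in range(2500):
    dbf.add(f"doc-{i}")
print(len(dbf.slices), "doc-42" in dbf)
```

Because a lookup has to consult every slice, the overall false positive rate in this design rises with the number of slices (roughly 1 - (1 - p)^s for s full slices at per-slice rate p), so `slice_capacity` is worth tuning against the expected data volume.
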
- Python 3.8 or above
- pip package manager
- GPU-enabled environment for faster training and inference (optional)
- Clone the repository:
git clone https://github.com/joymohanty8999/intelligent-query-optimization.git
cd intelligent-query-optimization
- Set up a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate (on Windows: venv\Scripts\activate)
- Install dependencies:
pip install -r requirements.txt
- Download the MS MARCO dataset and place it in the appropriate directory.
To run the full suite of experiments:
python main.py
To test the semantic search with interactive queries:
python interactive_query.py
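
For readers who want to see what ranked retrieval by cosine similarity and the Precision@k, Recall@k, and F1 metrics amount to, the following is a small, dependency-free sketch. It is not taken from `interactive_query.py` or `main.py`: the function names, the two-dimensional toy vectors (stand-ins for real SentenceTransformer embeddings), and the relevance labels are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_recall_f1_at_k(ranked_ids, relevant_ids, k):
    """Precision@k, Recall@k, and their harmonic mean for one query."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy embeddings (hypothetical); real ones would come from the fine-tuned model.
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
query_vec = [1.0, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
ranked_ids = [doc_id for doc_id, _ in ranking]
print(ranked_ids)  # ['d1', 'd3', 'd2']

# Suppose d1 and d2 are the documents judged relevant to this query.
print(precision_recall_f1_at_k(ranked_ids, {"d1", "d2"}, k=2))  # (0.5, 0.5, 0.5)
```

With k=2, one of the two retrieved documents is relevant and one of the two relevant documents is retrieved, which is why all three metrics come out to 0.5 in this toy case.
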