Uncover insights from any webpage using local AI with hybrid vector+keyword+BERT retrieval
Detect scams, analyze content, and get citations - all without external APIs
- Playwright + Bright Data proxies (0 CAPTCHAs in testing)
- Randomized 2-5s delays + auto-retries for stealth
- Hybrid search: ChromaDB (mxbai-embed-large) + BM25 + BERT reranking - 92% answer relevance (tested on 50+ benchmark queries)
- Ollama (Llama3 8B) for private, offline processing
- Debuggable scoring (explainable answer selection)
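The hybrid scoring in the list above can be pictured roughly as follows. This is a minimal sketch, not the code in parse.py: `hybrid_rank`, `toy_embed`, the `alpha` weight, and the min-max normalization are assumptions made for illustration, and the toy embedding stands in for mxbai-embed-large vectors stored in ChromaDB. The BERT reranking stage is left out.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (keyword signal)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def hybrid_rank(query, docs, embed, alpha=0.5):
    """Blend vector and keyword scores; keep each component for debugging."""
    if not docs:
        return []
    keyword = min_max(bm25_scores(query, docs))
    q_vec = embed(query)
    vector = min_max([cosine(q_vec, embed(d)) for d in docs])
    rows = [
        {"chunk": d, "bm25": k, "vector": v,
         "combined": alpha * v + (1 - alpha) * k}
        for d, k, v in zip(docs, keyword, vector)
    ]
    return sorted(rows, key=lambda r: r["combined"], reverse=True)

# Toy stand-in for an embedding model: term counts over a tiny vocabulary.
VOCAB = ["job", "salary", "fee", "payment", "weather"]

def toy_embed(text):
    tokens = tokenize(text)
    return [float(tokens.count(word)) for word in VOCAB]

chunks = [
    "the job requires an upfront payment fee",
    "the weather is sunny today",
    "salary details for the job are listed below",
]
ranked = hybrid_rank("job payment fee", chunks, toy_embed)
for row in ranked:
    print(row)
```

Each row carries the per-signal scores next to the combined score, which is one way to make answer selection explainable.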
Use Cases:
- News/article summarization
- Company/product research
- Fake profile detection
- Study aid – analyze and summarize educational content from websites
Requirements:
- Python 3.9+
- Playwright browsers (playwright install)
- Ollama (for local LLMs)
Installation:
- Clone the repository:
git clone https://github.com/Harinee2501/LlamaSleuth.git
cd LlamaSleuth
- Install dependencies:
pip install -r requirements.txt
- Download AI models:
ollama pull llama3
ollama pull mxbai-embed-large
Usage:
streamlit run main.py
- Enter a URL in the input box
- Click "Scrape Site"
- Ask questions about the content (e.g., Is this job posting legitimate?)
Programmatic use:
from scrape import scrape_website
from parse import get_answer
content = scrape_website("https://news.com/article")
response = get_answer(content, "List 3 main claims")
print(response)  # Prints the answer along with its sources
NaviQA/
├── .gitignore
├── README.md
├── requirements.txt
├── main.py # Streamlit interface
├── scrape.py # Web scraping logic
│ ├── scrape_website()
│ └── clean_content()
├── parse.py # RAG processing
│ ├── analyze_content()
│ └── chunk_text()
└── chromedriver.exe # Browser automation
- Playwright + Bright Data proxies (zero CAPTCHAs, randomized delays)
- Hybrid BM25 + mxbai-embed-large (ChromaDB) + BERT reranking
- Local Llama3 8B (via Ollama) for private inference
- Streamlit app with retrieval diagnostics (explainable scoring)
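The randomized delays and auto-retries mentioned above could look something like the sketch below. It is an illustration under stated assumptions, not the logic in scrape.py: `fetch_with_retries` and its parameters are hypothetical names, and `fetch` is any callable that takes a URL and returns page content (for example a thin wrapper around a Playwright page load).

```python
import random
import time

def fetch_with_retries(fetch, url, max_attempts=3,
                       min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(url), waiting a random 2-5 s before each attempt.

    Retries on any exception; real code would likely catch narrower
    errors (timeouts, navigation failures) instead of Exception.
    """
    last_error = None
    for _ in range(max_attempts):
        # Random pause so requests do not arrive at a fixed rhythm.
        sleep(random.uniform(min_delay, max_delay))
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(
        f"giving up on {url} after {max_attempts} attempts"
    ) from last_error

# Dry run: a fetcher that fails twice, with sleeping replaced by a
# recorder so the example finishes instantly.
attempts = {"count": 0}

def flaky_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("blocked")
    return f"<html>{url}</html>"

recorded_delays = []
html = fetch_with_retries(flaky_fetch, "https://example.com",
                          sleep=recorded_delays.append)
print(html, recorded_delays)
```

Passing `sleep` as a parameter keeps the delay policy testable; in normal use the default `time.sleep` applies.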
Issue | Solution |
---|---|
CAPTCHAs | Increase delays in scrape.py |
Low relevance | Adjust chunk size in parse.py (800→500) |
Slow reranking | Use cross-encoder/ms-marco-TinyBERT |
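
To see what the chunk size change in the "Low relevance" row does, here is a minimal sketch of overlapping chunking. It assumes the sizes are measured in characters and uses a hypothetical `split_into_chunks` helper with an assumed 50-character overlap; the actual chunk_text() in parse.py may work differently.

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

sample = "x" * 2000  # placeholder for scraped page text
large = split_into_chunks(sample, chunk_size=800)
small = split_into_chunks(sample, chunk_size=500)
print(len(large), len(small))
```

Smaller chunks give the retriever more, narrower candidates to score, which can help when answers are buried inside long passages, at the cost of more embeddings to compute and store.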