Women often turn to a wide range of blogs, forums, and websites for answers to their health questions, which can expose them to inconsistent or unreliable information. A Retrieval-Augmented Generation (RAG) based app serves as a one-stop solution by delivering trusted, accurate, and easy-to-understand answers from verified sources, all in one place. It also provides a safe space for asking sensitive questions without fear of judgement.
To ensure a sustainable and efficient deployment, the system compares a Small Language Model (SLM) with a Mistral-7B model quantized using GPTQ (a post-training quantization method for generative pre-trained transformers), assessing their trade-offs in latency, energy consumption, and overall environmental impact. This comparison identifies which model offers the greener footprint, minimizing power usage and carbon emissions while maintaining high performance, and is therefore better suited to resource-constrained or edge environments. The system also compares bi-encoder and cross-encoder re-ranking methods in the RAG pipeline to determine which performs better with the chosen models.
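The latency, energy, and emissions trade-off described above can be illustrated with a minimal sketch. It assumes per-query energy is approximated as average power draw multiplied by latency, and that emissions scale with a grid carbon-intensity factor. The class name, model labels, power figures, latencies, and carbon intensity below are hypothetical placeholders, not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """One model's per-query measurements (all values here are placeholders)."""
    model: str
    latency_s: float    # wall-clock seconds per query
    avg_power_w: float  # average device power draw during inference, in watts

    def energy_wh(self) -> float:
        # Energy (Wh) = power (W) * time (s) / 3600 seconds per hour
        return self.avg_power_w * self.latency_s / 3600.0

    def co2_g(self, grid_g_per_kwh: float) -> float:
        # Emissions (g CO2) = energy (kWh) * grid carbon intensity (g CO2 per kWh)
        return self.energy_wh() / 1000.0 * grid_g_per_kwh


def greener(a: InferenceRun, b: InferenceRun, grid_g_per_kwh: float) -> InferenceRun:
    """Return whichever run emits less CO2 per query under the given grid factor."""
    return a if a.co2_g(grid_g_per_kwh) <= b.co2_g(grid_g_per_kwh) else b


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
GRID_G_PER_KWH = 400.0
slm = InferenceRun("slm", latency_s=2.0, avg_power_w=90.0)
gptq = InferenceRun("mistral-7b-gptq", latency_s=3.6, avg_power_w=150.0)

for run in (slm, gptq):
    print(f"{run.model}: {run.energy_wh():.3f} Wh, {run.co2_g(GRID_G_PER_KWH):.3f} g CO2")
print("lower per-query emissions:", greener(slm, gptq, GRID_G_PER_KWH).model)
```

In practice the power figure would come from a hardware counter or an energy-tracking tool rather than a constant, and answer quality would need to be weighed alongside these numbers before choosing a model.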
I. Performance Metrics of Microsoft Phi-2
II. Inference Energy Metrics of Microsoft Phi-2
III. Performance Metrics of Mistral-7B
IV. Inference Energy Metrics of Mistral-7B