Elarova — A smart, multimodal research assistant designed to help students by combining speech, text, and other input modes for efficient academic research and study support. Powered by state-of-the-art speech recognition, text-to-speech, and AI models, including meta-llama/llama-4-scout-17b-16e-instruct, with an easy-to-use Gradio web interface.


🎓 Elarova 2.0 – Multimodal Medical Virtual Learning Chatbot for Medicine Students (Version 2 of Elarova)

Elarova is an intelligent, voice-interactive multimodal medical learning chatbot for medicine students that helps them explore and understand visual academic content such as diagrams, charts, handwritten notes, or academic papers. Just speak your query, upload an image, and Elarova will answer both visually and audibly.

📸 Demo image


🧠 Model Used

  • Multimodal Model: meta-llama/llama-4-scout-17b-16e-instruct via Groq API
  • Voice Recognition: Whisper
  • TTS Engines: Google gTTS & ElevenLabs
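The Groq chat-completions API accepts multimodal messages that mix text and an image. As a rough sketch (not the repository's actual code), a request for this model might be assembled like this, with the uploaded image embedded as a base64 data URL:

```python
import base64

MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"

def build_vision_message(question: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Pack a text question and an image into one multimodal user message."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = build_vision_message("Explain this diagram.", b"\x89PNG...")
# With the groq Python client, this message would then be sent as:
# client.chat.completions.create(model=MODEL, messages=[msg])
```

The network call is left commented out so the snippet stays runnable without an API key; only the payload shape is shown.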

Tech Stack

  • Groq API – ultra-fast LLM inference
  • LLaMA-4 Vision Model – meta-llama/llama-4-scout-17b-16e-instruct
  • Whisper – speech recognition
  • gTTS & ElevenLabs – voice output
  • Gradio – web interface

🔍 Features

  • 🎙️ Voice Input: Speak your question naturally.
  • 🧠 Multimodal AI: Combines your voice query with an uploaded image to give smart, context-aware answers.
  • 🖼️ Image Understanding: Upload diagrams, charts, handwritten pages, or screenshots — Elarova understands them.
  • 💬 LLM-Powered Responses: Powered by meta-llama/llama-4-scout-17b-16e-instruct via Groq API.
  • 🔊 Dual TTS Engines: Replies are spoken aloud using both gTTS and ElevenLabs.
  • 🌐 Gradio Web Interface: Clean, easy-to-use interface accessible from your browser.
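Conceptually, one turn of the assistant chains the pieces above: transcribe the spoken question, pair it with the uploaded image for the vision model, then speak the reply. A minimal sketch of that flow (all function names here are illustrative stand-ins, not the repository's actual API):

```python
def answer_turn(audio_path, image_path, transcribe, ask_llm, speak):
    """One chat turn: speech -> text -> multimodal LLM -> text + speech.

    transcribe/ask_llm/speak are injected so the flow can be shown
    without Whisper, Groq, or a TTS engine installed.
    """
    question = transcribe(audio_path)       # e.g. Whisper STT
    reply = ask_llm(question, image_path)   # e.g. LLaMA-4 Scout via Groq
    audio_reply = speak(reply)              # e.g. gTTS or ElevenLabs
    return reply, audio_reply

# Wiring with stand-in functions:
text, audio = answer_turn(
    "question.wav", "diagram.png",
    transcribe=lambda p: "Explain this diagram.",
    ask_llm=lambda q, img: f"Answer to: {q}",
    speak=lambda t: b"<mp3 bytes>",
)
```

Injecting the three stages as parameters is just for the sketch; in the app they live in voice_of_the_user.py, brain_of_the_Elarova.py, and voice_of_the_doctor.py respectively.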

📁 Project Structure

Elarova/
├── gradio_app.py            # Main app with Gradio interface
├── brain_of_the_Elarova.py  # Core logic for image + query processing
├── .env
├── voice_of_the_doctor.py
└── voice_of_the_user.py


⚙️ Setup Instructions

1. Clone the Repo

git clone https://github.com/iamafridi/Elarova2.0.git
cd Elarova2.0

2. Set Environment Variables

Create a .env file in the project root with your API keys:

GROQ_API_KEY=your_groq_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

3. Run the App

python gradio_app.py

🎓 Example Use Case

Upload a diagram and ask:

🗣️ "Explain this process in simple terms."

📢 Elarova will generate a voice and text response explaining the diagram based on your question.

📜 License

MIT License

👤 Author

Afridi Akbar Ifty
GitHub: https://github.com/iamafridi
Portfolio: https://iamafrididev.netlify.app
LinkedIn: your-linkedin-profile
