🎓 Elarova 2.0 – Multimodal Medical Virtual Learning Chatbot (Medicine Student ) (This the version 2 of Elarova)
Elarova is an intelligent, voice-interactive Multimodal Medical Virtual Learning Chatbot (Medicine Student ) (This the version 2 of Elarova) that helps students explore and understand visual academic content like diagrams, charts, handwritten notes, or Virtual Learning papers. Just speak your query, upload an image, and Elarova will answer both visually and audibly.
- Multimodal Model: meta-llama/llama-4-scout-17b-16e-instruct via Groq API
- Voice Recognition: Whisper
- TTS Engines: Google gTTS & ElevenLabs
Groq API – Ultra-fast LLM API LLaMA-4 Vision Model – meta-llama/llama-4-scout-17b-16e-instruct Whisper – For speech recognition gTTS & ElevenLabs – For voice output Gradio – For building the web interface
- 🎙️ Voice Input: Speak your Virtual Learning question naturally.
- 🧠 Multimodal AI: Combines your voice query with an uploaded image to give smart, context-aware answers.
- 🖼️ Image Understanding: Upload diagrams, charts, handwritten pages, or screenshots — Elarova understands them.
- 💬 LLM-Powered Responses: Powered by
meta-llama/llama-4-scout-17b-16e-instruct
via Groq API. - 🔊 Dual TTS Engines: Replies are spoken aloud using both gTTS and ElevenLabs.
- 🌐 Gradio Web Interface: Clean, easy-to-use interface accessible from your browser.
Elarova/ ├── gradio_app.py # Main app with Gradio interface ├── brain_of_the_Elarova.py # Core logic for image + query processing ├── .env ├── voice_of_the_doctor.py └── voice_of_the_user.py
git clone https://github.com/iamafridi/Elarova2.0.git
cd Elarova2.0
GROQ_API_KEY=your_groq_api_key ELEVENLABS_API_KEY=your_elevenlabs_api_key
python gradio_app.py
Upload a diagram and ask:
🗣️ "Explain this process in simple terms."
📢 Elarova will generate a voice and text response explaining the diagram based on your question.
📜 License MIT License
Afridi Akbar Ifty GitHub: https://github.com/iamafridi Portfolio : https://iamafrididev.netlify.app LinkedIn: your-linkedin-profile