Skip to content

ElaMath is a smart, voice-enabled math assistant that helps students solve and understand math problems using both spoken questions and images. It’s powered by the powerful multimodal meta-llama/llama-4-scout-17b-16e-instruct model via Groq API, combined with Whisper for speech recognition and ElevenLabs/gTTS for natural voice responses.

Notifications You must be signed in to change notification settings

iamafridi/elaMath

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🎓 ElaMath – Visual & Vocal Math Assistant

ElaMath is an intelligent, voice-enabled multimodal math assistant designed to help students solve and understand math problems. Simply speak your math question and optionally upload an image—such as a diagram, handwritten notes, or textbook screenshot—and ElaMath will provide clear, step-by-step explanations both in text and spoken audio

📸 Demo

image


🧠 AI Models Used

  • Multimodal LLM: meta-llama/llama-4-scout-17b-16e-instruct via Groq API
  • Speech Recognition: Whisper (OpenAI)
  • Text-to-Speech: gTTS & ElevenLabs

🛠️ Tech Stack

  • Groq API – High-speed inference for LLaMA models
  • 🧠 Meta LLaMA-4 Vision Model – Multimodal math question answering
  • 🎙️ Whisper – Converts voice to text
  • 🔊 gTTS & ElevenLabs – Generate natural audio output
  • 🧪 Gradio – Fast web interface for interaction

🔍 Key Features

  • 🎙️ Voice Input: Speak your math question naturally
  • 🖼️ Image Analysis: Upload diagrams, equations, or handwritten problems
  • 🧠 Multimodal Reasoning: Combines voice + image to give intelligent, relevant answers
  • 💬 Text + Voice Output: See and hear the explanation clearly
  • 🌐 Web Interface: Lightweight, user-friendly frontend powered by Gradio

📁 Project Structure

ElaMath/ ├── gradio_app.py # Main Gradio interface ├── brain_of_the_elaMath.py # Handles multimodal LLM image + text analysis ├── voice_of_the_user.py # Speech-to-text logic using Whisper ├── voice_of_the_math_instructor.py # Text-to-speech logic using ElevenLabs/gTTS ├── .env # API keys stored securely here


⚙️ Setup Instructions

1. Clone the Repository

git clone https://github.com/iamafridi/elaMath.git cd elaMath

2. Add Environment Variables

Create a .env file and add the following:

GROQ_API_KEY=your_groq_api_key ELEVENLABS_API_KEY=your_elevenlabs_api_key

  1. Run the Application python gradio_app.py

🧪 Example Use Case

Upload an image of a geometry problem and ask:

🗣️ “What’s the area of this triangle?”

📢 ElaMath will analyze the image and your voice, then respond with a step-by-step explanation in both text and audio.


📜 License

MIT License


👤 Author

Afridi Akbar Ifty

GitHub: @iamafridi

Portfolio: iamafrididev.netlify.app

LinkedIn: linkedin.com/in/iamafridi

About

ElaMath is a smart, voice-enabled math assistant that helps students solve and understand math problems using both spoken questions and images. It’s powered by the powerful multimodal meta-llama/llama-4-scout-17b-16e-instruct model via Groq API, combined with Whisper for speech recognition and ElevenLabs/gTTS for natural voice responses.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages