An AI-powered medical consultation assistant that provides educational health information through voice, text, and image input. The application combines speech recognition, natural language processing, and computer vision to offer accessible medical guidance.
This AI assistant provides educational information only and is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare professionals for medical concerns.
- Voice Input: Record audio questions using your microphone
- Audio Upload: Upload audio files in various formats (MP3, WAV, etc.)
- Text Input: Type your medical questions directly
- Image Analysis: Upload medical images for AI analysis
- Audio Response: Listen to AI responses via text-to-speech
- Consultation History: Track previous consultations
- Debug Tools: Audio analysis and troubleshooting features
- Web Interface: User-friendly Gradio-based interface
- Python 3.8 or higher
- Microphone (for voice input)
- Internet connection (for some AI models)
1. Clone the repository:

   ```bash
   git clone https://github.com/SANJAIB2004/Audio_and_image_based_AI_Bot.git
   cd Audio_and_image_based_AI_Bot
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv ai_doctor_env
   source ai_doctor_env/bin/activate  # On Windows: ai_doctor_env\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up environment variables:

   ```bash
   # Create a .env file in the project root
   echo "GROQ_API_KEY=your_groq_api_key_here" > .env
   ```

5. Run the application:

   ```bash
   python app.py
   ```

6. Open your browser and navigate to http://127.0.0.1:7860
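For orientation, Gradio serves on 127.0.0.1:7860 by default. A minimal sketch of such an app, where the `answer_question` handler and its text-only signature are placeholders rather than the real app.py interface:

```python
# Minimal Gradio app bound to the default address used above.
# answer_question is a placeholder; the real app.py also wires in
# audio and image inputs plus the AI pipeline.
import gradio as gr

def answer_question(question: str) -> str:
    # Placeholder logic; the real handler calls the AI models.
    return f"Educational information about: {question}"

demo = gr.Interface(fn=answer_question, inputs="text", outputs="text")
demo.launch(server_name="127.0.0.1", server_port=7860)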
Create a `requirements.txt` file with the following dependencies (OpenAI Whisper is published on PyPI as `openai-whisper`):

```
gradio>=4.0.0
openai-whisper
gtts
python-dotenv
pillow
groq
transformers
torch
librosa
soundfile
numpy
```
Create a `.env` file in your project root with the following variables:

```
# Required for advanced AI responses (optional - fallback available)
GROQ_API_KEY=your_groq_api_key_here

# Optional configurations
WHISPER_MODEL=base   # Options: tiny, base, small, medium, large
DEBUG_MODE=true
```
- Visit the [Groq Console](https://console.groq.com)
- Sign up for a free account
- Generate an API key
- Add it to your `.env` file
Note: The application works without a GROQ API key using local models, but responses will be more limited.
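That fallback pattern can be implemented with python-dotenv and the groq SDK. A minimal sketch, where the `local_fallback` body is a hypothetical stand-in for the app's local-model path:

```python
# Sketch: load GROQ_API_KEY from .env and fall back to a local model
# when the key is missing or the API call fails.
import os
from dotenv import load_dotenv
from groq import Groq

load_dotenv()  # reads .env in the project root
api_key = os.getenv("GROQ_API_KEY")

def local_fallback(question: str) -> str:
    # Hypothetical stand-in for the app's limited local-model response.
    return "Local model response (limited)."

def ask(question: str) -> str:
    if not api_key:
        return local_fallback(question)
    try:
        client = Groq(api_key=api_key)
        resp = client.chat.completions.create(
            model="llama3-70b-8192",
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content
    except Exception:
        return local_fallback(question)
```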
- Simply type your medical question in the text field
- Example: "I have a headache for 3 days, what could be causing it?"
- Record: Click the microphone button and speak your question
- Upload: Upload an existing audio file (MP3, WAV, M4A, etc.)
- Upload medical images (rashes, wounds, etc.) for AI analysis
- Supports common image formats (JPG, PNG, etc.)
- Click "Analyze" to process your input
- View the text response in the "AI Doctor Response" section
- Listen to the audio response using the audio player
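Under the hood, the analyze step can chain the models described later in this README. A minimal sketch with illustrative file names; the prompt assembly and the elided LLaMA call are placeholders, not the actual app.py logic:

```python
# Sketch of the analyze pipeline: speech -> text (Whisper),
# image -> caption (BLIP), text -> speech (gTTS).
# File names are illustrative; error handling is omitted.
import whisper
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from gtts import gTTS

# 1. Transcribe the recorded question with the local Whisper model.
stt = whisper.load_model("base")
question = stt.transcribe("question.wav")["text"]

# 2. Caption the uploaded medical image with BLIP.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
inputs = processor(Image.open("photo.jpg").convert("RGB"), return_tensors="pt")
caption = processor.decode(blip.generate(**inputs)[0], skip_special_tokens=True)

# 3. The combined question + caption would be sent to LLaMA 3 via GROQ here.
answer = f"You asked: {question}. The image appears to show: {caption}."

# 4. Synthesize the spoken response with gTTS.
gTTS(answer, lang="en").save("response.mp3")
```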
```
ai-medical-consultation/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create this)
├── README.md           # This file
├── .gitignore          # Git ignore file
└── temp/               # Temporary audio files (auto-created)
```
- Whisper (OpenAI): Speech-to-text transcription
  - Local model: `whisper-base`
  - Cloud option: `whisper-large-v3` (via GROQ)
- BLIP: Image captioning and analysis
  - Model: `Salesforce/blip-image-captioning-base`
- LLaMA 3: Natural language processing
  - Model: `llama3-70b-8192` (via GROQ)
- gTTS: Text-to-speech synthesis
- Supports multiple audio formats
- Automatic audio preprocessing (noise reduction, normalization)
- Multiple transcription strategies for better accuracy
- Detailed audio analysis for debugging
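A sketch of preprocessing along those lines with librosa and soundfile; the 16 kHz target rate and trim threshold below are assumptions, not the app's actual settings:

```python
# Sketch: load, resample, normalize, and trim silence from an audio
# file before transcription. Thresholds here are illustrative.
import librosa
import soundfile as sf

def preprocess_audio(in_path: str, out_path: str) -> str:
    y, sr = librosa.load(in_path, sr=16000)      # resample to 16 kHz
    y = librosa.util.normalize(y)                # peak-normalize amplitude
    y, _ = librosa.effects.trim(y, top_db=25)    # trim leading/trailing silence
    sf.write(out_path, y, sr)
    return out_path

preprocess_audio("question.wav", "question_clean.wav")
```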
1. Audio not being transcribed
- Check microphone permissions
- Use the debug feature to analyze audio quality
- Try uploading a clear audio file instead
2. "No module named" errors
- Ensure all dependencies are installed:
pip install -r requirements.txt
- Check if you're using the correct Python environment
3. GROQ API errors
- Verify your API key in the `.env` file
- Check if you have sufficient API credits
- The application will fall back to local models if GROQ fails
4. Out of memory errors
- Reduce the Whisper model size in `.env` (use `tiny` or `small`)
- Close other applications to free up RAM
Use the built-in debug tools:
- Click "Debug Audio" to analyze audio file quality
- Check the console logs for detailed error information
- Use "View History" to review past consultations
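As an illustration, a debug report of this kind can be computed with librosa; the metrics and the silence threshold below are assumptions about what the built-in tool checks:

```python
# Sketch: report basic quality metrics for an audio file.
import librosa
import numpy as np

def analyze_audio(path: str) -> dict:
    y, sr = librosa.load(path, sr=None)          # keep native sample rate
    rms = float(np.sqrt(np.mean(y ** 2)))        # overall loudness
    return {
        "duration_sec": round(librosa.get_duration(y=y, sr=sr), 2),
        "sample_rate": sr,
        "rms_level": round(rms, 4),
        "likely_silent": rms < 0.01,             # illustrative threshold
    }

print(analyze_audio("question.wav"))
```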
For local deployment, run:

```bash
python app.py
```
Using Docker (create a Dockerfile):

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```
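With that Dockerfile in place, you can build and run with, for example, `docker build -t ai-doctor .` and `docker run -p 7860:7860 --env-file .env ai-doctor`; the image name `ai-doctor` is arbitrary, and `--env-file .env` passes the GROQ key into the container.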
Using Hugging Face Spaces (Gradio SDK):
- Create an account on Hugging Face Spaces
- Upload your code
- Add environment variables in Space settings
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add logging for debugging
- Test with various audio inputs
- Update documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
- Medical Disclaimer: This tool is for educational purposes only
- Privacy: Audio and image data are processed locally when possible
- Data Retention: Conversation history is stored locally and cleared on session end
- Compliance: Ensure compliance with local healthcare regulations (HIPAA, GDPR, etc.)
- OpenAI Whisper for speech recognition
- Gradio for the web interface
- GROQ for fast AI inference
- Hugging Face for pre-trained models
- Multi-language support
- Integration with more medical databases
- Advanced image analysis capabilities
- Mobile app version
- Appointment scheduling features
- Integration with EHR systems
Remember: This AI assistant is a tool to help you learn about health topics, but it should never replace professional medical care. When in doubt, always consult with a qualified healthcare provider.