🏥 AI Medical Consultation Assistant

An intelligent medical consultation assistant powered by AI that provides educational health information through voice, text, and image analysis. This application combines speech recognition, natural language processing, and computer vision to offer accessible medical guidance.

⚠️ Important Disclaimer

This AI assistant provides educational information only and is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare professionals for medical concerns.

✨ Features

🎤 Voice Input: Record audio questions using your microphone
📁 Audio Upload: Upload audio files in various formats (MP3, WAV, etc.)
💬 Text Input: Type your medical questions directly
📸 Image Analysis: Upload medical images for AI analysis
🔊 Audio Response: Listen to AI responses via text-to-speech
📋 Consultation History: Track previous consultations
🔧 Debug Tools: Audio analysis and troubleshooting features
🌐 Web Interface: User-friendly Gradio-based interface

🚀 Quick Start

Prerequisites

Python 3.8 or higher
Microphone (for voice input)
Internet connection (for some AI models)

Installation

Clone the repository

git clone https://github.com/SANJAIB2004/Audio_and_image_based_AI_Bot.git
cd Audio_and_image_based_AI_Bot

Create a virtual environment

python -m venv ai_doctor_env
source ai_doctor_env/bin/activate  # On Windows: ai_doctor_env\Scripts\activate

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

# Create a .env file in the project root
echo "GROQ_API_KEY=your_groq_api_key_here" > .env

Run the application
```
python app.py
```
Open your browser and navigate to http://127.0.0.1:7860

📦 Dependencies

Create a requirements.txt file with the following dependencies:

gradio>=4.0.0
whisper
gtts
python-dotenv
pillow
groq
transformers
torch
librosa
soundfile
numpy

🔧 Configuration

Environment Variables

Create a .env file in your project root with the following variables:

# Required for advanced AI responses (optional - fallback available)
GROQ_API_KEY=your_groq_api_key_here

# Optional configurations
WHISPER_MODEL=base  # Options: tiny, base, small, medium, large
DEBUG_MODE=true

Getting a GROQ API Key

Visit Groq Console
Sign up for a free account
Generate an API key
Add it to your .env file

Note: The application works without a GROQ API key using local models, but responses will be more limited.

📋 Usage Guide

1. Text Input

Simply type your medical question in the text field
Example: "I have a headache for 3 days, what could be causing it?"

2. Voice Input

Record: Click the microphone button and speak your question
Upload: Upload an existing audio file (MP3, WAV, M4A, etc.)

3. Image Analysis

Upload medical images (rashes, wounds, etc.) for AI analysis
Supports common image formats (JPG, PNG, etc.)

4. Getting Responses

Click "🔬 Analyze" to process your input
View the text response in the "AI Doctor Response" section
Listen to the audio response using the audio player

🏗️ Project Structure

ai-medical-consultation/
├── app.py                 # Main application file
├── requirements.txt       # Python dependencies
├── .env                  # Environment variables (create this)
├── README.md             # This file
├── .gitignore            # Git ignore file
└── temp/                 # Temporary audio files (auto-created)

🔍 Technical Details

AI Models Used

Whisper (OpenAI): Speech-to-text transcription
- Local model: whisper-base
- Cloud option: whisper-large-v3 (via GROQ)
BLIP: Image captioning and analysis
- Model: Salesforce/blip-image-captioning-base
LLaMA 3: Natural language processing
- Model: llama3-70b-8192 (via GROQ)
gTTS: Text-to-speech synthesis

Audio Processing

Supports multiple audio formats
Automatic audio preprocessing (noise reduction, normalization)
Multiple transcription strategies for better accuracy
Detailed audio analysis for debugging

🐛 Troubleshooting

Common Issues

1. Audio not being transcribed

Check microphone permissions
Use the debug feature to analyze audio quality
Try uploading a clear audio file instead

2. "No module named" errors

Ensure all dependencies are installed: pip install -r requirements.txt
Check if you're using the correct Python environment

3. GROQ API errors

Verify your API key in the .env file
Check if you have sufficient API credits
Application will fallback to local models if GROQ fails

4. Out of memory errors

Reduce the Whisper model size in .env (use tiny or small)
Close other applications to free up RAM

Debug Features

Use the built-in debug tools:

Click "🔍 Debug Audio" to analyze audio file quality
Check the console logs for detailed error information
Use "📋 View History" to review past consultations

🚀 Deployment

Local Development

python app.py

Production Deployment

Using Docker (create Dockerfile):

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 7860

CMD ["python", "app.py"]

Using Gradio Spaces:

Create account on Hugging Face Spaces
Upload your code
Add environment variables in Space settings

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow PEP 8 style guidelines
Add logging for debugging
Test with various audio inputs
Update documentation for new features

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚖️ Legal & Ethical Considerations

Medical Disclaimer: This tool is for educational purposes only
Privacy: Audio and image data are processed locally when possible
Data Retention: Conversation history is stored locally and cleared on session end
Compliance: Ensure compliance with local healthcare regulations (HIPAA, GDPR, etc.)

🙏 Acknowledgments

OpenAI Whisper for speech recognition
Gradio for the web interface
GROQ for fast AI inference
Hugging Face for pre-trained models

🔮 Future Enhancements

Multi-language support
Integration with more medical databases
Advanced image analysis capabilities
Mobile app version
Appointment scheduling features
Integration with EHR systems

Remember: This AI assistant is a tool to help you learn about health topics, but it should never replace professional medical care. When in doubt, always consult with a qualified healthcare provider.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Dockerfile		Dockerfile
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

SANJAIB2004/Audio_and_image_based_AI_Bot

Folders and files

Latest commit

History

Repository files navigation