๐Ÿฅ AI Medical Consultation Assistant

An intelligent medical consultation assistant powered by AI that provides educational health information through voice, text, and image analysis. This application combines speech recognition, natural language processing, and computer vision to offer accessible medical guidance.


โš ๏ธ Important Disclaimer

This AI assistant provides educational information only and is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare professionals for medical concerns.

✨ Features

  • 🎤 Voice Input: Record audio questions using your microphone
  • 📁 Audio Upload: Upload audio files in various formats (MP3, WAV, etc.)
  • 💬 Text Input: Type your medical questions directly
  • 📸 Image Analysis: Upload medical images for AI analysis
  • 🔊 Audio Response: Listen to AI responses via text-to-speech
  • 📋 Consultation History: Track previous consultations
  • 🔧 Debug Tools: Audio analysis and troubleshooting features
  • 🌐 Web Interface: User-friendly Gradio-based interface

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Microphone (for voice input)
  • Internet connection (for downloading models and using the optional GROQ API)

Installation

  1. Clone the repository

    git clone https://github.com/SANJAIB2004/Audio_and_image_based_AI_Bot.git
    cd Audio_and_image_based_AI_Bot
  2. Create a virtual environment

    python -m venv ai_doctor_env
    source ai_doctor_env/bin/activate  # On Windows: ai_doctor_env\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables

    # Create a .env file in the project root
    echo "GROQ_API_KEY=your_groq_api_key_here" > .env
  5. Run the application

    python app.py
  6. Open your browser and navigate to http://127.0.0.1:7860

📦 Dependencies

The requirements.txt file should contain the following dependencies:

gradio>=4.0.0
openai-whisper  # OpenAI's Whisper; the bare "whisper" package on PyPI is unrelated
gtts
python-dotenv
pillow
groq
transformers
torch
librosa
soundfile
numpy

🔧 Configuration

Environment Variables

Create a .env file in your project root with the following variables:

# Required for advanced AI responses (optional - fallback available)
GROQ_API_KEY=your_groq_api_key_here

# Optional configurations
WHISPER_MODEL=base  # Options: tiny, base, small, medium, large
DEBUG_MODE=true

Getting a GROQ API Key

  1. Visit the Groq Console (https://console.groq.com)
  2. Sign up for a free account
  3. Generate an API key
  4. Add it to your .env file

Note: The application works without a GROQ API key using local models, but responses will be more limited.
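
For reference, here is a minimal sketch of how these settings could be loaded with python-dotenv. The conditional Groq client and the fallback behavior are assumptions about how app.py is organized, not a guaranteed match:

# Illustrative configuration loading; names mirror the .env keys above.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

GROQ_API_KEY = os.getenv("GROQ_API_KEY")                      # None if unset
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "base")
DEBUG_MODE = os.getenv("DEBUG_MODE", "false").lower() == "true"

groq_client = None
if GROQ_API_KEY:
    from groq import Groq
    groq_client = Groq(api_key=GROQ_API_KEY)  # enables cloud models
# With groq_client left as None, the app uses local models instead.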

📋 Usage Guide

1. Text Input

  • Simply type your medical question in the text field
  • Example: "I have a headache for 3 days, what could be causing it?"

2. Voice Input

  • Record: Click the microphone button and speak your question
  • Upload: Upload an existing audio file (MP3, WAV, M4A, etc.)

3. Image Analysis

  • Upload medical images (rashes, wounds, etc.) for AI analysis
  • Supports common image formats (JPG, PNG, etc.)

4. Getting Responses

  • Click "🔬 Analyze" to process your input
  • View the text response in the "AI Doctor Response" section
  • Listen to the audio response using the audio player
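
The flow above maps naturally onto a single Gradio handler. The sketch below shows one plausible wiring; the four helper functions are hypothetical stand-ins (stubbed here so the example runs), not the actual names in app.py:

# Minimal sketch of how the Analyze flow could be wired in Gradio.
import gradio as gr

def transcribe(audio_path):        # stub: the real app uses Whisper
    return "transcribed question"

def analyze_image(image):          # stub: the real app uses BLIP
    return "image description"

def generate_response(question):   # stub: the real app uses LLaMA 3 or a local model
    return f"Educational information about: {question}"

def synthesize_speech(text):       # stub: the real app uses gTTS; returns an audio path
    return None

def consult(audio_path, text, image):
    # Prefer typed text; otherwise transcribe the recording/upload
    question = text or (transcribe(audio_path) if audio_path else "")
    if image is not None:
        question += "\n[Image findings] " + analyze_image(image)
    answer = generate_response(question)
    return answer, synthesize_speech(answer)

demo = gr.Interface(
    fn=consult,
    inputs=[
        gr.Audio(sources=["microphone", "upload"], type="filepath", label="Voice input"),
        gr.Textbox(label="Your question"),
        gr.Image(type="pil", label="Medical image"),
    ],
    outputs=[
        gr.Textbox(label="AI Doctor Response"),
        gr.Audio(label="Spoken response"),
    ],
)

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default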

๐Ÿ—๏ธ Project Structure

Audio_and_image_based_AI_Bot/
├── app.py                 # Main application file
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (create this)
├── README.md              # This file
├── .gitignore             # Git ignore file
└── temp/                  # Temporary audio files (auto-created)

๐Ÿ” Technical Details

AI Models Used

  1. Whisper (OpenAI): Speech-to-text transcription

    • Local model: whisper-base
    • Cloud option: whisper-large-v3 (via GROQ)
  2. BLIP: Image captioning and analysis

    • Model: Salesforce/blip-image-captioning-base
  3. LLaMA 3: Response generation

    • Model: llama3-70b-8192 (via GROQ)
  4. gTTS: Text-to-speech synthesis
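
A condensed sketch of how these four models could be chained, assuming the model identifiers listed above; prompts, error handling, and fallbacks in app.py will differ:

import whisper
from gtts import gTTS
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

stt = whisper.load_model("base")  # or the size chosen via WHISPER_MODEL
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def transcribe(path: str) -> str:
    """Speech-to-text with the local Whisper model."""
    return stt.transcribe(path)["text"].strip()

def caption(image: Image.Image) -> str:
    """Describe a medical image with BLIP."""
    inputs = blip_processor(image, return_tensors="pt")
    out = blip_model.generate(**inputs, max_new_tokens=50)
    return blip_processor.decode(out[0], skip_special_tokens=True)

def ask_llm(groq_client, question: str) -> str:
    """Generate the consultation answer with LLaMA 3 via GROQ."""
    chat = groq_client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": question}],
    )
    return chat.choices[0].message.content

def speak(text: str, out_path: str = "temp/response.mp3") -> str:
    """Synthesize the answer to an MP3 with gTTS."""
    gTTS(text).save(out_path)
    return out_path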

Audio Processing

  • Supports multiple audio formats
  • Automatic audio preprocessing (noise reduction, normalization)
  • Multiple transcription strategies for better accuracy
  • Detailed audio analysis for debugging
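
As a rough illustration of the preprocessing steps listed above (the exact chain in app.py may differ), resampling to the 16 kHz mono format Whisper expects, trimming silence, and peak-normalizing might look like this:

import librosa
import numpy as np
import soundfile as sf

def preprocess_audio(in_path: str, out_path: str = "temp/clean.wav") -> str:
    # Resample to 16 kHz mono, the format Whisper expects
    audio, sr = librosa.load(in_path, sr=16000, mono=True)
    # Trim leading/trailing silence
    audio, _ = librosa.effects.trim(audio, top_db=30)
    # Peak-normalize to boost quiet recordings without clipping
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak * 0.95
    sf.write(out_path, audio, 16000)
    return out_path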

๐Ÿ› Troubleshooting

Common Issues

1. Audio not being transcribed

  • Check microphone permissions
  • Use the debug feature to analyze audio quality
  • Try uploading a clear audio file instead

2. "No module named" errors

  • Ensure all dependencies are installed: pip install -r requirements.txt
  • Check if you're using the correct Python environment

3. GROQ API errors

  • Verify your API key in the .env file
  • Check if you have sufficient API credits
  • The application will fall back to local models if GROQ fails

4. Out of memory errors

  • Reduce the Whisper model size in .env (use tiny or small)
  • Close other applications to free up RAM

Debug Features

Use the built-in debug tools:

  • Click "🔍 Debug Audio" to analyze audio file quality
  • Check the console logs for detailed error information
  • Use "📋 View History" to review past consultations
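
The debug output might resemble the following sketch, which computes a few basic quality metrics; the actual fields reported by app.py may differ:

import librosa
import numpy as np

def analyze_audio(path: str) -> dict:
    """Return basic quality metrics for a recorded or uploaded file."""
    audio, sr = librosa.load(path, sr=None)  # keep the native sample rate
    rms = float(np.sqrt(np.mean(audio ** 2)))
    return {
        "duration_s": round(len(audio) / sr, 2),
        "sample_rate": sr,
        "peak": float(np.max(np.abs(audio))),
        "rms": rms,
        "likely_silent": rms < 1e-3,  # heuristic threshold for empty recordings
    }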

🚀 Deployment

Local Development

python app.py

Production Deployment

Using Docker (create Dockerfile):

FROM python:3.9-slim

# ffmpeg is required by Whisper (and librosa) for audio decoding
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 7860

CMD ["python", "app.py"]
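
Build and run (the image name is illustrative; GRADIO_SERVER_NAME makes Gradio bind to all interfaces, since it defaults to 127.0.0.1 and would otherwise be unreachable from outside the container):

docker build -t ai-doctor .
docker run -p 7860:7860 --env-file .env -e GRADIO_SERVER_NAME=0.0.0.0 ai-doctor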

Using Hugging Face Spaces:

  1. Create an account on Hugging Face Spaces
  2. Upload your code
  3. Add environment variables (e.g. GROQ_API_KEY) in the Space settings

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add logging for debugging
  • Test with various audio inputs
  • Update documentation for new features

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

โš–๏ธ Legal & Ethical Considerations

  • Medical Disclaimer: This tool is for educational purposes only
  • Privacy: Audio and image data are processed locally when possible
  • Data Retention: Conversation history is stored locally and cleared on session end
  • Compliance: Ensure compliance with local healthcare regulations (HIPAA, GDPR, etc.)

๐Ÿ™ Acknowledgments

🔮 Future Enhancements

  • Multi-language support
  • Integration with more medical databases
  • Advanced image analysis capabilities
  • Mobile app version
  • Appointment scheduling features
  • Integration with EHR systems

Remember: This AI assistant is a tool to help you learn about health topics, but it should never replace professional medical care. When in doubt, always consult with a qualified healthcare provider.
