AI-Powered Lung and Heart Disease Detection
- Project Overview
- Quick Start
- About the Project
- Technical Approach
- Performance Metrics
- Usage Examples
- Dataset Reference
- CI/CD & Testing
- Challenges & Limitations
- Learning Outcomes
VitalSenseAI is an advanced AI system designed to detect lung diseases from respiratory sounds, leveraging the power of deep learning. This repository contains the core machine learning models and codebase for the project, which aims to enable early and accurate detection of critical lung conditions using recordings from electronic stethoscopes.
The models analyze audio recordings of lung sounds, captured via electronic stethoscopes, and predict the presence of major lung diseases. The goal is to provide a practical tool that assists in the early detection of:
- Bronchiectasis
- Chronic Obstructive Pulmonary Disease (COPD)
- Pneumonia
- Upper Respiratory Tract Infection (URTI)
- Dataset: We used the ICBHI Respiratory Sound Database, which contains thousands of annotated lung sound recordings.
- Challenge: The dataset is highly imbalanced, with some diseases underrepresented. We addressed this with minority-class upsampling using `RandomOverSampler` from `imblearn`.
- Audio Preprocessing:
- All audio is resampled to 16 kHz and padded/truncated to a fixed length for consistency.
- We experimented with multiple feature extraction methods to best represent lung sounds for AI analysis.
To prepare the audio data for AI analysis, we experimented with several feature extraction techniques to best capture the characteristics of lung sounds:
- Raw Waveform (Time Series)
- Spectrograms
- MFCC (Mel-Frequency Cepstral Coefficients)
- Log-Mel Spectrograms
Figure: Visualization of a lung sound sample in different audio feature domains—raw waveform, spectrogram, MFCC, and Log-Mel spectrogram.
Outcome: Among these, Log-Mel spectrograms consistently delivered the highest accuracy (over 90%) on both training and testing datasets, making them the preferred input representation for our models.
Convolutional Neural Networks (CNNs)
- Designed and trained custom CNN architectures to classify lung sounds.
- Multiple models were trained with different input encodings (MFCC, Log-Mel, etc.).
- The best-performing model uses Log-Mel spectrograms as input.
Figure: Architecture of the custom Convolutional Neural Network (CNN) used for classifying lung diseases from Log-Mel spectrogram representations of respiratory sounds.
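A toy version of such a CNN is sketched below. The actual architecture is the one shown in the figure; the layer sizes and the number of output classes here are placeholders for illustration only.

```python
# Illustrative CNN for Log-Mel spectrogram classification (not the real architecture).
import torch
import torch.nn as nn

class LungSoundCNN(nn.Module):
    def __init__(self, n_classes=5):  # class count is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # pool away the (freq, time) dimensions
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):  # x: (batch, 1, n_mels, time_frames)
        return self.classifier(self.features(x))

model = LungSoundCNN()
logits = model(torch.randn(2, 1, 64, 157))
print(logits.shape)  # torch.Size([2, 5])
```

The adaptive pooling layer makes the network tolerant of spectrograms with different time lengths, which pairs well with the fixed-length padding done in preprocessing.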
Gradio Interface
- An interactive web interface is provided for easy testing and demonstration.
- Users can upload or record audio and receive instant predictions.
HuggingFace Spaces
- The model is deployed on HuggingFace Spaces for public access.
- Try it out here!
- Imbalanced Dataset: The original dataset had significant class imbalance. While RandomOverSampler improved results, a more balanced dataset from the start would likely yield even better performance.
- Hardware Accessibility: The current system requires electronic stethoscopes, which are expensive and not widely available, especially in low-resource settings. We aim to adapt our models to more affordable and accessible recording devices in the future, making lung health screening possible at home.
This project has been a tremendous learning journey, including:
- Deep Learning for Audio: Gained hands-on experience in applying CNNs to audio classification problems.
- Feature Engineering: Explored and compared various audio feature extraction techniques.
- Model Deployment: Learned to deploy models using Gradio and HuggingFace Spaces, making AI accessible to non-technical users.
- Research & Problem Solving: Tackled real-world challenges like data imbalance, hardware limitations, and the nuances of medical data.
- End-to-End AI Product Development: From data preprocessing to model training, evaluation, and deployment.
Built With ❤️ by Jivesh Kalra