A web application built with Flask and BioBERT that answers questions about cancer treatments, side effects, and medical guidance by combining an NLP processing pipeline with AI-powered extractive question answering.
- AI-Powered Question Answering: BioBERT model pre-trained on biomedical literature (PubMed) and fine-tuned for question answering (SQuAD v2) to answer cancer-related questions
- Interactive Chat Interface: Modern web-based chat with real-time question answering
- Advanced NLP Pipeline: Automatic spaCy model downloading, multilingual support (English/French), and intelligent text processing
- Semantic Context Retrieval: Sentence transformer-based similarity matching for relevant context extraction
- Comprehensive Knowledge Base: Curated cancer Q&A dataset with medical information
- Modern Web Interface: Bootstrap-powered responsive design with Chart.js visualizations
- Medical Statistics Dashboard: Analytics and insights about cancer treatments and data
- Robust Error Handling: Comprehensive test suite and error management
- Cross-Platform Compatibility: Universal path handling for Windows, macOS, and Linux
- Python 3.8 or higher
- pip package manager
- 4GB+ RAM (for BioBERT model)
- Clone the repository:
git clone https://github.com/SecurDrgorP/CancerCare-AI.git
cd CancerCare-AI
- Install dependencies:
pip install -r requirements.txt
- Download and save the BioBERT model locally (REQUIRED: run this before starting the app):
python scripts/saveModel.py
This downloads the BioBERT model (~1.3GB) and saves it locally in the model/ directory.
- spaCy models are downloaded automatically on first run (no manual installation needed).
- Run the Flask application:
python app.py
or
flask run
- Open your browser and navigate to http://localhost:5000
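Before launching, a quick preflight check can catch the two most common setup problems: an unsupported Python version and a missing local model. The sketch below is hypothetical (it is not one of the repository's scripts) and assumes the model directory name shown in the project structure, model/biobert_v1.1_pubmed_squad_v2_local. It uses pathlib so the same path logic works on Windows, macOS, and Linux.

```python
import sys
from pathlib import Path

# Assumed location of the saved model, taken from the project structure listing.
MODEL_SUBDIR = Path("model") / "biobert_v1.1_pubmed_squad_v2_local"
MIN_PYTHON = (3, 8)


def preflight(project_root, python_version=None):
    """Return a list of human-readable setup problems (empty if none found).

    Hypothetical helper for illustration; not part of CancerCare-AI.
    """
    problems = []
    version = tuple(python_version or sys.version_info[:2])
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {version[0]}.{version[1]}"
        )
    # pathlib joins with the correct separator on every platform.
    model_dir = Path(project_root) / MODEL_SUBDIR
    if not model_dir.is_dir():
        problems.append(f"Model not found at {model_dir}; run: python scripts/saveModel.py")
    return problems
```

Run it from the project root with preflight(Path.cwd()) and print any problems it returns; an empty list means both checks passed.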
CancerCare-AI/
├── app.py # Main Flask application with routes
├── biobert_qa.py # BioBERT question answering implementation
├── nlp_pipeline.py # NLP processing with automatic spaCy downloading
├── context_provider.py # Semantic context retrieval using sentence transformers
├── data_handler.py # Data management and processing
├── requirements.txt # Python dependencies
├── scripts/ # Setup and utility scripts
│   └── saveModel.py # BioBERT model download and setup script
├── model/ # BioBERT model storage
│   └── biobert_v1.1_pubmed_squad_v2_local/ # Local BioBERT model
├── data/ # Medical datasets
│   └── cancer_qa_dataset.json # Cancer Q&A knowledge base
├── templates/ # HTML templates
│   ├── base.html # Base template with Bootstrap
│   ├── index.html # Landing page
│   ├── chat.html # Chat interface
│   ├── statistics.html # Analytics dashboard
│   └── error pages (404.html, 500.html)
├── static/ # Static web assets
│   ├── css/ # Custom stylesheets
│   └── js/ # JavaScript files
├── tests/ # Test suite
│   ├── test_app.py # Flask app tests
│   ├── test_biobert_qa.py # BioBERT tests
│   ├── test_nlp_pipeline.py # NLP pipeline tests
│   └── test_context_provider.py # Context retrieval tests
└── README.md # Project documentation
- Landing Page (/): Overview with example queries and statistics
- Chat Interface (/chat): Full-page chat with AI-powered responses
- Statistics Dashboard (/statistics): Analytics and treatment insights
- POST /api/chat: Send a question and receive an AI-generated answer
- GET /api/statistics: Retrieve treatment and cancer statistics
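A client can call the chat endpoint using only the standard library. The JSON request body below ({"question": ...}) is an assumption made for illustration: check app.py for the field name the route actually reads and for the shape of its response.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default Flask development server address


def build_chat_request(question, base_url=BASE_URL):
    """Build a POST /api/chat request; the "question" field name is assumed."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url=BASE_URL, timeout=30):
    """Send the question to a running server and return the decoded JSON reply."""
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, ask("Side effects of chemotherapy?") returns whatever JSON app.py sends back. Building the request separately from sending it keeps the payload easy to inspect without a live server.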
- "What are treatment options for breast cancer stage 2?"
- "Side effects of chemotherapy?"
- "Diet recommendations during radiation?"
- "Recovery time after surgery?"
- "What is immunotherapy for lung cancer?"
- "How does radiation therapy work?"
- "Symptoms of ovarian cancer?"
- "Cost of cancer treatments?"
Flask Application (app.py):
- RESTful API with multiple routes
- Template rendering with Jinja2
- Integration with AI models and data handlers
- Error handling and logging
BioBERT QA System (biobert_qa.py):
- Loads the pre-trained BioBERT model from local storage
- Performs extractive question answering
- Optimized for biomedical text understanding
- GPU/CPU compatibility with automatic device detection
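Extractive question answering means the model does not generate new text: it scores every token of the retrieved context as a possible answer start and end, and the answer is the highest-scoring valid span. The sketch below shows that selection step on plain lists of logits, including the SQuAD v2 convention (the model directory name suggests this model was fine-tuned on SQuAD v2) of using position 0 as a "no answer" score. It illustrates the technique only; it is not the code in biobert_qa.py, which may leave this step to the Hugging Face Transformers library.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices of the best answer span, or None.

    Index 0 is treated as the [CLS] "no answer" position, as in SQuAD v2-style
    models: if its combined score beats every real span, no answer is returned.
    """
    n = len(start_logits)
    null_score = start_logits[0] + end_logits[0]
    best, best_score = None, float("-inf")
    for i in range(1, n):
        # Only consider spans that end at or after the start and are not too long.
        for j in range(i, min(i + max_answer_len, n)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    if best is None or null_score > best_score:
        return None
    return best
```

For start logits [0.1, 2.0, 0.5, 0.3] and end logits [0.1, 0.2, 1.5, 3.0], the best span is (1, 3) with score 5.0, well above the null score of 0.2. Real implementations also map token indices back to character offsets in the context.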
NLP Pipeline (nlp_pipeline.py):
- Automatic spaCy model downloading (en_core_web_sm, fr_core_news_sm)
- Multilingual text processing (English/French)
- Language detection and normalization
- Medical entity extraction and text cleaning
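Automatic model downloading usually follows a "load, download on failure, load again" pattern. The sketch below passes the loader and downloader in as arguments so the logic can be read and tested without spaCy installed. The language-to-model mapping uses the two models named above; the helper names are hypothetical and not taken from nlp_pipeline.py. With spaCy installed, the real calls would be spacy.load and spacy.cli.download, and spacy.load raises OSError when a model package is missing.

```python
# Language code (e.g. from langdetect) -> spaCy model package named in this README.
MODELS = {"en": "en_core_web_sm", "fr": "fr_core_news_sm"}


def model_for_language(lang, default="en"):
    """Pick a spaCy model name, falling back to English for unsupported languages."""
    return MODELS.get(lang, MODELS[default])


def load_or_download(name, load, download):
    """Load a model; if it is missing (OSError), download it once and retry.

    In practice: load_or_download(name, spacy.load, spacy.cli.download)
    """
    try:
        return load(name)
    except OSError:
        download(name)
        return load(name)
```

Falling back to English keeps the pipeline usable when language detection returns a language with no installed model.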
Context Provider (context_provider.py):
- Sentence transformer-based semantic search
- Finds the most relevant context in the knowledge base
- Uses the all-MiniLM-L6-v2 model for embeddings
- Efficient similarity matching and ranking
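Semantic retrieval ranks knowledge-base entries by the cosine similarity between the question's embedding and each entry's embedding, then keeps the top matches. The sketch below implements only that ranking step, on small hand-made vectors. In the application the vectors would come from the all-MiniLM-L6-v2 sentence transformer (for example via SentenceTransformer("all-MiniLM-L6-v2").encode(texts)), which produces 384-dimensional embeddings. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_contexts(query_vec, entries, k=3):
    """Rank (text, vector) entries by similarity to the query; return the best k.

    Returns a list of (text, score) pairs sorted from most to least similar.
    """
    scored = [(text, cosine(query_vec, vec)) for text, vec in entries]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan is fine for a knowledge base of this size; much larger corpora typically precompute normalized embeddings and use a vector index instead.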
Data Handler (data_handler.py):
- Manages medical datasets and the knowledge base
- Provides structured access to cancer information
- Statistics generation and data aggregation
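Statistics generation over the knowledge base can be as simple as counting entries per category. The record layout below (a JSON list of objects with question, answer, and category fields) and the function names are assumptions for illustration; the actual schema of data/cancer_qa_dataset.json may differ.

```python
import json
from collections import Counter


def load_qa_entries(raw_json):
    """Parse a JSON list of Q&A records (assumed layout, see lead-in)."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of Q&A records")
    return entries


def category_counts(entries):
    """Count records per category, labelling records without one as 'uncategorized'."""
    return Counter(entry.get("category", "uncategorized") for entry in entries)


def summarize(entries):
    """Build a small statistics payload, e.g. for a chart on the dashboard."""
    counts = category_counts(entries)
    return {"total": len(entries), "by_category": dict(counts.most_common())}
```

To run it on the real file, read data/cancer_qa_dataset.json with encoding="utf-8" and pass its contents to load_qa_entries, adjusting the field names to the actual schema.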
The project includes a test suite covering all major components.
- Run all tests:
pytest
- Run tests with verbose output:
pytest -v
- Run a specific test file:
pytest tests/test_biobert_qa.py
- Run tests with a coverage report (requires the pytest-cov plugin):
pytest --cov=. --cov-report=html
- test_biobert_qa.py: Tests BioBERT model initialization, question answering, and error handling
- test_nlp_pipeline.py: Tests NLP processing, language detection, and spaCy model management
- test_context_provider.py: Tests semantic search, embedding generation, and context retrieval
- test_data_handler.py: Tests data loading, management, and statistics generation
The test suite covers:
- ✅ Model loading and initialization
- ✅ Question answering accuracy
- ✅ NLP pipeline processing
- ✅ Context retrieval functionality
- ✅ Data handling and management
- ✅ Error handling and edge cases
- ✅ API endpoint responses
The application uses curated medical datasets including:
- Cancer Types: 10+ common cancer types with staging and symptoms
- Treatments: 10+ treatment modalities with effectiveness and costs
- Side Effects: 20+ common side effects with frequencies
- FAQ: 20+ frequently asked questions with medical answers
- Backend: Flask web framework with Jinja2 templating
- AI/ML: BioBERT (Transformers), Sentence Transformers, PyTorch
- NLP: spaCy (with automatic model downloading), NLTK, langdetect
- Frontend: HTML5, Bootstrap 5, Chart.js, Custom CSS/JavaScript
- Data Processing: Pandas, NumPy for data manipulation
- Visualization: Chart.js for interactive web charts, Matplotlib/Seaborn for analytics
- Testing: pytest with comprehensive test coverage
- Dependencies: See requirements.txt for the complete list
- Model Storage: Local BioBERT model (~1.3GB) for offline operation
IMPORTANT: This application provides general information for educational purposes only. It is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult with qualified healthcare professionals for medical decisions.
- ✅ AI-powered question answering using BioBERT for biomedical accuracy
- ✅ Modern, responsive web interface with professional medical design
- ✅ Comprehensive test suite with good coverage
- ✅ Automatic dependency management (spaCy models)
- ✅ Cross-platform compatibility (Windows, macOS, Linux)
- ✅ Proper medical disclaimers throughout the application
- ✅ Fast response times with local model inference
- ✅ Semantic context retrieval for relevant information
- Enhanced AI Models: Integration with newer biomedical language models
- Real-time Data: Connection to medical databases and research APIs
- Personalized Responses: User profile-based recommendations
- Multi-language Support: Expansion beyond English and French
- Mobile App: React Native or Flutter mobile application
- Voice Interface: Speech-to-text and text-to-speech capabilities
- Clinical Integration: EHR integration and clinical decision support
- Advanced Analytics: Machine learning for treatment outcome predictions
Model Download Issues:
# If saveModel.py fails, try running it manually:
python scripts/saveModel.py

spaCy Model Problems:
# Manual installation if the auto-download fails:
python -m spacy download en_core_web_sm
python -m spacy download fr_core_news_sm

Memory Issues:
- Ensure at least 4GB of RAM is available
- Close other applications when running BioBERT
- Consider using smaller batch sizes

Port Already in Use:
# Use a different port (in Windows cmd, use "set" instead of "export"):
export FLASK_RUN_PORT=5001
flask run

Import Errors:
# Reinstall dependencies:
pip install -r requirements.txt --force-reinstall
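If you are not sure which port is free, a short standard-library check can tell you before you set FLASK_RUN_PORT. These are hypothetical helpers, not part of the project: is_port_free tries to bind the port, which fails if another process already holds it, and find_free_port lets the operating system pick an unused one.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

The result is only a snapshot: another process can claim the port between the check and flask run, so treat it as a quick diagnostic rather than a guarantee.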
If tests fail, check that:
- All dependencies are installed: pip install -r requirements.txt
- The BioBERT model is downloaded: python scripts/saveModel.py
- Your Python version is compatible (3.8+)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For questions or support, please refer to the documentation or create an issue in the repository.
Built with ❤️ for cancer patients and caregivers