v1.6 - Advanced Chunking Strategy
- Successfully working overall architecture of Automated Agent Routing with LangGraph.
- Successfully working Conversation Agent fine-tuned for medical domain.
- Successfully working RAG agent.
- Successfully working Web Search agent.
- Successfully working routing from the RAG agent to the Web Search agent when the retrieval confidence score is low.
- Successfully working routing to appropriate Medical Computer Vision agent based on Classification of uploaded image (brain MRI / chest X-ray / skin lesion).
- Successfully storing conversation history up to a specified length.
- Successfully working backend and frontend.
- Added ingest_rag_data.py to manually ingest new data for information retrieval.
- Document parsing is currently implemented with PyPDF2; an unstructured.io option will be provided later (it requires system-level installation of Tesseract and Poppler).
- Successfully working Medical Computer Vision model agents: Chest X-ray COVID-19 classification and Skin Lesion Segmentation.
- Successfully integrated ElevenLabs API to enable speech-to-text and text-to-speech services in conversation.
- Successfully integrated Input and Output Guardrails.
- Conversation history is now maintained in the Graph State rather than managed separately in the FastAPI backend as in previous releases.
- Updated the chunking strategy to include semantic chunking, which respects section, paragraph, and sentence boundaries and uses section headers specific to the detected document type: research papers, clinical notes, patient records, medical condition reports, guidelines and protocols, and drug information. Also added medical entity recognition to enrich the document metadata; this aids hybrid search by comparing against the medical entities detected in the user query.
- Provided chunking strategy options to the developer: 'semantic', 'sliding_window', 'recursive', 'hybrid'.
- Because the Git LFS quota was exhausted, the large model file is now shared via Google Drive and is downloaded automatically to the correct path (added an automatic model downloader script).
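
To illustrate the chunking strategy options listed above, here is a minimal sketch of how a strategy name might select a chunker. It is not the project's actual implementation: the names `chunk_document`, `semantic_chunks`, `sliding_window_chunks`, and `CHUNKERS`, and all parameters, are hypothetical. Only 'semantic' and 'sliding_window' are sketched; 'recursive' and 'hybrid' are omitted, and the document-type detection and medical entity recognition are not shown.

```python
import re
from typing import Callable, Dict, List


def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Fixed-size character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def semantic_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks without crossing paragraph boundaries.

    Assumption: paragraphs are separated by blank lines and sentences end
    with '.', '!' or '?'. A single sentence longer than `max_chars` is kept
    whole rather than split.
    """
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        current = ""
        for sentence in sentences:
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


# Hypothetical registry mapping strategy names to chunkers.
CHUNKERS: Dict[str, Callable[..., List[str]]] = {
    "semantic": semantic_chunks,
    "sliding_window": sliding_window_chunks,
}


def chunk_document(text: str, strategy: str = "semantic", **kwargs) -> List[str]:
    """Dispatch to the chunker registered under `strategy`."""
    if strategy not in CHUNKERS:
        raise ValueError(f"unknown chunking strategy: {strategy!r}")
    return CHUNKERS[strategy](text, **kwargs)
```

For example, `chunk_document(text, strategy="sliding_window", size=500, overlap=100)` would produce overlapping 500-character windows, while the default 'semantic' strategy keeps sentences and paragraphs intact at the cost of uneven chunk sizes.
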
What's Changed
- Updated ingested data with better chunking logic by @souvikmajumder26 in #35
- Added chunking strategy choice by @souvikmajumder26 in #36
- Updated main README and agentic workflow README by @souvikmajumder26 in #39
- Large model file removed by @souvikmajumder26 in #41
Full Changelog: v1.5...v1.6