v1.6 - Advanced Chunking Strategy

@souvikmajumder26 released this 24 Mar 21:48
· 80 commits to main since this release
c9c4eaf
  • Working overall architecture for automated agent routing with LangGraph.
  • Working Conversation agent fine-tuned for the medical domain.
  • Working RAG agent.
  • Working Web Search agent.
  • Working fallback routing from RAG to Web Search when the retrieval confidence score is low.
  • Working routing to the appropriate Medical Computer Vision agent based on classification of the uploaded image (brain MRI / chest X-ray / skin lesion).
  • Conversation history is stored up to a specified length.
  • Working backend and frontend.
  • Added ingest_rag_data.py to manually ingest new data for information retrieval.
  • Document parsing is currently implemented with PyPDF2; unstructured.io will be offered as an option later (it requires system-level installation of tesseract and poppler).
  • Working Medical Computer Vision model agents: chest X-ray COVID-19 classification and skin lesion segmentation.
  • Integrated the ElevenLabs API to enable speech-to-text and text-to-speech in conversations.
  • Integrated input and output guardrails.
  • Conversation history is now maintained in the graph state rather than managed separately in the FastAPI backend as in previous releases.
  • Updated the chunking strategy to include semantic chunking, which respects semantic boundaries (section, paragraph, and sentence). It uses section headers specific to the detected document type: research papers, clinical notes, patient records, medical condition reports, guidelines and protocols, and drug information. Also added medical entity recognition to enrich document metadata; this aids hybrid search by comparing against the medical entities detected in the user query.
  • Provided chunking strategy options to developer: 'semantic', 'sliding_window', 'recursive', 'hybrid'.
  • Because the Git LFS quota is exhausted, the large model file is now shared via Google Drive and is downloaded automatically to the correct path (added an automatic model downloader script).
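
To make the chunking strategy options above more concrete, here is a minimal sketch of how a strategy name might select a chunker. It is not the project's implementation: the function names (`sliding_window_chunks`, `sentence_chunks`, `chunk_text`), their parameters, and the `STRATEGIES` mapping are all hypothetical, and only two of the four options are covered. The 'semantic' entry is reduced to sentence boundaries; the release's section-header detection and medical entity recognition are not modeled.

```python
import re


def sliding_window_chunks(text, size=200, overlap=50):
    """Fixed-size character windows; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def sentence_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` is kept whole rather than split.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical mapping from option name to chunker (assumption, not the
# project's code); 'recursive' and 'hybrid' are deliberately left out.
STRATEGIES = {
    "sliding_window": sliding_window_chunks,
    "semantic": sentence_chunks,
}


def chunk_text(text, strategy="semantic", **kwargs):
    """Dispatch to the chunker registered under `strategy`."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown chunking strategy: {strategy!r}")
    return STRATEGIES[strategy](text, **kwargs)
```

For example, `chunk_text(doc, strategy="sliding_window", size=500, overlap=100)` would produce overlapping 500-character windows, while the 'semantic' option never cuts inside a sentence.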

What's Changed

Full Changelog: v1.5...v1.6