Shabnam2212/query-with-rag-llm

This code converts scientific full-text XML (NXML) files into Markdown, splits the Markdown into chunks using a hybrid strategy for short and long documents, and builds a retrieval-augmented generation (RAG) pipeline using LangChain, HuggingFaceEmbeddings, and ChromaDB.
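As a rough illustration of the XML-to-Markdown step, here is a minimal sketch using only the Python standard library. It is not the repository's actual code: the function name `nxml_to_markdown`, the helper `_text`, the output layout, and the made-up `SAMPLE` document are all assumptions, and the element names follow the JATS-style tags commonly found in NXML files.

```python
import xml.etree.ElementTree as ET


def _text(elem):
    """Collapse all text inside an element into one whitespace-normalized string."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def nxml_to_markdown(xml_string):
    """Hypothetical converter: metadata header followed by section text."""
    root = ET.fromstring(xml_string)

    # Metadata: title, DOI, and author names (JATS-style tags, assumed).
    title = _text(root.find(".//article-title"))
    doi = _text(root.find(".//article-id[@pub-id-type='doi']"))
    authors = []
    for contrib in root.findall(".//contrib[@contrib-type='author']"):
        given = _text(contrib.find(".//given-names"))
        surname = _text(contrib.find(".//surname"))
        authors.append(f"{given} {surname}".strip())

    lines = [f"# {title}", "", f"Authors: {', '.join(authors)}", f"DOI: {doi}", ""]

    # Body: one Markdown heading per <sec>, then its direct <p> children.
    # Paragraphs that sit outside any <sec> are ignored in this sketch.
    body = root.find(".//body")
    if body is not None:
        for sec in body.iter("sec"):
            heading = sec.find("title")
            if heading is not None:
                lines += [f"## {_text(heading)}", ""]
            for p in sec.findall("p"):
                lines += [_text(p), ""]

    return "\n".join(lines).strip() + "\n"


# Made-up sample document, for illustration only.
SAMPLE = """<article>
  <front><article-meta>
    <article-id pub-id-type="doi">10.1234/example</article-id>
    <title-group><article-title>A Sample Study</article-title></title-group>
    <contrib-group>
      <contrib contrib-type="author">
        <name><surname>Doe</surname><given-names>Jane</given-names></name>
      </contrib>
    </contrib-group>
  </article-meta></front>
  <body><sec><title>Introduction</title><p>Some <italic>formatted</italic> text.</p></sec></body>
</article>"""

if __name__ == "__main__":
    print(nxml_to_markdown(SAMPLE))
```

Inline markup such as `<italic>` is flattened to plain text here; a fuller converter would likely map it to Markdown emphasis and handle tables, figures, and references separately.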

Features

Parses NXML files to extract structured scientific content

Saves cleaned Markdown outputs with metadata (title, authors, DOI)

Chunks documents with a hybrid strategy that handles both short and long documents

Builds a vector store using all-mpnet-base-v2 embeddings

Prepares a ChromaDB collection for downstream QA/RAG tasks
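The chunking and vector-store features above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the README does not say how the hybrid strategy works, so the rule used here (keep short documents whole, split long ones at Markdown headings, then into overlapping fixed-size windows) is a guess, as are the names `hybrid_chunks` and `build_vector_store` and every threshold. The `langchain_chroma` and `langchain_huggingface` package names are also assumptions; the repository may import these classes from elsewhere.

```python
import re


def hybrid_chunks(markdown, short_limit=1000, chunk_size=500, overlap=50):
    """Hypothetical hybrid chunker; all thresholds are illustrative assumptions."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    text = markdown.strip()
    if not text:
        return []
    # Short documents are kept as a single chunk.
    if len(text) <= short_limit:
        return [text]

    # Long documents: split before each Markdown heading so sections stay together.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]

    chunks = []
    step = chunk_size - overlap
    for section in sections:
        if len(section) <= chunk_size:
            chunks.append(section)
            continue
        # Oversized sections are cut into overlapping character windows.
        for start in range(0, len(section), step):
            chunks.append(section[start:start + chunk_size])
            if start + chunk_size >= len(section):
                break
    return chunks


def build_vector_store(chunks, metadata, persist_dir="chroma_db", collection="papers"):
    """Embed chunks with all-mpnet-base-v2 and store them in a Chroma collection."""
    # Imported lazily so hybrid_chunks works without these packages installed.
    # Package names are assumptions; older setups use langchain_community instead.
    from langchain_chroma import Chroma
    from langchain_huggingface import HuggingFaceEmbeddings

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    return Chroma.from_texts(
        texts=chunks,
        embedding=embeddings,
        metadatas=[dict(metadata) for _ in chunks],  # same metadata on every chunk
        collection_name=collection,
        persist_directory=persist_dir,
    )
```

A typical call might pass the Markdown for one paper to `hybrid_chunks` and then hand the result, together with a dictionary holding the title, authors, and DOI, to `build_vector_store`. Attaching the metadata to every chunk keeps the source paper recoverable from any retrieved passage.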
