Chunking strategy for ingesting files? #1903
Replies: 2 comments
-
yep, you’re right to suspect something's off. if you're getting exactly 5 chunks for a 10-page A4 doc, chances are LightRAG is using a fixed-size, tokenizer-based chunker without adaptive structure detection. this usually triggers two major issues: chunks that cut straight across sentence and section boundaries, and loss of document structure (headings, lists, tables) that retrieval needs later.
most RAG pipelines suffer from these by default, especially if they run chunking before semantic restoration. we’ve actually documented this and a few related ingestion traps pretty deeply — happy to share if that’s helpful. you’re not alone on this — but yeah, if you want context-aware or document-type-specific chunking, fixed-size won’t cut it.
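for context, fixed-size token chunking with overlap can be sketched in a few lines. the chunk size of 1200 and overlap of 100 match LightRAG's documented defaults as far as I can tell (check your version), and the plain list of tokens stands in for a real tokenizer like tiktoken:

```python
def chunk_by_token_size(tokens, chunk_size=1200, overlap=100):
    """Fixed-size chunking with overlap: each chunk starts
    (chunk_size - overlap) tokens after the previous one."""
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end
    return chunks

# a 10-page A4 doc is very roughly 5,000-6,000 tokens (assumption)
doc = ["tok"] * 5500
print(len(chunk_by_token_size(doc)))  # → 5
```

so 5 chunks for 10 pages is exactly what this kind of chunker produces — it's counting tokens, not looking at the document at all.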
-
So LightRAG just uses fixed-size chunks as its strategy? I was hoping for semantic embedding here... Do you know of a solution using semantic embedding and graphing for a RAG that is relatively simple to install? Been looking for weeks now...
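Semantic chunking in the sense asked about here usually means embedding consecutive sentences and starting a new chunk wherever similarity drops. A toy sketch of the idea — the bag-of-words `embed` is only a placeholder for a real sentence-embedding model (e.g. sentence-transformers), and `threshold` is something you would tune:

```python
import math
from collections import Counter

def embed(text):
    # placeholder embedding: bag-of-words counts.
    # swap in a real sentence-embedding model in practice.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.15):
    """Start a new chunk whenever the similarity between
    consecutive sentences falls below the threshold."""
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(cur)
    chunks.append(" ".join(current))
    return chunks
```

With a real embedding model this groups topically related sentences together instead of cutting at an arbitrary token count, which is the behavior the question is after.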
-
As far as I can see in the provided files for LightRAG, it is not possible to change the chunking strategy, only to use fixed chunking? Is this correct? I must have set up something wrong: ingesting a 10-page document (A4) returned 5 chunks... Thoughts?
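A back-of-envelope check suggests 5 chunks is what fixed-size chunking would produce here. The words-per-page figure is a rough assumption, and the 1200/100 chunk settings are LightRAG's defaults as far as I can tell:

```python
import math

words_per_page = 550              # typical single-spaced A4 page (assumption)
tokens = 10 * words_per_page      # ~5,500 tokens for 10 pages
chunk_size, overlap = 1200, 100   # LightRAG defaults, if memory serves
stride = chunk_size - overlap
chunks = math.ceil((tokens - overlap) / stride)
print(chunks)  # → 5
```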