Chunking strategy for ingesting files? #1903
Replies: 2 comments
-
yep, you’re right to suspect something's off. if you're getting exactly 5 chunks for a 10-page A4 doc, chances are LightRAG is using a fixed-size, tokenizer-based chunker without adaptive structure detection. this usually triggers two major issues: chunks that cut straight across sentence and section boundaries, and loss of document structure (headings, lists, tables) that retrieval needs later.
most RAG pipelines suffer from these by default, especially if they run chunking before semantic restoration. we’ve actually documented this and a few related ingestion traps pretty deeply — happy to share if that’s helpful. you’re not alone on this — but yeah, if you want context-aware or document-type-specific chunking, fixed-size won’t cut it.
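for context, fixed-size token chunking with overlap can be sketched in a few lines. the chunk size of 1200 and overlap of 100 match LightRAG's documented defaults as far as I can tell (check your version), and the plain list of tokens stands in for a real tokenizer like tiktoken:

```python
def chunk_by_token_size(tokens, chunk_size=1200, overlap=100):
    """Fixed-size chunking with overlap: each chunk starts
    (chunk_size - overlap) tokens after the previous one."""
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end
    return chunks

# a 10-page A4 doc is very roughly 5,000-6,000 tokens (assumption)
doc = ["tok"] * 5500
print(len(chunk_by_token_size(doc)))  # → 5
```

so 5 chunks for 10 pages is exactly what this kind of chunker produces — it's counting tokens, not looking at the document at all.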
-
So LightRAG just uses fixed-size chunks as its strategy? I was hoping for semantic embedding here... Do you know of a solution using semantic embedding and graphing for a RAG that is relatively simple to install? Been looking for weeks now...
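Semantic chunking in the sense asked about here usually means embedding consecutive sentences and starting a new chunk wherever similarity drops. A toy sketch of the idea — the bag-of-words `embed` is only a placeholder for a real sentence-embedding model (e.g. sentence-transformers), and `threshold` is something you would tune:

```python
import math
from collections import Counter

def embed(text):
    # placeholder embedding: bag-of-words counts.
    # swap in a real sentence-embedding model in practice.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.15):
    """Start a new chunk whenever the similarity between
    consecutive sentences falls below the threshold."""
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(cur)
    chunks.append(" ".join(current))
    return chunks
```

With a real embedding model this groups topically related sentences together instead of cutting at an arbitrary token count, which is the behavior the question is after.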
-
As far as I can see in the provided files for LightRAG, it is not possible to change the chunking strategy, only to use fixed chunking? Is this correct? I must have set up something wrong: ingesting a 10-page document (A4) returned 5 chunks... Thoughts?
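A back-of-envelope check suggests 5 chunks is what fixed-size chunking would produce here. The words-per-page figure is a rough assumption, and the 1200/100 chunk settings are LightRAG's defaults as far as I can tell:

```python
import math

words_per_page = 550              # typical single-spaced A4 page (assumption)
tokens = 10 * words_per_page      # ~5,500 tokens for 10 pages
chunk_size, overlap = 1200, 100   # LightRAG defaults, if memory serves
stride = chunk_size - overlap
chunks = math.ceil((tokens - overlap) / stride)
print(chunks)  # → 5
```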