Add max_chunk_size parameter inside SemanticChunker #29965
kolhesamiksha announced in Ideas
Feature request
Adding max_chunk_size Constraint to Semantic Chunking
Motivation
While implementing Semantic Chunking in my production code, I encountered cases where some chunks exceeded the input sequence length supported by my embedding model (e.g., chunks around 90k characters long). This led to truncation and inefficient processing.
To address this, I modified the code to introduce a max_chunk_size parameter that sets an upper limit on chunk size, ensuring that all chunks remain within the model's accepted sequence length.
I believe this would be a valuable addition to LangChain's chunking utilities, allowing users to better control chunk sizes and avoid embedding model limitations.
Proposal (If applicable)
Add max_chunk_size as a parameter
This function processes sentence groups while ensuring that each chunk stays within the max_chunk_size limit.
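To make the idea concrete, here is a minimal sketch of the post-processing step described above. This is not LangChain's actual implementation: the function name `split_oversized_chunks`, the list-of-sentence-groups input shape, and the character-based size measure are all assumptions chosen for illustration.

```python
def split_oversized_chunks(sentence_groups, max_chunk_size):
    """Re-split semantically grouped sentences so that no emitted chunk
    exceeds max_chunk_size characters.

    sentence_groups: list of lists of sentences, one inner list per
    semantic chunk (as produced by breakpoint-based grouping).
    A single sentence longer than max_chunk_size is emitted as-is
    rather than being split mid-sentence.
    """
    chunks = []
    for group in sentence_groups:
        current, current_len = [], 0
        for sentence in group:
            sep = 1 if current else 0  # account for the joining space
            if current and current_len + sep + len(sentence) > max_chunk_size:
                # Flush the running chunk and start a new one within
                # the same semantic group.
                chunks.append(" ".join(current))
                current, current_len = [], 0
                sep = 0
            current.append(sentence)
            current_len += sep + len(sentence)
        if current:
            chunks.append(" ".join(current))
    return chunks


# Usage: one semantic group whose joined length exceeds the limit
# is broken into multiple chunks, each within max_chunk_size.
result = split_oversized_chunks([["aaaa", "bbbb", "cccc"]], max_chunk_size=9)
```

In a real integration, this logic would run after the semantic breakpoints are computed, so chunks keep their semantic boundaries and are only subdivided when they exceed the limit. Counting tokens with the embedding model's tokenizer instead of characters would track the model's true limit more faithfully.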