.Net: Fix TextChunker.SplitPlainTextLines to actually split on newlines regardless of token count #12558
+52
−2
Description
Summary
Fixes issue #12556 where `TextChunker.SplitPlainTextLines` does not actually split text on newlines when the input token count is less than the `maxTokensPerLine` parameter.
Problem
The `SplitPlainTextLines` method had two issues. One involved the `"\n\r"` separator, which looks for text containing both a newline and a carriage return rather than splitting on newlines. As a result, the method returned unsplit text with the newline characters preserved instead of separate lines, which was counterintuitive given the method name.
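The separator difference described above can be sketched with plain `string.Split`. This is only an illustration under stated assumptions: the class and method names below are hypothetical, and `string.Split` stands in for `TextChunker`, whose internal handling of its split options may differ.

```csharp
using System;

// Hypothetical helper for illustration only; not part of TextChunker.
public static class SeparatorSketch
{
    // Treats "\n\r" as one literal two-character separator, so text that
    // contains only "\n" is returned as a single unsplit element.
    public static string[] SplitOnNewlineCarriageReturn(string text) =>
        text.Split(new[] { "\n\r" }, StringSplitOptions.None);

    // Splits on every "\n", yielding one element per line.
    public static string[] SplitOnNewline(string text) =>
        text.Split(new[] { "\n" }, StringSplitOptions.None);
}
```

Under these assumptions, calling `SeparatorSketch.SplitOnNewlineCarriageReturn("First line\nSecond line\nThird line")` gives back one element that still contains the newline characters, while `SeparatorSketch.SplitOnNewline` on the same input gives three separate lines.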
Solution
- `s_plaintextLineSplitOptions` array with `"\n"` as the first separator, specifically for line splitting
- `SplitPlainTextLines` now always splits on newlines first, regardless of token count
- `SplitPlainTextParagraphs` is unaffected, since it keeps the original split options
Changes
- `s_plaintextLineSplitOptions` array for proper line splitting behavior
- `SplitPlainTextLines` method prioritizes newline splitting over token limits
- `SplitPlainTextParagraphs` functionality unchanged
Testing
- `CanSplitTextParagraphsOnNewlines`
Verification
The method now correctly splits this input:
"First line\nSecond line\nThird line"
Fixes #12557