.Net: Fix TextChunker.SplitPlainTextParagraphs to handle embedded newlines in input strings (#12558)
## Description
### Summary
Fixes issue #12556 where `TextChunker.SplitPlainTextParagraphs` does not
properly handle embedded newlines in input strings.
### Problem
The `SplitPlainTextParagraphs` method had two issues:
1. **Incorrect separator**: Used `"\n\r"` (LF+CR), which is not a
standard line ending; it should be `"\r\n"` (CR+LF) for Windows or
`"\n"` for Unix
2. **No embedded newline handling**: When input strings contained
embedded newlines, they were not split into separate lines for
processing
This caused the method to process text with embedded newlines as single
units instead of handling each line separately.
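To make the separator problem concrete, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the issue or the test suite. It only shows that the sequence `"\n\r"` does not occur at a single CR+LF or LF line break, so splitting on it leaves such text in one piece.

```csharp
using System;

// Hypothetical sample inputs, not taken from the issue or the tests.
string crlfText = "first line\r\nsecond line"; // Windows line ending (CR+LF)
string lfText = "first line\nsecond line";     // Unix line ending (LF)

// "\n\r" (LF+CR) does not occur in either input, so nothing is split.
string[] badCrlf = crlfText.Split(new[] { "\n\r" }, StringSplitOptions.None);
string[] badLf = lfText.Split(new[] { "\n\r" }, StringSplitOptions.None);

Console.WriteLine(badCrlf.Length); // 1
Console.WriteLine(badLf.Length);   // 1

// Splitting on "\n" finds the line boundary in both inputs.
// For the CR+LF input, the first piece still ends in '\r'.
Console.WriteLine(crlfText.Split('\n').Length); // 2
Console.WriteLine(lfText.Split('\n').Length);   // 2
```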
### Solution
- Modified `s_plaintextSplitOptions` array to use `"\n"` as the
separator for proper newline recognition
- Modified `SplitPlainTextParagraphs` to use `SelectMany` with
`Split('\n')` to handle embedded newlines
- Added normalization of all newline formats (`\r\n`, `\r`, `\n`) to a
single `\n` form to ensure consistent handling
- Lines are split before processing but may be recombined based on token
limits (expected behavior)
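The approach described above can be sketched roughly as follows. This is an illustration, not the code from this change: `SplitEmbeddedNewlines` is a hypothetical helper name, and the real `SplitPlainTextParagraphs` also applies token limits and may recombine lines afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the actual TextChunker implementation.
// Normalizes "\r\n" and "\r" to "\n", then flattens each input string
// into its individual lines.
static List<string> SplitEmbeddedNewlines(IEnumerable<string> lines) =>
    lines
        .SelectMany(line => line
            .Replace("\r\n", "\n") // CR+LF first, so it is not turned into two breaks
            .Replace('\r', '\n')   // then any remaining lone CR
            .Split('\n'))
        .ToList();

// Made-up input mixing all three newline styles.
var input = new[] { "one\r\ntwo", "three\rfour\nfive", "six" };
List<string> result = SplitEmbeddedNewlines(input);

Console.WriteLine(string.Join("|", result)); // one|two|three|four|five|six
```

Replacing `"\r\n"` before the lone `'\r'` matters: in the opposite order, each CR+LF pair would become two newlines and produce an empty line between entries.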
## Changes
- **Modified**: `s_plaintextSplitOptions` array to use correct newline
separator
- **Modified**: `SplitPlainTextParagraphs` method to split embedded
newlines before processing
- **Preserved**: Existing paragraph grouping behavior based on token
limits
## Testing
- ✅ Fixes handling of embedded newlines in input strings
- ✅ All existing tests continue to pass, including
`CanSplitTextParagraphsOnNewlines`
- ✅ Maintains backward compatibility for paragraph splitting behavior
---------
Co-authored-by: Adit Sheth <adsheth@microsoft.com>
Co-authored-by: Kyle Rader <126627085+kyle-rader-msft@users.noreply.github.com>
Co-authored-by: westey <164392973+westey-m@users.noreply.github.com>
Co-authored-by: Mark Wallace <127216156+markwallace-microsoft@users.noreply.github.com>