Description
Hey!
Context:
const options = {
  newline_boundaries: true,
  html_boundaries: true,
  html_boundaries_tags: [
    'br',
    'p',
    'h1',
    'h2',
    'h3',
    'h4',
    'h5',
    'h6',
    'ul',
    'div',
    'figcaption',
  ],
  sanitize: true,
  preserve_whitespace: true,
}
const html = `<article> <span>a span here</span><h1>This is a a very cool title.</h1></article>`
console.log(tokenizer.sentences(html, options))
Expected Result:
[ 'a span here', 'This is a a very cool title.' ]
Actual Result:
['a span hereThis is a a very cool title.' ]
I do realise that <span> is not marked as a boundary HTML tag, but in my opinion that shouldn't let its content leak into the text of a sibling tag that is marked as a boundary.
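To make the expectation concrete, here is a rough sketch of the splitting behaviour I have in mind. `splitAtBoundaryTags` is a hypothetical helper written only for this report, not part of the tokenizer's API and not a claim about how the library works internally. It assumes simple, well-formed markup and only splits the text into chunks at boundary tags; each chunk would still need to go through the sentence tokenizer.

```javascript
// Hypothetical helper (not part of the tokenizer): illustrates the expected
// behaviour that a boundary tag always ends the text that precedes it,
// even when that text sits inside a non-boundary sibling such as <span>.
function splitAtBoundaryTags(html, boundaryTags) {
  // Match opening, closing, and self-closing forms of each boundary tag.
  const boundary = new RegExp(`</?(?:${boundaryTags.join('|')})\\b[^>]*>`, 'gi');
  return html
    .replace(boundary, '\n')       // boundary tags become hard breaks
    .replace(/<[^>]+>/g, '')       // remaining tags (e.g. <span>) are dropped
    .split('\n')
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

const boundaryTags = ['br', 'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'ul', 'div', 'figcaption'];
const sampleHtml = `<article> <span>a span here</span><h1>This is a a very cool title.</h1></article>`;
const chunks = splitAtBoundaryTags(sampleHtml, boundaryTags);
console.log(chunks);
```

With the sample input above, the span text and the heading text end up in separate chunks, which matches the expected result in this report.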