
Commit a8dddd9

replace newline character before word tokenization
1 parent 7958f21 commit a8dddd9


1 file changed (+1, -1 lines)

hazm/WordTokenizer.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ def tokenize(self, text):
 		['این', 'جمله', '(', 'خیلی', ')', 'پیچیده', 'نیست', '!!!']
 		"""
 
-		text = self.pattern.sub(r' \1 ', text)
+		text = self.pattern.sub(r' \1 ', text.replace('\n', ' '))
 		tokens = [word for word in text.split(' ') if word]
 		if self._join_verb_parts:
 			tokens = self.join_verb_parts(tokens)
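The following is a minimal sketch, not hazm's code, that shows what the one-line change fixes. Because `text.split(' ')` splits only on the space character, a newline between two words used to leave them glued together in a single token. `PUNCT_PATTERN`, `tokenize_old` and `tokenize_new` are hypothetical names. The punctuation regex is an assumed stand-in for `self.pattern`, whose definition this diff does not show.

```python
import re

# Hypothetical stand-in for the tokenizer's self.pattern (its real definition
# is not part of this diff): capture a run of punctuation so it can be
# padded with spaces and split off as its own token.
PUNCT_PATTERN = re.compile(r'([!?.,:;()]+)')


def tokenize_old(text):
    # Before the commit: '\n' survives into split(' '), which ignores it.
    text = PUNCT_PATTERN.sub(r' \1 ', text)
    return [word for word in text.split(' ') if word]


def tokenize_new(text):
    # After the commit: newlines are turned into spaces before splitting.
    text = PUNCT_PATTERN.sub(r' \1 ', text.replace('\n', ' '))
    return [word for word in text.split(' ') if word]


sample = 'first line\nsecond line'
print(tokenize_old(sample))  # ['first', 'line\nsecond', 'line']
print(tokenize_new(sample))  # ['first', 'line', 'second', 'line']
```

The fix targets only `'\n'`, so other whitespace such as `'\t'` or the `'\r'` of Windows line endings still ends up inside tokens. Calling `text.split()` with no argument would split on any run of whitespace and drop empty strings, which is one possible alternative.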
