[Here](https://github.com/marcolagi/quantulum/blob/28b697dfa997116c1aa3ef63a3ceb8725bffd24f/quantulum/classifier.py#L59) the code seems to be trying to strip out Unicode punctuation, but `\p{}` is not supported by Python's `re` module. The `\p` is treated as a literal `p`, so the pattern only matches the literal text `p{P}` and real punctuation is left untouched.

Example case:

    >>> re.sub(ur'\p{P}+', ' ', "apc , f")
    'apc , f'
    >>> re.sub(ur'\p{P}+', ' ', "apc , f p{P}")
    'apc , f  '
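One possible fix, sketched with only the standard library: classify each character with `unicodedata.category` and replace runs of characters whose category starts with `P`. The function name `strip_punctuation` and its `replacement` parameter are made up for illustration and are not part of quantulum. The sketch is written for Python 3, whereas the snippet above is Python 2.

```python
import unicodedata


def strip_punctuation(text, replacement=" "):
    """Replace each run of Unicode punctuation with `replacement`.

    A character counts as punctuation when its Unicode general
    category starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po), which is
    what the \\p{P} class is meant to express.
    """
    out = []
    in_run = False  # True while we are inside a run of punctuation
    for ch in text:
        if unicodedata.category(ch).startswith("P"):
            if not in_run:
                # Emit the replacement once per run, like the `+` quantifier
                out.append(replacement)
                in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)
```

Alternatively, the third-party `regex` package does support `\p{P}`, so swapping `re` for `regex` at that call site may be the smaller change, at the cost of an extra dependency.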