Improve typos analyzer quality #758

Mostly about improving the quality of the current TyposCorrector model:

- Improve the data quality: https://github.com/src-d/ml/issues/403 and https://github.com/src-d/ml/issues/402. This is _the most important stuff_, because several stages of the pipeline depend strongly on it.
- Improve the vocabulary (this follows from the data quality work above). Right now it's mostly fine, but with good splitting it will be much better.
- Work out the best fasttext configuration. I'm already fine with the one I have: it's light and gives some boost to the quality, so _this is not a high priority_ right now.
- Work on the model training configuration. I _haven't touched it yet_, and I'm not sure there is a much better one than the current default. Most mistakes I see now can be explained by bad splits, the vocabulary, or a lack of training data. The last one will be solved by training on the bigger dataset, which is easy.
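
To make the "good splitting" point concrete, here is a minimal sketch of identifier splitting. It assumes that "splitting" means breaking identifiers into lowercase subtokens on case changes, underscores, and digits. The function name `split_identifier` and the regular expression are hypothetical illustrations, not the splitter the project actually uses.

```python
import re

# Hypothetical token pattern (an assumption, not the project's real splitter):
#   1. a run of capitals not followed by a lowercase letter (acronyms: "HTTP")
#   2. an optional capital followed by lowercase letters ("get", "Response")
#   3. a run of digits
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Split an identifier into lowercase subtokens.

    Separators such as "_" or "." match none of the alternatives,
    so they are simply skipped.
    """
    return [match.group(0).lower() for match in _TOKEN_RE.finditer(identifier)]


if __name__ == "__main__":
    for name in ("getHTTPResponse_code2", "read_file", "XMLParser"):
        print(name, "->", split_identifier(name))
```

A bad split (for example, treating `HTTPResponse` as one token) would put a rare compound token into the vocabulary instead of two frequent ones, which is one way splitting quality could affect both the vocabulary and the training data.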

