
A simple probability based auto-correct / spellcheck implemented in python. This was first created by Peter Norvig


Charan0/Probabilistic-AutoCorrect

Probabilistic-AutoCorrect

A simple probability-based auto-correct / spellchecker implemented in Python. The approach was first created by Peter Norvig in 2007.

A much more detailed explanation can be found in Peter Norvig's AutoCorrect implementation.

How it works

From a large corpus we build our vocab (the set of unique words in the corpus). We then build a probs dict that maps every word in the corpus to its probability, that is, the word's count divided by the total number of words in the corpus. This is called P(W).
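A minimal sketch of this step, assuming the corpus is a plain string and words are lowercase runs of word characters. The names tokenize and build_probs and the tiny inline corpus are hypothetical, chosen only for illustration; they are not taken from main.py.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and pull out runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_probs(words):
    # P(W) = count of W / total number of words in the corpus.
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Tiny stand-in corpus (an assumption; a real corpus would be far larger).
corpus = "The cat sat on the mat. The end."
words = tokenize(corpus)
vocab = set(words)          # unique words in the corpus
probs = build_probs(words)  # word -> P(W)
```

With this toy corpus, "the" occurs 3 times out of 8 words, so probs["the"] is 0.375. The probabilities over the whole vocab sum to 1.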

Given a sentence, we use the vocab to find the misspelled words, that is, the words that do not appear in it. For each misspelled word we find all the vocab words that are within an edit distance of n (auto-correct generally uses an edit distance of 1, 2, or 3). From these candidates we pick the most probable word according to P(W) and use it as the replacement.
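The candidate search can be sketched as follows. This is an illustrative version in the style of Norvig's corrector, not necessarily what main.py does: the function names (edits1, edits2, correct_word, correct_sentence), the restriction to lowercase ASCII letters, the cap at edit distance 2, and the example probs dict are all assumptions.

```python
import string


def edits1(word):
    # All strings one edit (delete, transpose, replace, insert) away from word.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    # All strings two edits away: apply edits1 to every result of edits1.
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correct_word(word, probs):
    # Known words are left alone; otherwise prefer candidates at distance 1,
    # then distance 2, and pick the one with the highest P(W).
    if word in probs:
        return word
    candidates = ({w for w in edits1(word) if w in probs}
                  or {w for w in edits2(word) if w in probs})
    if not candidates:
        return word  # nothing close enough in the vocab; keep the input
    return max(candidates, key=probs.get)


def correct_sentence(sentence, probs):
    return " ".join(correct_word(w, probs) for w in sentence.lower().split())


# Hypothetical probabilities, for illustration only.
example_probs = {"the": 0.5, "then": 0.1, "cat": 0.2, "sat": 0.2}
corrected = correct_sentence("teh caat sat", example_probs)
```

Here "teh" is one transposition away from "the" and "caat" is one deletion away from "cat", so corrected is "the cat sat". Trying distance 1 before distance 2 is a design choice that favors the smallest change; a word with no candidate within distance 2 is returned unchanged.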

To run the script:

No dependencies beyond the Python standard library are required; the script uses only the re module and the Counter class from the collections module.

Run: python main.py
