This Python notebook repository contains code for a mini search engine. The pipeline pre-processes the corpus with tokenization, case conversion, stop word removal, and stemming. To convert the given corpus into a Boolean representation, refer to this repo: https://github.com/iamharshbit/Boolean_Representation_using_NLTK Several measures can score how well a document matches a query, such as TF (term frequency), but each has drawbacks; raw TF, for example, gives high weight to words that are common across every document. The best known measure is TF-IDF, which the notebook implements.
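The pre-processing steps and TF-IDF ranking described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the notebook uses NLTK, whereas here `STOP_WORDS` and `crude_stem` are hypothetical, dependency-free stand-ins for an NLTK stop word list and stemmer, and the function names `preprocess`, `tfidf_vectors`, and `search` are assumptions made for this example. The weighting is one common TF-IDF variant (relative term frequency times `log(N / df)`); the notebook may use a different one.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for an NLTK stop word list (assumption, kept tiny on purpose).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "in", "to", "on"}


def crude_stem(token):
    """Very rough suffix stripper; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the IDF table."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors, idf


def search(query, docs):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    vectors, _ = tfidf_vectors(docs)
    q_terms = preprocess(query)
    scores = [sum(vec.get(t, 0.0) for t in q_terms) for vec in vectors]
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranking, scores
```

Calling `search("search engines", docs)` on a small list of strings returns the document indices ordered from best to worst match together with their scores. A term that occurs in every document gets an IDF of zero, which is how TF-IDF addresses the drawback of raw term frequency.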
iamharshbit/Mini_search_engine
About
This repository implements a mini search engine over a corpus of five documents using Python, NLTK, and Tkinter.