machine-intelligence-laboratory
diff --git a/‎README.md
Lines changed: 19 additions & 16 deletions b/‎README.md
diff --git a/‎topicnet/docs/images/doc_cluster__plot.png
46.3 KB b/‎topicnet/docs/images/doc_cluster__plot.png
diff --git a/‎topicnet/docs/images/top_doc__view.png
76.8 KB b/‎topicnet/docs/images/top_doc__view.png
diff --git a/‎topicnet/docs/images/top_sim_doc__refined_view.png
91.7 KB b/‎topicnet/docs/images/top_sim_doc__refined_view.png
diff --git a/‎topicnet/docs/images/top_tokens__view.png
20.3 KB b/‎topicnet/docs/images/top_tokens__view.png
diff --git a/‎topicnet/docs/images/topic_map__view.png
39.6 KB b/‎topicnet/docs/images/topic_map__view.png
diff --git a/‎topicnet/docs/images/topic_spectrum__refined_view.png
27.8 KB b/‎topicnet/docs/images/topic_spectrum__refined_view.png
diff --git a/‎topicnet/viewers/README-rus.md
Lines changed: 0 additions & 36 deletions b/‎topicnet/viewers/README-rus.md
diff --git a/‎topicnet/viewers/README.md
Lines changed: 89 additions & 13 deletions b/‎topicnet/viewers/README.md
@@ -38,7 +38,7 @@ Consider using TopicNet if:
 * you want to build a good topic model quickly (out-of-box, with default parameters).
 * you have an ARTM model at hand and you want to explore its topics.
 
-`TopicNet` provides an infrastructure for your prototyping with the help of `Experiment` class and helps to observe results of your actions via `viewers` module.
+`TopicNet` provides infrastructure for prototyping through the `Experiment` class and lets you observe the results of your actions via the [`viewers`](topicnet/viewers) module.
 
 <p>
     <div align="center">
@@ -159,7 +159,7 @@ Here we can finally get on the main part: making your own, best of them all, man
 We need to load our data prepared previously with Dataset:
 
 ```python
-data = Dataset('/Wiki_raw_set/wiki_data.csv')
+dataset = Dataset('/Wiki_raw_set/wiki_data.csv')
 ```
 
 ### Make initial model
@@ -169,8 +169,8 @@ In case you want to start from a fresh model we suggest you use this code:
 ```python
 from topicnet.cooking_machine.model_constructor import init_simple_default_model
 
-model_artm = init_simple_default_model(
-    dataset=data,
+artm_model = init_simple_default_model(
+    dataset=dataset,
     modalities_to_use={'@lemmatized': 1.0, '@bigram':0.5},
     main_modality='@lemmatized',
     specific_topics=14,
@@ -188,7 +188,8 @@ class CustomScore(BaseScore):
     def __init__(self):
         super().__init__()
 
-    def call(self, model,
+    def call(self,
+             model,
              eps=1e-5,
              n_specific_topics=14):
 
@@ -204,7 +205,7 @@ Now, `TopicModel` with custom score can be defined:
 from topicnet.cooking_machine.models.topic_model import TopicModel
 
 custom_scores = {'SpecificSparsity': CustomScore()}
-topic_model = TopicModel(model_artm, model_id='Groot', custom_scores=custom_scores)
+topic_model = TopicModel(artm_model, model_id='Groot', custom_scores=custom_scores)
 ```
 
 ### Define experiment
@@ -223,6 +224,7 @@ Defining a next stage of the model training to select a decorrelator parameter:
 ```python
 from topicnet.cooking_machine.cubes import RegularizersModifierCube
 
+
 my_first_cube = RegularizersModifierCube(
     num_iter=5,
     tracked_score_function='PerplexityScore@lemmatized',
@@ -234,7 +236,7 @@ my_first_cube = RegularizersModifierCube(
     verbose=True,
 )
 
-my_first_cube(tm, demo_data)
+my_first_cube(topic_model, dataset)
 ```
 
 Selecting a model with best perplexity score:
@@ -245,18 +247,19 @@ best_model = experiment.select(perplexity_criterion)
 ```
 
 ### View the results
-Browsing the model is easy: create a viewer and call its `view()` method:
+
+Browsing the model is easy: create a viewer and call its `view()` method, or `view_from_jupyter()`, which is the recommended choice when working in a Jupyter Notebook:
 
 ```python
-from IPython.display import HTML
+from topicnet.viewers import TopTokensViewer
 
-threshold = 1e-5
-viewer = TopTokensViewer(best_model, num_top_tokens=10, method='phi')
-html_view = viewer.to_html(top_tok.view(), thresh=threshold)
 
-HTML(html_view)
+toptok_viewer = TopTokensViewer(best_model, num_top_tokens=10, method='phi')
+toptok_viewer.view_from_jupyter()
 ```
 
+More info about different viewers is available here: [`viewers`](topicnet/viewers).
+
 # FAQ
 
 ### In the example we used to write vw modality like **@modality**, is it a Vowpal Wabbit format?
@@ -282,16 +285,16 @@ class_ids_cube = CubeCreator(
 )
 ```
 
-However for the case of modalities a couple of slightly more convenient methods are availiable:
+However, for the case of modalities a couple of slightly more convenient methods are available:
 
 ```python
 parameters : [
     {
-        'name': 'class_ids@text',
+        'name'  : 'class_ids@text',
         'values': [1, 2, 3]
     },
     {
-        'name': 'class_ids@ngrams',
+        'name'  : 'class_ids@ngrams',
         'values': [4, 5, 6]
     }
 ]
 
@@ -1,27 +1,103 @@
-## Viewers 
+# Viewers
 
-Module ```viewers``` provides information from a topic model allowing to estimate the model quality. Its advantage is in unified call ifrastucture to the topic model making the routine and tedious task of extracting the information easy.
+The `viewers` module extracts information from a topic model so that the model's quality can be estimated.
+Its advantage is a unified call infrastructure for the topic model, which makes the routine and tedious task of extracting this information easy.
 
-Currently module contains following viewers:
+Currently the module contains the following viewers:
 
+## `base_viewer` (`BaseViewer`)
 
-* ```base_viewer``` - module responsible for base infrastructure
+Module responsible for base infrastructure.
 
-* ```spectrum``` - module contains heuristics for solving TSP to arrange topics minimizing total distance of the spectrun, 
-```TopicSpectrumViewer```
 
-* ```top_documents_viewer``` - module with functions that work with dataset document collections and class ```TopDocumentsViewer``` wrapping this functionality.
+## `document_cluster` (`DocumentClusterViewer`)
 
-* ```top_similar_documents_viewer``` - module containing class for finding simmilar document for a givenone. This viewer helps to estimate homogeneity of clusters given by the model
+Module that visualizes the documents of a collection. It may be slow for large document collections because it uses the t-SNE algorithm from the [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) library.
 
-* ```top_tokens_viewer``` - module with class for displaying the most relevant tokens in each topic of the model.
+<p>
+    <div align="center">
+        <img src="../docs/images/doc_cluster__plot.png" width="80%" alt/>
+    </div>
+    <em>
+        Visualisation of reduced document embeddings, colored by topic, produced by DocumentClusterViewer.
+    </em>
+</p>
 
-* ```topic_mapping``` - module allowing to compare topics between two different models trained on the same collection.
 
-* ```document_cluster``` - module containing  class ```DocumentClusterViewer``` which allows to visualize collection documents. May be slow for large document collections as it uses TSNE algorithm from sklearn library.
+## `spectrum` (`TopicSpectrumViewer`)
 
+Module containing heuristics for solving the travelling salesman problem (TSP) in order to arrange topics so that the total distance along the spectrum is minimized.
 
+<p>
+    <div align="center">
+        <img src="../docs/images/topic_spectrum__refined_view.png" width="80%" alt/>
+    </div>
+    <em>
+        Each point on the plot represents some topic.
+        The viewer calculated a route between topics in which each topic is connected to a similar one, and so on, forming a circle.
+    </em>
+</p>
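
To illustrate the idea of arranging topics along a spectrum, here is a minimal sketch of one common TSP heuristic, the greedy nearest-neighbour route. It is not taken from the `spectrum` module: the function names, the Euclidean distance, and the toy topic vectors are all assumptions made for this example, and the module's actual heuristics may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour_route(vectors, start=0):
    """Greedy TSP heuristic: always move to the closest unvisited topic."""
    unvisited = set(range(len(vectors)))
    unvisited.remove(start)
    route = [start]
    while unvisited:
        current = vectors[route[-1]]
        closest = min(unvisited, key=lambda i: euclidean(current, vectors[i]))
        unvisited.remove(closest)
        route.append(closest)
    return route


def route_length(vectors, route):
    """Total length of the closed route (the last topic links back to the first)."""
    return sum(
        euclidean(vectors[route[i]], vectors[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


# Toy topic vectors (hypothetical data, not taken from a real model).
toy_topics = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (5.0, 4.0)]
route = nearest_neighbour_route(toy_topics)
total = route_length(toy_topics, route)
```

The greedy route is cheap to compute but is not guaranteed to be the shortest one, which is why such methods are called heuristics.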
 
-* ```initial_doc_to_topic_viewer``` - first edition of  ```TopDocumentsViewer``` - Deprecated
 
-* ```tokens_viewer``` - first edition of  ```TopTokensViewer``` - Deprecated
+## `top_documents_viewer` (`TopDocumentsViewer`)
+
+Module with functions that work with dataset document collections.
+
+<p>
+    <div align="center">
+        <img src="../docs/images/top_doc__view.png" width="80%" alt/>
+    </div>
+    <em>
+        The viewer shows fragments of top documents corresponding to some topic.
+    </em>
+</p>
+
+
+## `top_similar_documents_viewer` (`TopSimilarDocumentsViewer`)
+
+Module containing a class for finding documents similar to a given one. This viewer helps to estimate the homogeneity of the clusters given by the model.
+
+<p>
+    <div align="center">
+        <img src="../docs/images/top_sim_doc__refined_view.png" width="80%" alt/>
+    </div>
+    <em>
+        A document from the text collection (on top) and the documents nearest to it according to the topic model.
+        The viewer currently outputs only document names, but a picture like this one is not difficult to produce from them.
+    </em>
+</p>
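
As a rough illustration of what "nearest documents given a topic model" can mean, the sketch below ranks documents by cosine similarity of their topic distributions. This is an assumption for the example only: the helper names and toy data are hypothetical, and the text above does not say which similarity measure `TopSimilarDocumentsViewer` actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_documents(doc_topics, query_id, num_top=2):
    """Return the ids of the documents closest to `query_id`, best first."""
    query = doc_topics[query_id]
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in doc_topics.items()
        if doc_id != query_id  # the query document itself is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:num_top]]


# Toy document-topic distributions (hypothetical, not from a real model).
toy_doc_topics = {
    'doc_a': [0.9, 0.1, 0.0],
    'doc_b': [0.8, 0.2, 0.0],
    'doc_c': [0.0, 0.1, 0.9],
    'doc_d': [0.5, 0.5, 0.0],
}
nearest = most_similar_documents(toy_doc_topics, 'doc_a')
```

If the nearest documents to a query look unrelated to a human reader, that hints at clusters that are less homogeneous than the model suggests.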
+
+
+## `top_tokens_viewer` (`TopTokensViewer`)
+
+Module with a class for displaying the most relevant tokens in each topic of the model.
+
+<p>
+    <div align="center">
+        <img src="../docs/images/top_tokens__view.png" width="80%" alt/>
+    </div>
+    <em>
+        Output of the TopTokensViewer. A score within the topic is calculated for every token; the score function can be specified when the viewer is initialized.
+    </em>
+</p>
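
The idea of "top tokens" can be sketched in plain Python, assuming the score is simply the token's probability in the topic (roughly what a `phi`-based score suggests). The `top_tokens` helper and the toy matrix below are hypothetical and only illustrate the ranking step, not the viewer's implementation.

```python
def top_tokens(phi, num_top_tokens=2):
    """Return, per topic, the tokens with the highest score, best first."""
    result = {}
    for topic, token_scores in phi.items():
        ranked = sorted(token_scores.items(), key=lambda item: item[1], reverse=True)
        result[topic] = [token for token, _ in ranked[:num_top_tokens]]
    return result


# Toy topic-token probabilities (hypothetical, not from a real model).
toy_phi = {
    'topic_0': {'cat': 0.5, 'dog': 0.3, 'market': 0.05, 'price': 0.15},
    'topic_1': {'cat': 0.05, 'dog': 0.1, 'market': 0.45, 'price': 0.4},
}
tops = top_tokens(toy_phi)
```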
+
+
+## `topic_mapping` (`TopicMapViewer`)
+
+Module for comparing topics between two different models trained on the same collection.
+
+<p>
+    <div align="center">
+        <img src="../docs/images/topic_map__view.png" width="80%" alt/>
+    </div>
+    <em>
+        The mapping between topics of two models (currently only topic names are displayed).
+    </em>
+</p>
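
One simple way to think about a mapping between two models' topics is to link each topic of the first model to the closest topic of the second. The sketch below does exactly that with Euclidean distance over toy topic vectors. The `map_topics` helper, the distance choice, and the data are assumptions for illustration; they are not the `topic_mapping` module's API.

```python
import math


def map_topics(topics_a, topics_b):
    """Map each topic of model A to the closest topic of model B."""
    mapping = {}
    for name_a, vec_a in topics_a.items():
        mapping[name_a] = min(
            topics_b,
            key=lambda name_b: math.dist(vec_a, topics_b[name_b]),
        )
    return mapping


# Toy topic vectors over the same three tokens (hypothetical data).
model_a_topics = {'a0': [0.7, 0.2, 0.1], 'a1': [0.1, 0.1, 0.8]}
model_b_topics = {'b0': [0.1, 0.2, 0.7], 'b1': [0.6, 0.3, 0.1]}
mapping = map_topics(model_a_topics, model_b_topics)
```

Note that this one-directional mapping may assign several topics of the first model to the same topic of the second; a one-to-one assignment would need an additional matching step.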
+
+
+## Deprecated
+
+* `initial_doc_to_topic_viewer` - first edition of `TopDocumentsViewer`
+
+* `tokens_viewer` - first edition of `TopTokensViewer`