
Commit 0ef66aa

Commit message: refine ridmi
Parent commit: 19ed31c


README.md: 117 additions & 56 deletions
</a>
</div>

A high-level interface developed by [Machine Intelligence Laboratory](https://mipt.ai/en) for the [BigARTM](https://github.com/bigartm/bigartm) library.

## What is TopicNet?

The `TopicNet` library was created to assist in the task of building topic models.
It aims at automating the model training routine, freeing more time for the artistic process of constructing a target functional for the task at hand.

Consider using TopicNet if:

* you want to build a good topic model quickly (out-of-box, with default parameters).
* you have an ARTM model at hand and you want to explore its topics.

`TopicNet` provides an infrastructure for your prototyping with the help of the `Experiment` class and helps to observe the results of your actions via the `viewers` module.

<p>
<div align="center">
</p>


## How to start

Define a `TopicModel` from an ARTM model at hand, or with help from the `model_constructor` module, where you can set the model's main parameters. Then create an `Experiment`, assigning a root position to this model and a path to store your experiment. Further, you can define a set of training stages using the functionality provided by the `cooking_machine.cubes` module.

You can read the documentation [here](https://machine-intelligence-laboratory.github.io/TopicNet/).
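
In short, a typical workflow looks like the outline below (a sketch only; the names follow the Usage section further down, where every step is shown in full):

```python
# A sketch of the typical TopicNet workflow (see Usage below for details):
#
# data = Dataset('path/to/prepared/data.csv')            # 1. wrap the data
# model_artm = init_simple_default_model(...)            # 2. build an initial ARTM model
# topic_model = TopicModel(model_artm, ...)              # 3. wrap it into a TopicModel
# experiment = Experiment(..., topic_model=topic_model)  # 4. start an experiment
# RegularizersModifierCube(...)(topic_model, data)       # 5. train via a cube
# best_model = experiment.select(...)                    # 6. pick the best model
```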


# Installation

**Core library functionality is based on the BigARTM library**, so BigARTM should also be installed on the machine.
Fortunately, the installation process should not be too difficult now.
Below are the detailed explanations.


## Via pip

The easiest way to install everything is via `pip` (but currently it works fine only for Linux users!):

```bash
pip install topicnet
```

The command above also installs the BigARTM library, not only TopicNet.

If working on Windows or Mac, you should install BigARTM by yourself first; after that, `pip install topicnet` will work just fine.
We are hoping to bring all-in-`pip` installation support to these systems as well.
However, right now you may find the following guide useful.

To avoid installing BigARTM yourself, you can use [docker images](https://hub.docker.com/r/xtonev/bigartm/tags) with different preinstalled versions of the BigARTM library:

```bash
docker pull xtonev/bigartm:v0.10.0
docker run -t -i xtonev/bigartm:v0.10.0
```

Checking if everything installed successfully (start `python3` inside the container and run):

```python
import artm

artm.version()
```

Alternatively, you can follow the [BigARTM installation manual](https://bigartm.readthedocs.io/en/stable/installation/index.html).
After setting up the environment, you can fork this repository or use `pip install topicnet` to install the library.


## From source

One can also install the library from GitHub, which may give more flexibility in development (for example, making one's own viewers or regularizers a part of the module as .py files):

```bash
git clone https://github.com/machine-intelligence-laboratory/TopicNet.git
cd TopicNet
pip install .
```
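
If you plan to modify the library code, an editable install (`pip install -e .` instead of `pip install .`) may be more convenient: changes to the sources are then picked up without reinstalling the package.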


# Usage

Let's say you have a handful of raw texts mined from some source and you want to perform some topic modelling on them.
Where should you start?

## Data Preparation

Every ML problem starts with a data preprocessing step.
TopicNet does not perform data preprocessing itself.
Instead, it expects the data to be prepared by the user and loaded via the [Dataset](topicnet/cooking_machine/dataset.py) class.
Here is a basic example of how one can achieve that: [rtl_wiki_preprocessing](topicnet/demos/RTL-WIKI-PREPROCESSING.ipynb).
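
For a rough idea of what the prepared data looks like, here is a minimal sketch (the column names `id`, `raw_text`, and `vw_text` follow the preprocessing demo above, which remains the authoritative reference):

```python
import pandas as pd

# Each document gets an id, its raw text, and a Vowpal Wabbit-style
# representation of its tokens (modalities are marked with '@'):
documents = pd.DataFrame({
    'id': ['doc_1', 'doc_2'],
    'raw_text': ['first document text', 'second document text'],
    'vw_text': [
        'doc_1 |@lemmatized first document text',
        'doc_2 |@lemmatized second document text',
    ],
})
documents.to_csv('/Wiki_raw_set/wiki_data.csv', index=False)
```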

## Training topic model

Here we can finally get to the main part: making your own, best of them all, manually crafted Topic Model.

### Get your data

We need to load the data prepared previously with Dataset:

```python
data = Dataset('/Wiki_raw_set/wiki_data.csv')
```

### Make initial model

In case you want to start from a fresh model, we suggest you use this code:

```python
from topicnet.cooking_machine.model_constructor import init_simple_default_model

model_artm = init_simple_default_model(
    # ... (the dataset and topic arguments are not shown in this diff)
    background_topics=1,
)
```

Note that here we have a model with two modalities: `'@lemmatized'` and `'@bigram'`.
Further, if needed, one can define a custom score to be calculated during the model training:

```python
import numpy as np

from topicnet.cooking_machine.models.base_score import BaseScore


class CustomScore(BaseScore):
    # ... (the beginning of the class is not shown in this diff)

    def call(self, model,
             eps=1e-5,
             n_specific_topics=14):

        phi = model.get_phi().values[:, :n_specific_topics]
        # share of (almost) zero elements in the specific-topics part of Phi:
        specific_sparsity = np.sum(phi < eps) / np.sum(phi < 1)

        return specific_sparsity
```

Now, a `TopicModel` with the custom score can be defined:

```python
from topicnet.cooking_machine.models.topic_model import TopicModel

custom_scores = {'SpecificSparsity': CustomScore()}
topic_model = TopicModel(model_artm, model_id='Groot', custom_scores=custom_scores)
```
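
The score values are computed during training along with the built-in ones; afterwards one could inspect them, for instance, like this (a sketch, assuming the `scores` attribute of `TopicModel`):

```python
# One value is recorded per training iteration:
print(topic_model.scores['SpecificSparsity'])
```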

### Define experiment

For further model training and tuning, an `Experiment` is necessary:

```python
from topicnet.cooking_machine.experiment import Experiment

experiment = Experiment(experiment_id="simple_experiment", save_path="experiments", topic_model=topic_model)
```

### Toy with the cubes

Defining the next stage of the model training, to select a decorrelator parameter:

```python
from topicnet.cooking_machine.cubes import RegularizersModifierCube

my_first_cube = RegularizersModifierCube(
    # ... (the cube and regularizer arguments are partly not shown in this diff)
    regularizer_parameters={
        'tau_grid': [0, 1, 2, 3, 4, 5],
    },
    reg_search='grid',
    verbose=True,
)

my_first_cube(topic_model, data)
```
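
Applying the cube trains one candidate model for every value in `tau_grid`; the resulting models are attached to the experiment tree, which is what makes the selection step below possible.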

Selecting a model with the best perplexity score:

```python
perplexity_criterion = 'PerplexityScore@lemmatized -> min COLLECT 1'
best_model = experiment.select(perplexity_criterion)
```

### View the results

Browsing the model is easy: create a viewer and call its `view()` method:

```python
from IPython.display import HTML

from topicnet.viewers.top_tokens_viewer import TopTokensViewer

threshold = 1e-5
viewer = TopTokensViewer(best_model, num_top_tokens=10, method='phi')
html_view = viewer.to_html(viewer.view(), thresh=threshold)

HTML(html_view)
```

# FAQ

## In the example, modalities are written like **@modality**. Is it the Vowpal Wabbit format?

It is a convention, which TopicNet takes from BigARTM, to designate modalities in the data with the @ sign.
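
So the token streams for different modalities are simply marked with `@`-prefixed names inside the Vowpal Wabbit file. A hypothetical line for the two modalities used above might look like this (a sketch; see the preprocessing demo for the real format):

```python
# Document id first, then one '|@modality' section per modality:
vw_line = 'doc_1 |@lemmatized good topic model |@bigram good_topic topic_model'
```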

## CubeCreator helps to perform a grid search over initial model parameters. How can I do it with modalities?

The modality search space can be defined using standard library logic like:

```python
class_ids_cube = CubeCreator(
    num_iter=5,
    parameters=[
        {
            'name': 'class_ids',
            'values': {
                '@text': [1, 2, 3],
                '@ngrams': [4, 5, 6],
            },
        },
    ],
    reg_search='grid',
    verbose=True,
)
```

However, for the case of modalities, a couple of slightly more convenient methods are available:

```python
parameters=[
    {
        'name': 'class_ids@text',
        # ... (the rest of the snippet is cut off in this diff)
```
