Commit f1a4e5b
Merge pull request #41 from machine-intelligence-laboratory/feature/pic-for-each-viewer
Feature/pic for each viewer
2 parents 607fbbf + 9888cbf commit f1a4e5b

9 files changed: +108 -65 lines changed

README.md

Lines changed: 19 additions & 16 deletions
@@ -38,7 +38,7 @@ Consider using TopicNet if:
 * you want to build a good topic model quickly (out-of-box, with default parameters).
 * you have an ARTM model at hand and you want to explore it's topics.

-`TopicNet` provides an infrastructure for your prototyping with the help of `Experiment` class and helps to observe results of your actions via `viewers` module.
+`TopicNet` provides an infrastructure for your prototyping with the help of `Experiment` class and helps to observe results of your actions via [`viewers`](topicnet/viewers) module.

 <p>
 <div align="center">
@@ -159,7 +159,7 @@ Here we can finally get on the main part: making your own, best of them all, man
 We need to load our data prepared previously with Dataset:

 ```python
-data = Dataset('/Wiki_raw_set/wiki_data.csv')
+dataset = Dataset('/Wiki_raw_set/wiki_data.csv')
 ```

 ### Make initial model
@@ -169,8 +169,8 @@ In case you want to start from a fresh model we suggest you use this code:
 ```python
 from topicnet.cooking_machine.model_constructor import init_simple_default_model

-model_artm = init_simple_default_model(
-    dataset=data,
+artm_model = init_simple_default_model(
+    dataset=dataset,
     modalities_to_use={'@lemmatized': 1.0, '@bigram':0.5},
     main_modality='@lemmatized',
     specific_topics=14,
@@ -188,7 +188,8 @@ class CustomScore(BaseScore):
     def __init__(self):
         super().__init__()

-    def call(self, model,
+    def call(self,
+             model,
              eps=1e-5,
              n_specific_topics=14):

@@ -204,7 +205,7 @@ Now, `TopicModel` with custom score can be defined:
 from topicnet.cooking_machine.models.topic_model import TopicModel

 custom_scores = {'SpecificSparsity': CustomScore()}
-topic_model = TopicModel(model_artm, model_id='Groot', custom_scores=custom_scores)
+topic_model = TopicModel(artm_model, model_id='Groot', custom_scores=custom_scores)
 ```

 ### Define experiment
@@ -223,6 +224,7 @@ Defining a next stage of the model training to select a decorrelator parameter:
 ```python
 from topicnet.cooking_machine.cubes import RegularizersModifierCube

+
 my_first_cube = RegularizersModifierCube(
     num_iter=5,
     tracked_score_function='PerplexityScore@lemmatized',
@@ -234,7 +236,7 @@ my_first_cube = RegularizersModifierCube(
     verbose=True,
 )

-my_first_cube(tm, demo_data)
+my_first_cube(topic_model, dataset)
 ```

 Selecting a model with best perplexity score:
@@ -245,18 +247,19 @@ best_model = experiment.select(perplexity_criterion)
 ```

 ### View the results
-Browsing the model is easy: create a viewer and call its `view()` method:
+
+Browsing the model is easy: create a viewer and call its `view()` method (or `view_from_jupyter()` — it is advised to use it if working in Jupyter Notebook):

 ```python
-from IPython.display import HTML
+from topicnet.viewers import TopTokensViewer

-threshold = 1e-5
-viewer = TopTokensViewer(best_model, num_top_tokens=10, method='phi')
-html_view = viewer.to_html(top_tok.view(), thresh=threshold)

-HTML(html_view)
+toptok_viewer = TopTokensViewer(best_model, num_top_tokens=10, method='phi')
+toptok_viewer.view_from_jupyter()
 ```

+More info about different viewers is available here: [`viewers`](topicnet/viewers).
+
 # FAQ

 ### In the example we used to write vw modality like **@modality**, is it a VowpallWabbit format?
@@ -282,16 +285,16 @@ class_ids_cube = CubeCreator(
 )
 ```

-However for the case of modalities a couple of slightly more convenient methods are availiable:
+However, for the case of modalities a couple of slightly more convenient methods are availiable:

 ```python
 parameters : [
     {
-        'name': 'class_ids@text',
+        'name' : 'class_ids@text',
         'values': [1, 2, 3]
     },
     {
-        'name': 'class_ids@ngrams',
+        'name' : 'class_ids@ngrams',
         'values': [4, 5, 6]
     }
 ]
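
Note on the `View the results` hunk above: the new README code builds `TopTokensViewer(best_model, num_top_tokens=10, method='phi')`. As a rough illustration of what picking top tokens per topic can look like, here is a minimal sketch in plain Python. It assumes that `'phi'` means ranking tokens by p(token | topic), which this diff does not confirm. The helper `top_tokens` and the toy `phi` table are hypothetical and are not part of the TopicNet API.

```python
def top_tokens(phi, num_top_tokens=10):
    """Return the highest-probability tokens for every topic.

    phi: dict mapping topic name -> dict mapping token -> p(token | topic).
    Ties are broken alphabetically so the result is deterministic.
    """
    return {
        topic: sorted(dist, key=lambda tok: (-dist[tok], tok))[:num_top_tokens]
        for topic, dist in phi.items()
    }


# Toy, made-up distribution used only for illustration.
phi = {
    'topic_0': {'cat': 0.5, 'dog': 0.3, 'car': 0.2},
    'topic_1': {'cat': 0.1, 'dog': 0.1, 'car': 0.8},
}

top = top_tokens(phi, num_top_tokens=2)
print(top)
```

The real viewer works on a trained model and supports other scoring methods. This sketch only shows the ranking step under the stated assumption.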

topicnet/viewers/README-rus.md

Lines changed: 0 additions & 36 deletions
This file was deleted.

topicnet/viewers/README.md

Lines changed: 89 additions & 13 deletions
@@ -1,27 +1,103 @@
-## Viewers
+# Viewers

-Module ```viewers``` provides information from a topic model allowing to estimate the model quality. Its advantage is in unified call ifrastucture to the topic model making the routine and tedious task of extracting the information easy.
+Module `viewers` provides information from a topic model allowing to estimate the model quality.
+Its advantage is in unified call ifrastucture to the topic model making the routine and tedious task of extracting the information easy.

-Currently module contains following viewers:
+Currently module contains the following viewers:

+## `base_viewer` (`BaseViewer`)

-* ```base_viewer``` - module responsible for base infrastructure
+Module responsible for base infrastructure.

-* ```spectrum``` - module contains heuristics for solving TSP to arrange topics minimizing total distance of the spectrun,
-```TopicSpectrumViewer```

-* ```top_documents_viewer``` - module with functions that work with dataset document collections and class ```TopDocumentsViewer``` wrapping this functionality.
+## `document_cluster` (`DocumentClusterViewer`)

-* ```top_similar_documents_viewer``` - module containing class for finding simmilar document for a givenone. This viewer helps to estimate homogeneity of clusters given by the model
+Module which allows to visualize collection documents. May be slow for large document collections as it uses TSNE algorithm from [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) library.

-* ```top_tokens_viewer``` - module with class for displaying the most relevant tokens in each topic of the model.
+<p>
+<div align="center">
+<img src="../docs/images/doc_cluster__plot.png" width="80%" alt/>
+</div>
+<em>
+Visualisation of reduced document embeddings colored according to their topic made by DocumentClusterViewer.
+</em>
+</p>

-* ```topic_mapping``` - module allowing to compare topics between two different models trained on the same collection.

-* ```document_cluster``` - module containing class ```DocumentClusterViewer``` which allows to visualize collection documents. May be slow for large document collections as it uses TSNE algorithm from sklearn library.
+## `spectrum` (`TopicSpectrumViewer`)

+Module contains heuristics for solving TSP to arrange topics minimizing total distance of the spectrum.

+<p>
+<div align="center">
+<img src="../docs/images/topic_spectrum__refined_view.png" width="80%" alt/>
+</div>
+<em>
+Each point on the plot represents some topic.
+The viewer helped to calculate such a route between topics when one topic is connected with similar one, and so on, forming a circle.
+</em>
+</p>

-* ```initial_doc_to_topic_viewer``` - first edition of ```TopDocumentsViewer``` - Deprecated

-* ```tokens_viewer``` - first edition of ```TopTokensViewer``` - Deprecated
+## `top_documents_viewer` (`TopDocumentsViewer`)
+
+Module with functions that work with dataset document collections.
+
+<p>
+<div align="center">
+<img src="../docs/images/top_doc__view.png" width="80%" alt/>
+</div>
+<em>
+The viewer shows fragments of top documents corresponding to some topic.
+</em>
+</p>
+
+
+## `top_similar_documents_viewer` (`TopSimilarDocumentsViewer`)
+
+Module containing class for finding similar document for a given one. This viewer helps to estimate homogeneity of clusters given by the model.
+
+<p>
+<div align="center">
+<img src="../docs/images/top_sim_doc__refined_view.png" width="80%" alt/>
+</div>
+<em>
+Some document from text collection (on top), and documents nearest to it given topic model.
+The viewer (currently) gives only document names as output, but the picture is not very difficult to be made.
+</em>
+</p>
+
+
+## `top_tokens_viewer` (`TopTokensViewer`)
+
+Module with class for displaying the most relevant tokens in each topic of the model.
+
+<p>
+<div align="center">
+<img src="../docs/images/top_tokens__view.png" width="80%" alt/>
+</div>
+<em>
+Output of the TopTokensViewer. Token score in the topic is calculated for every token, score function can be specified at the stage of a viewer initialization.
+</em>
+</p>
+
+
+## `topic_mapping` (`TopicMapViewer`)
+
+Module allowing to compare topics between two different models trained on the same collection.
+
+<p>
+<div align="center">
+<img src="../docs/images/topic_map__view.png" width="80%" alt/>
+</div>
+<em>
+The mapping between topics of two models (currently only topic names are displayed).
+</em>
+</p>
+
+
+## Deprecated
+
+* `initial_doc_to_topic_viewer` — first edition of `TopDocumentsViewer`
+
+* `tokens_viewer` - first edition of `TopTokensViewer`
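
Note on the `spectrum` section in the hunk above: it says the module uses heuristics for solving TSP so that similar topics end up next to each other on a closed route. The diff does not show which heuristics `TopicSpectrumViewer` actually uses. As an illustration only, the sketch below applies one common heuristic, greedy nearest neighbour, to toy topic coordinates. The names `nearest_neighbour_route` and `route_length` and the sample points are hypothetical and are not taken from TopicNet.

```python
import math


def route_length(points, order):
    """Total length of the closed route that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def nearest_neighbour_route(points, start=0):
    """Greedy TSP heuristic: always move to the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    route = [start]
    while unvisited:
        last = points[route[-1]]
        # Break distance ties by index to keep the result deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        unvisited.remove(nxt)
        route.append(nxt)
    return route


# Toy "topic embeddings" (made up): two pairs of close topics.
topics = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (9.0, 0.0)]

route = nearest_neighbour_route(topics)
print(route, route_length(topics, route))
print(route_length(topics, [0, 1, 2, 3]))  # original order, for comparison
```

Nearest neighbour is fast but gives no optimality guarantee. Whether the viewer uses this, a refinement step, or something else has to be checked in the `spectrum` module itself.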