topicnet/demos/README.md (9 additions, 0 deletions)

This section provides demonstrations of how to use this library in NLP tasks.
`1-RTL-Wiki-Preprocessing` -- a notebook describing how to obtain a Wikipedia dataset and write the data in Vowpal Wabbit (VW) format.
`2-RTL-Wiki-Building-Topic-Model` -- a notebook with the first steps of building a topic model by consecutively tuning its hyperparameters
`3-Visualizing-Your-Model-Documents` -- a notebook providing a fresh outlook on an unstructured document collection with the help of a topic model
`4-20NG-Preprocessing` -- preparing data from the well-known 20 Newsgroups dataset
`5-20NG-GenSim-vs-TopicNet` -- a comparison between two topic models built with Gensim and the TopicNet library. In the notebook we compare model topics by calculating their [UMass coherence measure](https://palmetto.demos.dice-research.org/) and by using the Jaccard measure to compare the diversity of the topics' top tokens
`6-Postnauka-Building-Topic-Model` -- a counterpart of the RTL-Wiki notebook, run on a corpus of Russian popular-science articles provided by postnauka.ru
`7-Postnauka-Recipe` -- a demonstration of rapid-prototyping methods provided by the library
`8-Coherence-Maximization-Recipe` -- a recipe for hyperparameter search with respect to a custom coherence metric
`9-Topic-Prior-Regularizer-Tutorial` -- a demonstration of an approach to learning topics from an unbalanced corpus
`10-Making-Decorrelation-and-Topic-Selection-Friends` -- a reproduction of a complicated experiment on automatically learning the optimal number of topics from a collection. The hurdle is that the two required regularizers, when working together, nullify the token-topic matrix.
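
For readers who have not met the two measures named in the `5-20NG-GenSim-vs-TopicNet` entry, here is a minimal pure-Python sketch of both. It is not code from the notebook or from either library. The function names `jaccard` and `umass_coherence` and the toy corpus are hypothetical, and the UMass variant shown is the unnormalized sum of log ratios with +1 smoothing, so values may differ from what Palmetto or Gensim report.

```python
import math


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two top-token lists: |A & B| / |A | B|."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def umass_coherence(top_tokens, documents):
    """Unnormalized UMass coherence of one topic.

    top_tokens: topic tokens ordered from most to least probable.
    documents:  iterable of tokenized documents (lists of strings).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_tokens)):
        for l in range(m):
            denom = doc_freq(top_tokens[l])
            if denom == 0:
                continue  # assumption: skip tokens absent from the corpus
            joint = doc_freq(top_tokens[m], top_tokens[l])
            score += math.log((joint + 1) / denom)
    return score


# Hypothetical toy corpus, for illustration only.
docs = [["cat", "dog"], ["cat", "fish"], ["dog", "fish"], ["cat", "dog", "fish"]]
print(jaccard(["cat", "dog"], ["dog", "fish"]))
print(umass_coherence(["cat", "dog"], docs))
```

A low Jaccard value between the top tokens of two topics suggests the topics are distinct, so averaging it over all topic pairs gives a rough diversity score. UMass coherence closer to zero means the top tokens co-occur more often in the corpus.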
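
The VW format mentioned for `1-RTL-Wiki-Preprocessing` stores one document per line as a document id followed by a bag of words. The sketch below shows one way such a line could be assembled. It is an illustration, not the notebook's code: the helper `to_vw_line` and the modality name `@text` are hypothetical, and the exact layout expected by a given tool may differ.

```python
from collections import Counter


def to_vw_line(doc_id, tokens, modality="@text"):
    """Format one document as 'doc_id |modality token[:count] ...'."""
    # ':' and '|' are reserved by the format, so replace them inside tokens
    # before counting.
    safe_tokens = (t.replace(":", "_").replace("|", "_") for t in tokens)
    counts = Counter(safe_tokens)
    features = [
        token if count == 1 else f"{token}:{count}"
        for token, count in counts.items()
    ]
    return f"{doc_id} |{modality} " + " ".join(features)


print(to_vw_line("doc_1", ["topic", "model", "topic", "wiki"]))
# prints: doc_1 |@text topic:2 model wiki
```

Writing counts instead of repeating tokens keeps the file compact. Documents that carry several modalities would need one `|modality` section per modality on the same line.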