Commit f7cbf44

Merge pull request #49 from machine-intelligence-laboratory/bugfix/actualize-topic-prior-demo
Actualize topic prior reg demo
2 parents 7b7fd06 + 759b32c

File tree: 2 files changed, +1489 -287 lines changed
topicnet/demos/README.md (21 additions, 16 deletions)
@@ -1,30 +1,35 @@
-# Demo
+# Demos
+
 This section provides demonstrations of how to use this library in NLP tasks.
 
-1. [RTL-Wiki-Preprocessing](RTL-Wiki-Preprocessing.ipynb) -- notebook working with a dataset introduced in [1]. It serves as an example of a typical preprocessing pipeline: getting a dataset, lemmatizing it, extracting n-grams/collocations, writing data in VW format
+1. [RTL-Wiki-Preprocessing](RTL-Wiki-Preprocessing.ipynb) — notebook working with a dataset introduced in [1][1]. It serves as an example of a typical preprocessing pipeline: getting a dataset, lemmatizing it, extracting n-grams/collocations, and writing the data in VW format
+
+2. [RTL-Wiki-Building-Topic-Model](RTL-Wiki-Building-Topic-Model.ipynb) — notebook with first steps to build a topic model by consecutively tuning its hyperparameters
+
+3. [Visualizing-Your-Model-Documents](Visualizing-Your-Model-Documents.ipynb) — notebook providing a fresh outlook on an unstructured document collection with the help of a topic model
+
+4. [20NG-Preprocessing](20NG-Preprocessing.ipynb) — preparing data from the well-known 20 Newsgroups dataset
 
-2. [RTL-Wiki-Building-Topic-Mode](RTL-Wiki-Building-Topic-Model.ipynb) -- notebook with first steps to build topic model by consequently tuning its hyperparameters
+5. [20NG-GenSim-vs-TopicNet](20NG-GenSim-vs-TopicNet.ipynb) — a comparison between two topic models built with the Gensim and TopicNet libraries. In the notebook we compare model topics by calculating their UMass coherence measure with the help of [Palmetto](https://palmetto.demos.dice-research.org/) and use the Jaccard measure to compare the diversity of topic top tokens
 
-3. [Visualizing-Your-Model-Documents](Visualizing-Your-Model-Documents.ipynb) -- notebook providing a fresh outlook on unstructured document collection with the help of a topic model
+6. [PostNauka-Building-Topic-Model](PostNauka-Building-Topic-Model.ipynb) — an analog of the RTL-Wiki notebook performed on a corpus of Russian pop-science articles provided by postnauka.ru
 
-4. [20NG-Preprocessing](20NG-Preprocessing.ipynb) -- preparing data from a well-know 20 Newsgroups dataset
+7. [PostNauka-Recipe](PostNauka-Recipe.ipynb) — a demonstration of the rapid-prototyping methods provided by the library
 
-5. [20NG-GenSim-vs-TopicNet](20NG-GenSim-vs-TopicNet.ipynb) -- a comparison between two topic models build by Gensim and TopicNet library. In the notebook we compare model topics by calculating their UMass coherence measure with the help of [Palmetto](https://palmetto.demos.dice-research.org/) and using Jaccard measure to compare topic top-tokens diversity
+8. [Coherence-Maximization-Recipe](Coherence-Maximization-Recipe.ipynb) — a recipe for hyperparameter search with respect to a custom coherence metric
 
-6. [PostNauka-Building-Topic-Model](PostNauka-Building-Topic-Model.ipynb)-- an analog of the RTL-Wiki notebook performed on the corpus of Russian pop-science articles given by postnauka.ru
+9. [Topic-Prior-Regularizer-Tutorial](Topic-Prior-Regularizer-Tutorial.ipynb) — a demonstration of the approach to learning topics from an unbalanced corpus
 
-7. [PostNauka-Recipe](PostNauka-Recipe.ipynb) -- a demonstration of rapid-prototyping methods provided by the library
+10. [Making-Decorrelation-and-Topic-Selection-Friends](Making-Decorrelation-and-Topic-Selection-Friends.ipynb) — reproduction of a rather involved experiment on automatically learning the optimal number of topics from the collection. The hurdle is that the two regularizers needed for it, when working together, nullify the token-topic matrix. The notebook also contains an in-depth examination of the 20 Newsgroups corpus.
 
-8. [Coherence-Maximization-Recipe](Coherence-Maximization-Recipe.ipynb) -- a recipe for hyperparameter search in regard to custom Coherence metric
+## References
 
-9. [Topic-Prior-Regularizer-Tutorial](Topic-Prior-Regularizer-Tutorial.ipynb) -- a demonstration of the approach to learning topics from the unbalanced corpus
+[1]: https://dl.acm.org/doi/10.5555/2984093.2984126
 
-10. [Making-Decorrelation-and-Topic-Selection-Friends](Making-Decorrelation-and-Topic-Selection-Friends.ipynb) -- reproduction of a very complicated experiment on automatically learning optimal number of topics from the collection. Hurdle is -- both needed regularizers when working together nullify token-topic matrix.
+[1][1] Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, and David M. Blei. 2009. Reading tea leaves: how humans interpret topic models. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS’09). Curran Associates Inc., Red Hook, NY, USA, 288–296.
 
-----
-[1](https://dl.acm.org/doi/10.5555/2984093.2984126) Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, and David M. Blei. 2009. Reading tea leaves: how humans interpret topic models. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS’09). Curran Associates Inc., Red Hook, NY, USA, 288–296.
+## P.S.
 
-----
-P.S. All the guides are supposed to contain **working** examples of the library code.
-If you happen to find code that is no longer works, please write about it in the library issues.
+All the guides are supposed to contain **working** examples of the library code.
+If you happen to find code that no longer works, please report it in the library issues!
 We will try to resolve it as soon as possible and plan to include the fixes in an upcoming release.
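
Item 1 in the updated list ends with writing the data in VW (Vowpal Wabbit) format, the plain-text bag-of-words input used downstream. As a quick illustration for readers of the diff, here is a minimal sketch of that final preprocessing step; the `docs` dictionary, the output file name, and the `@lemmatized` modality name are illustrative assumptions, not taken from the notebook:

```python
# Minimal sketch: write tokenized documents in Vowpal Wabbit format,
# one document per line: "doc_id |modality token:count token:count ...".
# The docs dict, file name, and "@lemmatized" modality are illustrative only.
from collections import Counter

docs = {
    "doc_1": ["topic", "model", "topic", "corpus"],
    "doc_2": ["prior", "regularizer", "corpus"],
}

with open("corpus.vw", "w", encoding="utf-8") as fout:
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        bag = " ".join(f"{token}:{count}" for token, count in counts.items())
        fout.write(f"{doc_id} |@lemmatized {bag}\n")
```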

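Item 5 mentions the Jaccard measure used to compare the top tokens of topics from the Gensim and TopicNet models. A self-contained sketch of that comparison; the token sets below are made up, not taken from the notebook:

```python
# Sketch: Jaccard similarity between the top-token sets of two topics.
# A value near 0 means the topics share few top tokens (high diversity),
# a value near 1 means they are nearly identical.
def jaccard(top_tokens_a, top_tokens_b):
    a, b = set(top_tokens_a), set(top_tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Illustrative top tokens, not taken from the notebooks
gensim_topic = ["space", "nasa", "orbit", "launch", "shuttle"]
topicnet_topic = ["space", "orbit", "moon", "launch", "mission"]
print(jaccard(gensim_topic, topicnet_topic))  # ~0.43: 3 shared of 7 distinct tokens
```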

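Item 9 is the notebook this pull request actualizes: learning topics from an unbalanced corpus with the help of a topic prior. The notebook documents the library's own regularizer API; the snippet below is only a conceptual sketch in plain NumPy (not the TopicNet API, and the class counts are invented) of where such a prior can come from:

```python
# Conceptual sketch (not the TopicNet API): turn known, unbalanced class
# frequencies into a normalized prior over topics, which a topic-prior
# regularizer can then use to keep rare topics from being crowded out.
import numpy as np

# Hypothetical document counts per class in an unbalanced corpus
class_counts = np.array([5000, 800, 150, 50], dtype=float)

# A uniform prior ignores the imbalance ...
uniform_prior = np.full_like(class_counts, 1.0 / len(class_counts))

# ... while a frequency-based prior matches the expected topic sizes
frequency_prior = class_counts / class_counts.sum()

print(uniform_prior)    # [0.25 0.25 0.25 0.25]
print(frequency_prior)  # approx. [0.833 0.133 0.025 0.008]
```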