 Gensim is a Python library for *topic modelling*, *document indexing*
 and *similarity retrieval* with large corpora. Target audience is the
@@ -72,7 +78,7 @@ For alternative modes of installation (without root privileges,
 development installation, optional install features), see the
 [documentation].
 
-This version has been tested under Python 2.6, 2.7, 3.3, 3.4and 3.5
+This version has been tested under Python 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6
 (support for Python 2.5 was dropped in gensim 0.10.0; install gensim
 0.9.1 if you *must* use Python 2.5). Gensim’s github repo is hooked
 against [Travis CI for automated testing] on every commit push and pull
@@ -122,7 +128,7 @@ Adopters
 | Sports Authority | <img src="https://upload.wikimedia.org/wikipedia/commons/6/6c/Sports_Authority_logo2011.jpg" width="100"> |[sportsauthority.com](https://en.wikipedia.org/wiki/Sports_Authority)| Text mining of customer surveys and social media sources |
 | Search Metrics | <img src="http://www.searchmetrics.com/wp-content/uploads/Logo_searchmetrics_Webversion.png" width="100"> | [searchmetrics.com](http://www.searchmetrics.com/)| Gensim word2vec used for entity disambiguation in Search Engine Optimisation
-| 12K Research | <img src="https://techberlin.com/media/CACHE/images/threesixty/FtA5ANuJ/1aa5b8517ec65ef1c7d69c9bda5f9a3c.jpg" width="100"> | [12k.co](https://12k.co/)| Document similarity analysis on media articles
+| 12K Research | <img src="https://static1.squarespace.com/static/548d6f40e4b0fb61d7b8f40b/t/57310800b09f95e472ba5dd1/1462831123953/12k-logo.png" width="100"> | [12k.co](https://12k.co/)| Document similarity analysis on media articles
 | National Institutes of Health | <img src="https://www.nih.gov/sites/default/files/styles/featured_media_breakpoint-large/public/about-nih/2012-logo.png" width="100"> | [github/NIHOPA](https://github.com/NIHOPA/pipeline_word2vec)| Processing grants and publications with word2vec
 | Mass Cognition | <img src="http://static1.squarespace.com/static/5637b16ee4b050255657c537/t/56a683bf9cadb6bf86a0ea13/1461016648294/?format=1500w" width="100"> |[masscognition.com](http://www.masscognition.com/)| Topic analysis service for consumer text data and general text data |
@@ -153,7 +159,7 @@ BibTeX entry:
 language={English}
 }
 
-[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC
+[citing gensim in academic papers and theses]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C
 
 [Travis CI for automated testing]: https://travis-ci.org/RaRe-Technologies/gensim
docs/notebooks/Corpora_and_Vector_Spaces.ipynb: 3 additions & 3 deletions
@@ -213,7 +213,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The function `doc2bow()` simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector. The sparse vector `[(0, 1), (1, 1)]` therefore reads: in the document *“Human computer interaction”*, the words computer (id 0) and human (id 1) appear once; the other ten dictionary words appear (implicitly) zero times."
+"The function `doc2bow()` simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector. The sparse vector `[(word_id, 1), (word_id, 1)]` therefore reads: in the document *“Human computer interaction”*, the words *\"computer\"* and *\"human\"*, each identified by the integer id assigned by the dictionary built above, appear once; the other ten dictionary words appear (implicitly) zero times. Check their ids in the dictionary displayed in the previous cell to confirm that they match."
 ]
 },
 {
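The `doc2bow()` behaviour described in the hunk above can be illustrated with a minimal pure-Python sketch. It does not call gensim: `build_token2id()` and `toy_doc2bow()` are hypothetical helpers written only for this example, and the token lists are assumed, modeled on the tutorial's nine preprocessed documents. The sketch counts each known word, swaps it for its integer id and returns the sorted `(id, count)` pairs; ids here follow order of first appearance, which need not match the ids gensim's `Dictionary` assigns.

```python
from collections import Counter

# Token lists assumed for illustration (modeled on the tutorial's nine documents).
TEXTS = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["eps", "user", "interface", "system"],
    ["system", "human", "system", "eps"],
    ["user", "response", "time"],
    ["trees"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["graph", "minors", "survey"],
]


def build_token2id(texts):
    """Hypothetical helper: give each distinct token an integer id in order of first appearance."""
    token2id = {}
    for tokens in texts:
        for token in tokens:
            token2id.setdefault(token, len(token2id))
    return token2id


def toy_doc2bow(tokens, token2id):
    """Hypothetical stand-in for doc2bow(): count known tokens, return sorted (id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)  # tokens not in token2id are skipped
    return sorted(counts.items())


token2id = build_token2id(TEXTS)
bow = toy_doc2bow("Human computer interaction".lower().split(), token2id)
print(len(token2id))  # 12 distinct words
print(bow)            # [(0, 1), (2, 1)]: "human" (id 0) and "computer" (id 2) appear once
```

Every id missing from `bow` stands for a word that occurs zero times, which is what keeps the representation sparse: here that covers the other ten words of the twelve-word toy dictionary, while *interaction*, absent from the toy dictionary, is simply skipped.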
@@ -250,7 +250,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"By now it should be clear that the vector feature with `id=10 stands` for the question “How many times does the word graph appear in the document?” and that the answer is “zero” for the first six documents and “one” for the remaining three. As a matter of fact, we have arrived at exactly the same corpus of vectors as in the [Quick Example](https://radimrehurek.com/gensim/tutorial.html#first-example).\n",
+"By now it should be clear that the vector feature with `id=10` stands for the question “How many times does the word graph appear in the document?” and that the answer is “zero” for the first six documents and “one” for the remaining three. As a matter of fact, we have arrived at exactly the same corpus of vectors as in the [Quick Example](https://radimrehurek.com/gensim/tutorial.html#first-example). If you are running this notebook on your own, the word ids may differ, but you should still be able to check the consistency between documents by comparing their vectors.\n",
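To see why differing word ids do not break that comparison, the sketch below (again pure Python, with the same assumed token lists and hypothetical helpers, not gensim's API) builds two id assignments for the same vocabulary, one by first appearance and one alphabetical, and reads off the count of *graph* in every document. The id of *graph* changes between the two dictionaries, but its column across the nine vectors stays the same: zero for the first six documents and one for the remaining three.

```python
from collections import Counter

# Token lists assumed for illustration (modeled on the tutorial's nine documents).
TEXTS = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["eps", "user", "interface", "system"],
    ["system", "human", "system", "eps"],
    ["user", "response", "time"],
    ["trees"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["graph", "minors", "survey"],
]


def build_token2id(tokens_in_order):
    """Hypothetical helper: assign ids to distinct tokens in the order they are given."""
    token2id = {}
    for token in tokens_in_order:
        token2id.setdefault(token, len(token2id))
    return token2id


def toy_doc2bow(tokens, token2id):
    """Hypothetical stand-in for doc2bow(): sorted (id, count) pairs for known tokens."""
    return sorted(Counter(token2id[t] for t in tokens if t in token2id).items())


def word_column(corpus, word_id):
    """Count of word_id in each sparse vector; an absent id means zero occurrences."""
    return [dict(vector).get(word_id, 0) for vector in corpus]


all_tokens = [token for doc in TEXTS for token in doc]
dictionaries = {
    "first appearance": build_token2id(all_tokens),
    "alphabetical": build_token2id(sorted(all_tokens)),
}

results = {}
for name, token2id in dictionaries.items():
    corpus = [toy_doc2bow(doc, token2id) for doc in TEXTS]
    graph_id = token2id["graph"]
    results[name] = (graph_id, word_column(corpus, graph_id))
    print(f"{name}: id of 'graph' = {graph_id}, counts = {results[name][1]}")
```

In this toy setup *graph* gets id 10 by first appearance and id 2 alphabetically, yet both runs print the same counts, so documents can be compared through their vectors regardless of which id a word happens to receive.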