Skip to content

Commit a18de8d

Browse files
committed
Merge branch 'release-1.0.0rc1'
2 parents 5d47ec4 + d0880f5 commit a18de8d

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

55 files changed

+10263
-474
lines changed

.travis.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,13 @@
11
sudo: false
2+
dist: trusty
23
language: python
34
python:
45
- "2.6"
56
- "2.7"
67
- "3.3"
78
- "3.4"
89
- "3.5"
10+
- "3.6"
911
before_install:
1012
- wget 'http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh' -O miniconda.sh
1113
- chmod +x miniconda.sh

CHANGELOG.md

Lines changed: 39 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,45 @@ Changes
33

44
Unreleased:
55

6-
None
6+
7+
1.0.0RC1, 2017-01-31
8+
9+
New features:
10+
* Add Author-topic modeling (@olavurmortensen,[#893](https://github.com/RaRe-Technologies/gensim/pull/893))
11+
* Add FastText word embedding wrapper (@Jayantj,[#847](https://github.com/RaRe-Technologies/gensim/pull/847))
12+
* Add WordRank word embedding wrapper (@parulsethi,[#1066](https://github.com/RaRe-Technologies/gensim/pull/1066), [#1125](https://github.com/RaRe-Technologies/gensim/pull/1125))
13+
* Add sklearn wrapper for LDAModel (@AadityaJ,[#932](https://github.com/RaRe-Technologies/gensim/pull/932))
14+
15+
Improvements:
16+
* Python 3.6 support (@tmylk [#1077](https://github.com/RaRe-Technologies/gensim/pull/1077))
17+
* Phrases and Phraser allow a generator corpus (ELind77 [#1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
18+
* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
19+
* Move load and save word2vec_format out of word2vec class to KeyedVectors (@tmylk,[#1107](https://github.com/RaRe-Technologies/gensim/pull/1107))
20+
* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103)
21+
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
22+
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
23+
* Deacc option to keywords pre-processing (@bhargavvader,[#1076](https://github.com/RaRe-Technologies/gensim/pull/1076))
24+
25+
Tutorial and doc improvements:
26+
27+
* Clarifying comment in is_corpus func in utils.py (@greninja,[#1109](https://github.com/RaRe-Technologies/gensim/pull/1109))
28+
* Tutorial Topics_and_Transformations fix markdown and add references (@lgmoneda,[#1120](https://github.com/RaRe-Technologies/gensim/pull/1120))
29+
* Fix doc2vec-lee.ipynb results to match previous behavior (@bahbbc,[#1119](https://github.com/RaRe-Technologies/gensim/pull/1119))
30+
* Remove Pattern lib dependency in News Classification tutorial (@luizcavalcanti,[#1118](https://github.com/RaRe-Technologies/gensim/pull/1118))
31+
* Corpora_and_Vector_Spaces tutorial text clarification (@lgmoneda,[#1116](https://github.com/RaRe-Technologies/gensim/pull/1116))
32+
* Update Transformation and Topics link from quick start notebook (@mariana393,[#1115](https://github.com/RaRe-Technologies/gensim/pull/1115))
33+
* Quick Start Text clarification and typo correction (@luizcavalcanti,[#1114](https://github.com/RaRe-Technologies/gensim/pull/1114))
34+
* Fix typos in Author-topic tutorial (@Fil,[#1102](https://github.com/RaRe-Technologies/gensim/pull/1102))
35+
* Address benchmark inconsistencies in Annoy tutorial (@droudy,[#1113](https://github.com/RaRe-Technologies/gensim/pull/1113))
36+
37+
38+
0.13.4.1, 2017-01-04
39+
40+
* Disable direct access warnings on save and load of Word2vec/Doc2vec (@tmylk, [#1072](https://github.com/RaRe-Technologies/gensim/pull/1072))
41+
* Making Default hs error explicit (@accraze, [#1054](https://github.com/RaRe-Technologies/gensim/pull/1054))
42+
* Removed unnecessary numpy imports (@bhargavvader, [#1065](https://github.com/RaRe-Technologies/gensim/pull/1065))
43+
* Utils and Matutils changes (@bhargavvader, [#1062](https://github.com/RaRe-Technologies/gensim/pull/1062))
44+
* Tests for the evaluate_word_pairs function (@akutuzov, [#1061](https://github.com/RaRe-Technologies/gensim/pull/1061))
745

846
0.13.4, 2016-12-22
947

README.md

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,13 @@
11
gensim – Topic Modelling in Python
22
==================================
33

4-
[![Build Status](https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop)](https://travis-ci.org/RaRe-Technologies/gensim)[![GitHub release](https://img.shields.io/github/release/rare-technologies/gensim.svg?maxAge=2592000)]()![Wheel]
4+
[![Build Status](https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop)](https://travis-ci.org/RaRe-Technologies/gensim)[![GitHub release](https://img.shields.io/github/release/rare-technologies/gensim.svg?maxAge=2592000)]()[![Wheel](https://img.shields.io/pypi/wheel/gensim.svg)](https://pypi.python.org/pypi/gensim)
5+
[![Mailing List](https://img.shields.io/badge/-Mailing%20List-lightgrey.svg)](https://groups.google.com/forum/#!forum/gensim)
6+
[![Gitter](https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-09a3d5.svg)](https://gitter.im/RaRe-Technologies/gensim)
7+
[![Follow](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/gensim_py)
8+
9+
10+
511

612
Gensim is a Python library for *topic modelling*, *document indexing*
713
and *similarity retrieval* with large corpora. Target audience is the
@@ -72,7 +78,7 @@ For alternative modes of installation (without root privileges,
7278
development installation, optional install features), see the
7379
[documentation].
7480

75-
This version has been tested under Python 2.6, 2.7, 3.3, 3.4 and 3.5
81+
This version has been tested under Python 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6
7682
(support for Python 2.5 was dropped in gensim 0.10.0; install gensim
7783
0.9.1 if you *must* use Python 2.5). Gensim’s github repo is hooked
7884
against [Travis CI for automated testing] on every commit push and pull
@@ -122,7 +128,7 @@ Adopters
122128
| Sports Authority | <img src="https://upload.wikimedia.org/wikipedia/commons/6/6c/Sports_Authority_logo2011.jpg" width="100"> | [sportsauthority.com](https://en.wikipedia.org/wiki/Sports_Authority)| Text mining of customer surveys and social media sources |
123129
| Search Metrics | <img src="http://www.searchmetrics.com/wp-content/uploads/Logo_searchmetrics_Webversion.png" width="100"> | [searchmetrics.com](http://www.searchmetrics.com/)| Gensim word2vec used for entity disambiguation in Search Engine Optimisation
124130
| Cisco Security | <img src="https://supportforums.cisco.com/sites/default/files/legacy/1/6/1/2161-CiscoSystems.gif" width="100"> | [cisco.com](http://www.cisco.com/c/en/us/products/security/index.html)| Large-scale fraud detection
125-
| 12K Research | <img src="https://techberlin.com/media/CACHE/images/threesixty/FtA5ANuJ/1aa5b8517ec65ef1c7d69c9bda5f9a3c.jpg" width="100"> | [12k.co](https://12k.co/)| Document similarity analysis on media articles
131+
| 12K Research | <img src="https://static1.squarespace.com/static/548d6f40e4b0fb61d7b8f40b/t/57310800b09f95e472ba5dd1/1462831123953/12k-logo.png" width="100"> | [12k.co](https://12k.co/)| Document similarity analysis on media articles
126132
| National Institutes of Health | <img src="https://www.nih.gov/sites/default/files/styles/featured_media_breakpoint-large/public/about-nih/2012-logo.png" width="100"> | [github/NIHOPA](https://github.com/NIHOPA/pipeline_word2vec)| Processing grants and publications with word2vec
127133
| Codeq LLC | <img src="https://codeq.com/wp-content/themes/codeq/assets/img/logo.svg" width="100"> | [codeq.com](https://codeq.com)| Document classification with word2vec
128134
| Mass Cognition | <img src="http://static1.squarespace.com/static/5637b16ee4b050255657c537/t/56a683bf9cadb6bf86a0ea13/1461016648294/?format=1500w" width="100"> | [masscognition.com](http://www.masscognition.com/) | Topic analysis service for consumer text data and general text data |
@@ -153,7 +159,7 @@ BibTeX entry:
153159
language={English}
154160
}
155161

156-
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC
162+
[citing gensim in academic papers and theses]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:NaGl4SEjCO4C
157163

158164
[Travis CI for automated testing]: https://travis-ci.org/RaRe-Technologies/gensim
159165
[design goals]: http://radimrehurek.com/gensim/about.html
@@ -163,7 +169,7 @@ BibTeX entry:
163169
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC
164170

165171

166-
[Wheel]: https://img.shields.io/pypi/wheel/gensim.svg
172+
167173
[documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
168174
[Vector Space Model]: http://en.wikipedia.org/wiki/Vector_space_model
169175
[unsupervised document analysis]: http://en.wikipedia.org/wiki/Latent_semantic_indexing

docs/notebooks/Corpora_and_Vector_Spaces.ipynb

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -213,7 +213,7 @@
213213
"cell_type": "markdown",
214214
"metadata": {},
215215
"source": [
216-
"The function `doc2bow()` simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector. The sparse vector `[(0, 1), (1, 1)]` therefore reads: in the document *“Human computer interaction”*, the words computer (id 0) and human (id 1) appear once; the other ten dictionary words appear (implicitly) zero times."
216+
"The function `doc2bow()` simply counts the number of occurrences of each distinct word, converts the word to its integer word id and returns the result as a sparse vector. The sparse vector `[(word_id, 1), (word_id, 1)]` therefore reads: in the document *“Human computer interaction”*, the words *\"computer\"* and *\"human\"*, identified by an integer id given by the built dictionary, appear once; the other ten dictionary words appear (implicitly) zero times. Check their id at the dictionary displayed in the previous cell and see that they match."
217217
]
218218
},
219219
{
@@ -250,7 +250,7 @@
250250
"cell_type": "markdown",
251251
"metadata": {},
252252
"source": [
253-
"By now it should be clear that the vector feature with `id=10 stands` for the question “How many times does the word graph appear in the document?” and that the answer is “zero” for the first six documents and “one” for the remaining three. As a matter of fact, we have arrived at exactly the same corpus of vectors as in the [Quick Example](https://radimrehurek.com/gensim/tutorial.html#first-example).\n",
253+
"By now it should be clear that the vector feature with `id=10 stands` for the question “How many times does the word graph appear in the document?” and that the answer is “zero” for the first six documents and “one” for the remaining three. As a matter of fact, we have arrived at exactly the same corpus of vectors as in the [Quick Example](https://radimrehurek.com/gensim/tutorial.html#first-example). If you're running this notebook by your own, the words id may differ, but you should be able to check the consistency between documents comparing their vectors. \n",
254254
"\n",
255255
"## Corpus Streaming – One Document at a Time\n",
256256
"\n",
@@ -616,7 +616,7 @@
616616
"name": "python",
617617
"nbconvert_exporter": "python",
618618
"pygments_lexer": "ipython3",
619-
"version": "3.5.1"
619+
"version": "3.6.0"
620620
}
621621
},
622622
"nbformat": 4,

0 commit comments

Comments
 (0)