Commit f46d72a

Merge branch 'release-4.0.0'
2 parents 4a241f0 + bae3359 commit f46d72a

22 files changed: +714 -371 lines changed


CHANGELOG.md

Lines changed: 116 additions & 9 deletions
@@ -1,6 +1,120 @@
Changes
=======

4+
## 4.0.0, 2021-03-24
5+
6+
**⚠️ Gensim 4.0 contains breaking API changes! See the [Migration guide](https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4) to update your existing Gensim 3.x code and models.**
7+
8+
Gensim 4.0 is a major release with lots of performance & robustness improvements, and a new website.
9+
10+
### Main highlights
11+
12+
* Massively optimized popular algorithms the community has grown to love: [fastText](https://radimrehurek.com/gensim/models/fasttext.html), [word2vec](https://radimrehurek.com/gensim/models/word2vec.html), [doc2vec](https://radimrehurek.com/gensim/models/doc2vec.html), [phrases](https://radimrehurek.com/gensim/models/phrases.html):
13+
14+
a. **Efficiency**
15+
16+
| model | 3.8.3: wall time / peak RAM / throughput | 4.0.0: wall time / peak RAM / throughput |
|----------|------------|--------|
| fastText | 2.9h / 4.11 GB / 822k words/s | 2.3h / **1.26 GB** / 914k words/s |
| word2vec | 1.7h / 0.36 GB / 1685k words/s | **1.2h** / 0.33 GB / 1762k words/s |
20+
21+
In other words, fastText now needs 3x less RAM (and is faster); word2vec has 2x faster init (and needs less RAM, and is faster); detecting collocation phrases is 2x faster. ([4.0 benchmarks](https://github.com/RaRe-Technologies/gensim/issues/2887#issuecomment-711097334))
22+
23+
b. **Robustness**. We fixed a bunch of long-standing bugs by refactoring the internal code structure (see 🔴 Bug fixes below)
24+
25+
c. **Simplified OOP model** for easier model exports and integration with TensorFlow, PyTorch &co.
26+
27+
These improvements come to you transparently aka "for free", but see [Migration guide](https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4) for some changes that break the old Gensim 3.x API. **Update your code accordingly**.
28+
29+
* Dropped a bunch of externally contributed modules and wrappers: summarization, pivoted TFIDF, Mallet…
30+
- Code quality was not up to our standards. Also there was no one to maintain these modules, answer user questions, support them.
31+
32+
So rather than let them rot, we took the hard decision of removing these contributed modules from Gensim. If anyone's interested in maintaining them, please fork & publish into your own repo. They can live happily outside of Gensim.
33+
34+
* Dropped Python 2. Gensim 4.0 is Py3.6+. Read our [Python version support policy](https://github.com/RaRe-Technologies/gensim/wiki/Gensim-And-Compatibility).
35+
- If you still need Python 2 for some reason, stay at [Gensim 3.8.3](https://github.com/RaRe-Technologies/gensim/releases/tag/3.8.3).
36+
37+
* A new [Gensim website](https://radimrehurek.com/gensim) – finally! 🙃
38+
39+
So, a major clean-up release overall. We're happy with this **tighter, leaner and faster Gensim**.
40+
41+
This is the direction we'll keep going forward: less kitchen-sink of "latest academic algorithms", more focus on robust engineering, targeting concrete NLP & document similarity use-cases.
42+
43+
### :+1: New features
44+
45+
* [#2947](https://github.com/RaRe-Technologies/gensim/pull/2947): Bump minimum Python version to 3.6, by [@gojomo](https://github.com/gojomo)
46+
* [#2300](https://github.com/RaRe-Technologies/gensim/pull/2300): Use less RAM in LdaMulticore, by [@horpto](https://github.com/horpto)
47+
* [#2698](https://github.com/RaRe-Technologies/gensim/pull/2698): Streamline KeyedVectors & X2Vec API, by [@gojomo](https://github.com/gojomo)
48+
* [#2864](https://github.com/RaRe-Technologies/gensim/pull/2864): Speed up random number generation in word2vec, by [@zygm0nt](https://github.com/zygm0nt)
49+
* [#2976](https://github.com/RaRe-Technologies/gensim/pull/2976): Speed up phrase (collocation) detection, by [@piskvorky](https://github.com/piskvorky)
50+
* [#2979](https://github.com/RaRe-Technologies/gensim/pull/2979): Allow skipping common English words in multi-word phrases, by [@piskvorky](https://github.com/piskvorky)
51+
* [#2867](https://github.com/RaRe-Technologies/gensim/pull/2867): Expose `max_final_vocab` parameter in fastText constructor, by [@mpenkov](https://github.com/mpenkov)
52+
* [#2931](https://github.com/RaRe-Technologies/gensim/pull/2931): Clear up job queue parameters in word2vec, by [@lunastera](https://github.com/lunastera)
53+
* [#2939](https://github.com/RaRe-Technologies/gensim/pull/2939): X2Vec SaveLoad improvements, by [@piskvorky](https://github.com/piskvorky)
54+
* [#3060](https://github.com/RaRe-Technologies/gensim/pull/3060): Record lifecycle events in Gensim models, by [@piskvorky](https://github.com/piskvorky)
55+
* [#3073](https://github.com/RaRe-Technologies/gensim/pull/3073): Make WMD normalization optional, by [@piskvorky](https://github.com/piskvorky)
56+
* [#3065](https://github.com/RaRe-Technologies/gensim/pull/3065): Default to pickle protocol 4 when saving models, by [@piskvorky](https://github.com/piskvorky)
57+
* [#3069](https://github.com/RaRe-Technologies/gensim/pull/3069): Add Github sponsor + donation nags, by [@piskvorky](https://github.com/piskvorky)
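
Entry #3065 above changes the default pickle protocol used when saving models. As a rough, standard-library-only illustration of what "protocol 4" means, the sketch below serializes an object with that protocol and inspects the header. It does not call Gensim at all, and the `payload` dictionary is a made-up stand-in for model state.

```python
import pickle

# Hypothetical stand-in for model state; not a real Gensim object.
payload = {"vector_size": 100, "epochs": 5, "vocab": ["alpha", "beta"]}

# Serialize explicitly with protocol 4 (available since Python 3.4).
data = pickle.dumps(payload, protocol=4)

# A pickle written with protocol >= 2 starts with the PROTO opcode (0x80),
# followed by one byte holding the protocol number.
protocol_used = data[1] if data[:1] == b"\x80" else None

# Round-trip to confirm the data survives unchanged.
restored = pickle.loads(data)
```

Protocol 4 adds support for very large objects, which matters for big embedding matrices. Python 2 cannot read it, which fits with this release dropping Python 2.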
58+
59+
### :books: Tutorials and docs
60+
61+
* [#3082](https://github.com/RaRe-Technologies/gensim/pull/3082): Make LDA tutorial read NIPS data on the fly, by [@jonaschn](https://github.com/jonaschn)
62+
* [#2954](https://github.com/RaRe-Technologies/gensim/pull/2954): New theme for the Gensim website, by [@dvorakvaclav](https://github.com/dvorakvaclav)
63+
* [#2960](https://github.com/RaRe-Technologies/gensim/issues/2960): Added [Gensim and Compatibility](https://github.com/RaRe-Technologies/gensim/wiki/Gensim-And-Compatibility) Wiki page, by [@piskvorky](https://github.com/piskvorky)
64+
* [#2960](https://github.com/RaRe-Technologies/gensim/issues/2960): Reworked & simplified the [Developer Wiki page](https://github.com/RaRe-Technologies/gensim/wiki/Developer-page), by [@piskvorky](https://github.com/piskvorky)
65+
* [#2968](https://github.com/RaRe-Technologies/gensim/pull/2968): Migrate tutorials & how-tos to 4.0.0, by [@piskvorky](https://github.com/piskvorky)
66+
* [#2899](https://github.com/RaRe-Technologies/gensim/pull/2899): Clean up of language and formatting of docstrings, by [@piskvorky](https://github.com/piskvorky)
67+
* [#2899](https://github.com/RaRe-Technologies/gensim/pull/2899): Added documentation for NMSLIB indexer, by [@piskvorky](https://github.com/piskvorky)
68+
* [#2832](https://github.com/RaRe-Technologies/gensim/pull/2832): Clear up LdaModel documentation, by [@FyzHsn](https://github.com/FyzHsn)
69+
* [#2871](https://github.com/RaRe-Technologies/gensim/pull/2871): Clarify that license is LGPL-2.1, by [@pombredanne](https://github.com/pombredanne)
70+
* [#2896](https://github.com/RaRe-Technologies/gensim/pull/2896): Make docs clearer on `alpha` parameter in LDA model, by [@xh2](https://github.com/xh2)
71+
* [#2897](https://github.com/RaRe-Technologies/gensim/pull/2897): Update Hoffman paper link for Online LDA, by [@xh2](https://github.com/xh2)
72+
* [#2910](https://github.com/RaRe-Technologies/gensim/pull/2910): Refresh docs for run_annoy tutorial, by [@piskvorky](https://github.com/piskvorky)
73+
* [#2935](https://github.com/RaRe-Technologies/gensim/pull/2935): Fix "generator" language in word2vec docs, by [@polm](https://github.com/polm)
74+
* [#3077](https://github.com/RaRe-Technologies/gensim/pull/3077): Fix various documentation warnings, by [@mpenkov](https://github.com/mpenkov)
75+
* [#2991](https://github.com/RaRe-Technologies/gensim/pull/2991): Fix broken link in run_doc How-To, by [@sezanzeb](https://github.com/sezanzeb)
76+
* [#3003](https://github.com/RaRe-Technologies/gensim/pull/3003): Point WordEmbeddingSimilarityIndex documentation to gensim.similarities, by [@Witiko](https://github.com/Witiko)
77+
* [#2996](https://github.com/RaRe-Technologies/gensim/pull/2996): Make the website link to the old Gensim 3.8.3 documentation dynamic, by [@Witiko](https://github.com/Witiko)
78+
* [#3063](https://github.com/RaRe-Technologies/gensim/pull/3063): Update link to papers in LSI model, by [@jonaschn](https://github.com/jonaschn)
79+
* [#3080](https://github.com/RaRe-Technologies/gensim/pull/3080): Fix some of the warnings/deprecated functions, by [@FredHappyface](https://github.com/FredHappyface)
80+
81+
### :red_circle: Bug fixes
82+
83+
* [#2891](https://github.com/RaRe-Technologies/gensim/pull/2891): Fix fastText word-vectors with ngrams off, by [@gojomo](https://github.com/gojomo)
84+
* [#2907](https://github.com/RaRe-Technologies/gensim/pull/2907): Fix doc2vec crash for large sets of doc-vectors, by [@gojomo](https://github.com/gojomo)
85+
* [#2899](https://github.com/RaRe-Technologies/gensim/pull/2899): Fix similarity bug in NMSLIB indexer, by [@piskvorky](https://github.com/piskvorky)
86+
* [#2899](https://github.com/RaRe-Technologies/gensim/pull/2899): Fix deprecation warnings in Annoy integration, by [@piskvorky](https://github.com/piskvorky)
87+
* [#2901](https://github.com/RaRe-Technologies/gensim/pull/2901): Fix inheritance of WikiCorpus from TextCorpus, by [@jenishah](https://github.com/jenishah)
88+
* [#2940](https://github.com/RaRe-Technologies/gensim/pull/2940): Fix deprecations in SoftCosineSimilarity, by [@Witiko](https://github.com/Witiko)
89+
* [#2944](https://github.com/RaRe-Technologies/gensim/pull/2944): Fix `save_facebook_model` failure after update-vocab & other initialization streamlining, by [@gojomo](https://github.com/gojomo)
90+
* [#2846](https://github.com/RaRe-Technologies/gensim/pull/2846): Fix for Python 3.9/3.10: remove `xml.etree.cElementTree`, by [@hugovk](https://github.com/hugovk)
91+
* [#2973](https://github.com/RaRe-Technologies/gensim/issues/2973): phrases.export_phrases() doesn't yield all bigrams, by [@piskvorky](https://github.com/piskvorky)
92+
* [#2942](https://github.com/RaRe-Technologies/gensim/issues/2942): Segfault when training doc2vec, by [@gojomo](https://github.com/gojomo)
93+
* [#3041](https://github.com/RaRe-Technologies/gensim/pull/3041): Fix RuntimeError in export_phrases (change defaultdict to dict), by [@thalishsajeed](https://github.com/thalishsajeed)
94+
* [#3059](https://github.com/RaRe-Technologies/gensim/pull/3059): Fix race condition in FastText tests, by [@sleepy-owl](https://github.com/sleepy-owl)
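
The title of #3041 above ("change defaultdict to dict") points at a general Python pitfall: looking up a missing key in a `defaultdict` inserts it, so a lookup made while iterating can raise `RuntimeError`. The sketch below reproduces that class of error with made-up data. It is an illustration only, not the actual `export_phrases` code, and `count_pairs_unsafe` is a hypothetical helper.

```python
from collections import defaultdict


def count_pairs_unsafe(counts, words):
    """Iterate over `counts` while looking up keys that may be missing."""
    total = 0
    for _ in counts:                # iterating over the mapping...
        for other in words:
            total += counts[other]  # ...while a missing key inserts itself
    return total


vocab = defaultdict(int, {"new": 3, "york": 2})

try:
    count_pairs_unsafe(vocab, ["new", "york", "city"])
    error = None
except RuntimeError as exc:
    error = str(exc)

# A plain dict never grows on lookup; use .get() with a default for misses.
plain = {"new": 3, "york": 2}
safe_total = sum(plain.get(other, 0) for _ in plain for other in ["new", "york", "city"])
```

With a plain `dict`, a miss either raises `KeyError` or returns the `.get()` default, so the mapping keeps its size during iteration.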
95+
96+
### :warning: Removed functionality & deprecations
97+
98+
* Removed all code, methods, attributes and functions marked as deprecated in [Gensim 3.8.3](https://github.com/RaRe-Technologies/gensim/releases/tag/3.8.3).
99+
* [#6](https://github.com/RaRe-Technologies/gensim-wheels/pull/6): No more binary wheels for x32 platforms, by [@menshikh-iv](https://github.com/menshikh-iv)
100+
* [#2899](https://github.com/RaRe-Technologies/gensim/pull/2899): Renamed overly broad `similarities.index` to the more appropriate `similarities.annoy`, by [@piskvorky](https://github.com/piskvorky)
101+
* [#2958](https://github.com/RaRe-Technologies/gensim/pull/2958): Remove gensim.summarization subpackage, docs and test data, by [@mpenkov](https://github.com/mpenkov)
102+
* [#2926](https://github.com/RaRe-Technologies/gensim/pull/2926): Rename `num_words` to `topn` in dtm_coherence, by [@MeganStodel](https://github.com/MeganStodel)
103+
* [#2937](https://github.com/RaRe-Technologies/gensim/pull/2937): Remove Keras dependency, by [@piskvorky](https://github.com/piskvorky)
104+
* [#3078](https://github.com/RaRe-Technologies/gensim/pull/3078): Remove `on_batch_begin` and `on_batch_end` callbacks, by [@mpenkov](https://github.com/mpenkov)
105+
* [#3012](https://github.com/RaRe-Technologies/gensim/pull/3012): Remove `pattern` dependency, by [@mpenkov](https://github.com/mpenkov)
106+
* [#3055](https://github.com/RaRe-Technologies/gensim/pull/3055): Remove `gensim.viz` subpackage, by [@mpenkov](https://github.com/mpenkov)
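
Code that must run on both Gensim 3.x and 4.x can work around the `similarities.index` to `similarities.annoy` rename (#2899 above) by trying both module paths. The sketch below is one possible approach, not an official recipe. `load_annoy_indexer` is a hypothetical helper name, and the sketch assumes the `AnnoyIndexer` class is exposed under the same name in both modules.

```python
import importlib

# Module paths from the changelog entry for #2899, newest first.
CANDIDATE_MODULES = (
    "gensim.similarities.annoy",  # Gensim 4.x
    "gensim.similarities.index",  # Gensim 3.x
)


def load_annoy_indexer():
    """Return the AnnoyIndexer class from whichever path exists, else None."""
    for module_name in CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Gensim missing, or this version lacks the module
        indexer = getattr(module, "AnnoyIndexer", None)
        if indexer is not None:
            return indexer
    return None


indexer_cls = load_annoy_indexer()
```

If `indexer_cls` is `None`, Gensim is not installed or neither module path is available.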
107+
108+
### 🔮 Testing, CI, housekeeping
109+
110+
* [#2939](https://github.com/RaRe-Technologies/gensim/pull/2939) + [#2984](https://github.com/RaRe-Technologies/gensim/pull/2984): Code style & py3 migration clean up, by [@piskvorky](https://github.com/piskvorky)
111+
* [#3058](https://github.com/RaRe-Technologies/gensim/pull/3058): Add py39 wheels to Travis/Azure, by [@FredHappyface](https://github.com/FredHappyface)
112+
* [#3035](https://github.com/RaRe-Technologies/gensim/pull/3035): Update repos before trying to install gdb, by [@janaknat](https://github.com/janaknat)
113+
* [#3026](https://github.com/RaRe-Technologies/gensim/pull/3026): Move x86 tests from Travis to GHA, add aarch64 wheel build to Travis, by [@janaknat](https://github.com/janaknat)
114+
* [#3033](https://github.com/RaRe-Technologies/gensim/pull/3033): Transformed camelCase to snake_case test names, by [@sezanzeb](https://github.com/sezanzeb)
115+
* [#3024](https://github.com/RaRe-Technologies/gensim/pull/3024): Add Github Actions x86 and mac jobs to build python wheels, by [@janaknat](https://github.com/janaknat)
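
For readers curious what the camelCase to snake_case renaming in #3033 above amounts to, here is a minimal regex-based sketch. It is not the script used in that PR, and `camel_to_snake` and the sample test names are hypothetical.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase identifier such as 'testPersistence' to snake_case."""
    # Insert an underscore before every uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole result.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


renamed = [camel_to_snake(n) for n in ("testPersistence", "testLoadOldModel", "test_already_snake")]
```

Runs of consecutive capitals (acronyms) are simply lowercased together, so such names may still need manual review.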
116+
117+
4118
## 4.0.0.rc1, 2021-03-19
5119

6120
**⚠️ Gensim 4.0 contains breaking API changes! See the [Migration guide](https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4) to update your existing Gensim 3.x code and models.**
@@ -57,21 +171,13 @@ This is the direction we'll keep going forward: less kitchen-sink of "latest aca
57171
* Point WordEmbeddingSimilarityIndex documentation to gensim.similarities (__[Witiko](https://github.com/Witiko)__, [#3003](https://github.com/RaRe-Technologies/gensim/pull/3003))
58172
* Make the link to the Gensim 3.8.3 documentation dynamic (__[Witiko](https://github.com/Witiko)__, [#2996](https://github.com/RaRe-Technologies/gensim/pull/2996))
59173

60-
### :+1: Improvements
61-
62174
### :warning: Removed functionality
63175

64176
* remove on_batch_begin and on_batch_end callbacks (__[mpenkov](https://github.com/mpenkov)__, [#3078](https://github.com/RaRe-Technologies/gensim/pull/3078))
65177
* remove pattern dependency (__[mpenkov](https://github.com/mpenkov)__, [#3012](https://github.com/RaRe-Technologies/gensim/pull/3012))
66178
* rm gensim.viz submodule (__[mpenkov](https://github.com/mpenkov)__, [#3055](https://github.com/RaRe-Technologies/gensim/pull/3055))
67179

68-
### :warning: Deprecations (will be removed in the next major release)
69-
70-
### ??? Misc
71-
72-
**FIXME** This is a list of PRs that I couldn't find an appropriate section for.
73-
We could make some other section for them or remove them from the changelog entirely.
74-
This is probably OK as-is for the release candidate, but we should clean this up for the proper, final release.
180+
### 🔮 Miscellaneous
75181

76182
* [MRG] Add Github sponsor + donation nags (__[piskvorky](https://github.com/piskvorky)__, [#3069](https://github.com/RaRe-Technologies/gensim/pull/3069))
77183
* Update URLs (__[jonaschn](https://github.com/jonaschn)__, [#3063](https://github.com/RaRe-Technologies/gensim/pull/3063))
@@ -82,6 +188,7 @@ This is probably OK as-is for the release candidate, but we should clean this up
82188
* move x86 tests from Travis to GHA, add aarch64 wheel build to Travis (__[janaknat](https://github.com/janaknat)__, [#3026](https://github.com/RaRe-Technologies/gensim/pull/3026))
83189
* Add Github Actions x86 and mac jobs to build python wheels (__[janaknat](https://github.com/janaknat)__, [#3024](https://github.com/RaRe-Technologies/gensim/pull/3024))
84190

191+
85192
## 4.0.0beta, 2020-10-31
86193

87194
**⚠️ Gensim 4.0 contains breaking API changes! See the [Migration guide](https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4) to update your existing Gensim 3.x code and models.**

README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ https://github.com/RaRe-Technologies/gensim/issues/2805
99

1010
[![Build Status](https://travis-ci.org/RaRe-Technologies/gensim.svg?branch=develop)](https://travis-ci.org/RaRe-Technologies/gensim)
1111
[![GitHub release](https://img.shields.io/github/release/rare-technologies/gensim.svg?maxAge=3600)](https://github.com/RaRe-Technologies/gensim/releases)
12-
[![Downloads](https://img.shields.io/pypi/dm/gensim?color=blue)](https://pepy.tech/project/gensim/month)
12+
[![Downloads](https://img.shields.io/pypi/dm/gensim?color=blue)](https://pepy.tech/project/gensim/)
1313
[![DOI](https://zenodo.org/badge/DOI/10.13140/2.1.2393.1847.svg)](https://doi.org/10.13140/2.1.2393.1847)
1414
[![Mailing List](https://img.shields.io/badge/-Mailing%20List-blue.svg)](https://groups.google.com/forum/#!forum/gensim)
1515
[![Follow](https://img.shields.io/twitter/follow/gensim_py.svg?style=social&style=flat&logo=twitter&label=Follow&color=blue)](https://twitter.com/gensim_py)
