Docs 💬
3.5.0, 2018-07-06
This release comprises a glorious 38 pull requests from 28 contributors. Most of the effort went into improving the documentation—hence the release code name "Docs 💬"!
Apart from the massive overhaul of all Gensim documentation (including docstring style and examples—you asked for it), we also managed to sneak in some new functionality and a number of bug fixes. As usual, see the notes below for a complete list, with links to pull requests for more details.
Huge thanks to all contributors! Nobody loves working on documentation. 3.5.0 is a result of several months of laborious, unglamorous, and sometimes invisible work. Enjoy!
📚 Documentation improvements
- Overhaul documentation for `*2vec` models (@steremma & @piskvorky & @menshikh-iv, #1944, #2087)
- Fix documentation for LDA-related models (@steremma & @piskvorky & @menshikh-iv, #2026)
- Fix documentation for utils, corpora, interfaces (@piskvorky & @menshikh-iv, #2096)
- Update non-API docs (about, intro, license etc) (@piskvorky & @menshikh-iv, #2101)
- Refactor documentation for `gensim.models.phrases` (@CLearERR & @menshikh-iv, #1950)
- Fix HashDictionary documentation (@piskvorky, #2073)
- Fix docstrings for `gensim.models.AuthorTopicModel` (@souravsingh & @menshikh-iv, #1907)
- Fix docstrings for HdpModel, lda_worker & lda_dispatcher (@gyanesh-m & @menshikh-iv, #1912)
- Fix format & links for `gensim.similarities.docsim` (@CLearERR & @menshikh-iv, #2030)
- Remove duplication of class documentation for `IndexedCorpus` (@darindf, #2033)
- Refactor documentation for `gensim.models.coherencemodel` (@CLearERR & @menshikh-iv, #1933)
- Fix docstrings for `gensim.sklearn_api` (@steremma & @menshikh-iv, #1895)
- Disable google-style docstring support (@menshikh-iv, #2106)
- Fix docstring of `gensim.models.KeyedVectors.similarity_matrix` (@Witiko, #1971)
- Consistently use `smart_open()` instead of `open()` in notebooks (@sharanry, #1812)
🌟 New features:
- Add `add_entity` method to `KeyedVectors` to allow adding word vectors manually (@persiyanov, #1957)
- Add inference for new unseen author to `AuthorTopicModel` (@Stamenov, #1766)
- Add `evaluate_word_analogies` (will replace `accuracy`) method to `KeyedVectors` (@akutuzov, #1935)
- Add Pivot Normalization to `TfidfModel` (@markroxor, #1780)
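
For readers unfamiliar with pivot normalization, here is a minimal standalone sketch of the general idea: instead of dividing a document vector by its plain L2 norm, divide by a norm that is pulled toward a fixed pivot value. This is not Gensim's code. The function name and the `pivot` and `slope` arguments are assumptions chosen for illustration, so check #1780 for the actual `TfidfModel` interface.

```python
import math


def pivoted_unit_normalize(weights, pivot, slope):
    """Scale a sparse vector by a pivoted norm instead of its plain L2 norm.

    weights: list of (term_id, weight) pairs.
    pivot, slope: illustrative tuning parameters (assumed names).
    """
    old_norm = math.sqrt(sum(w * w for _, w in weights))
    # The pivoted norm equals old_norm when old_norm == pivot;
    # otherwise it is pulled toward the pivot by a factor of (1 - slope).
    pivoted_norm = (1.0 - slope) * pivot + slope * old_norm
    if pivoted_norm == 0:
        return list(weights)  # nothing sensible to scale by
    return [(term_id, w / pivoted_norm) for term_id, w in weights]


doc = [(0, 3.0), (1, 4.0)]  # plain L2 norm is 5.0
print(pivoted_unit_normalize(doc, pivot=10.0, slope=0.5))
```

With `slope=1.0` the sketch reduces to ordinary L2 normalization. Smaller slopes shrink long documents less aggressively relative to the pivot.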
👍 Improvements
- Allow initialization with `max_final_vocab` in lieu of `min_count` in `Word2Vec` (@aneesh-joshi, #1915)
- Add `dtype` argument for `chunkize_serial` in `LdaModel` (@darindf, #2027)
- Increase performance in `Phrases.analyze_sentence` (@JonathanHourany, #2070)
- Add `ns_exponent` parameter to control the negative sampling distribution for `*2vec` models (@fernandocamargoti, #2093)
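
To illustrate what an exponent on the negative sampling distribution does, the sketch below raises each word count to a power and renormalizes. It is a plain-Python illustration rather than Gensim's implementation. The helper name `negative_sampling_probs` and the toy counts are made up for this example.

```python
def negative_sampling_probs(word_counts, ns_exponent=0.75):
    """Return the probability of drawing each word as a negative sample.

    Each count is raised to `ns_exponent` and the results are normalized.
    """
    weights = {word: count ** ns_exponent for word, count in word_counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


counts = {"the": 900, "cat": 90, "zygote": 10}  # toy corpus frequencies
for exponent in (1.0, 0.75, 0.0):
    print(exponent, negative_sampling_probs(counts, exponent))
```

An exponent of 1.0 samples in proportion to raw frequency and 0.0 samples all words equally. Values in between (such as the commonly used 0.75) flatten the distribution so that rare words are drawn more often than their frequency alone would suggest.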
🔴 Bug fixes:
- Fix `Doc2Vec.infer_vector` + notebook cleanup (@gojomo, #2103)
- Fix linear decay for learning rate in `Doc2Vec.infer_vector` (@umangv, #2063)
- Fix negative sampling floating-point error for `gensim.models.Poincare` (@jayantj, #1959)
- Fix loading `word2vec` and `doc2vec` models saved using old Gensim versions (@manneshiva, #2012)
- Fix `SoftCosineSimilarity.get_similarities` on corpora (issue #1955) (@Witiko, #1972)
- Fix return dtype for `matutils.unitvec` according to input dtype (@o-P-o, #1992)
- Fix passing empty dictionary to `gensim.corpora.WikiCorpus` (@steremma, #2042)
- Fix bug in `Similarity.query_shards` in multiprocessing case (@bohea, #2044)
- Fix SMART from TfidfModel for case when `df == "n"` (@PeteBleackley, #2021)
- Fix OverflowError when loading a large term-document matrix in compiled MatrixMarket format (@arlenk, #2001)
- Update rules for removing table markup from Wikipedia dumps (@chaitaliSaini, #1954)
- Fix `_is_single` from `Phrases` for case when corpus is a NumPy array (@rmalouf, #1987)
- Fix tests for `EuclideanKeyedVectors.similarity_matrix` (@Witiko, #1984)
- Fix deprecated parameters in `D2VTransformer` and `W2VTransformer` (@MritunjayMohitesh, #1945)
- Fix `Doc2Vec.infer_vector` after loading old `Doc2Vec` (`gensim<=3.2`) (@manneshiva, #1974)
- Fix inheritance chain for `load_word2vec_format` (@DennisChen0307, #1968)
- Update Keras version (avoid bug from `keras==2.1.5`) (@menshikh-iv, #1963)
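
As background for the learning-rate fix listed above, a linear decay schedule lowers the rate by the same amount at every step until it reaches its floor. The sketch below shows that general idea only. It is not the code from #2063, and the function name and arguments are assumptions made for illustration.

```python
def linear_decay_schedule(alpha, min_alpha, steps):
    """Return `steps` learning rates falling linearly from alpha to min_alpha."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [alpha]
    # Constant decrement, so the last step lands exactly on min_alpha.
    delta = (alpha - min_alpha) / (steps - 1)
    return [alpha - i * delta for i in range(steps)]


print(linear_decay_schedule(alpha=0.1, min_alpha=0.0, steps=5))
```

The defining property is that consecutive rates differ by a constant, which is an easy invariant to check when testing a schedule.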
⚠️ Deprecations (will be removed in the next major release)
- Remove
  - `gensim.models.wrappers.fasttext` (obsoleted by the new native `gensim.models.fasttext` implementation)
  - `gensim.examples`
  - `gensim.nosy`
  - `gensim.scripts.word2vec_standalone`
  - `gensim.scripts.make_wiki_lemma`
  - `gensim.scripts.make_wiki_online`
  - `gensim.scripts.make_wiki_online_lemma`
  - `gensim.scripts.make_wiki_online_nodebug`
  - `gensim.scripts.make_wiki` (all of these obsoleted by the new native `gensim.scripts.segment_wiki` implementation)
  - "deprecated" functions and attributes
- Move
  - `gensim.scripts.make_wikicorpus` ➡ `gensim.scripts.make_wiki.py`
  - `gensim.summarization` ➡ `gensim.models.summarization`
  - `gensim.topic_coherence` ➡ `gensim.models._coherence`
  - `gensim.utils` ➡ `gensim.utils.utils` (old imports will continue to work)
  - `gensim.parsing.*` ➡ `gensim.utils.text_utils`