piskvorky
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 66 additions & 0 deletions
diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md
Lines changed: 14 additions & 33 deletions
diff --git a/MANIFEST.in b/MANIFEST.in
Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,71 @@
 Changes
 ===========
+
+## Unreleased
+
+### :star2: New Features
+
+- `gensim.models.fasttext.load_facebook_model` function: load full model (slower, more CPU/memory intensive, supports training continuation)
+- `gensim.models.fasttext.load_facebook_vectors` function: load embeddings only (faster, less CPU/memory usage, does not support training continuation)
+
+### :red_circle: Bug fixes
+
+* Fix unicode error when loading FastText vocabulary (__[@mpenkov](https://github.com/mpenkov)__, [#2390](https://github.com/RaRe-Technologies/gensim/pull/2390))
+* Avoid division by zero in fasttext_inner.pyx (__[@mpenkov](https://github.com/mpenkov)__, [#2404](https://github.com/RaRe-Technologies/gensim/pull/2404))
+* Avoid incorrect filename inference when loading model (__[@mpenkov](https://github.com/mpenkov)__, [#2408](https://github.com/RaRe-Technologies/gensim/pull/2408))
+* Handle invalid unicode when loading native FastText models (__[@mpenkov](https://github.com/mpenkov)__, [#2411](https://github.com/RaRe-Technologies/gensim/pull/2411))
+* Avoid divide by zero when calculating vectors for terms with no ngrams (__[@mpenkov](https://github.com/mpenkov)__, [#2411](https://github.com/RaRe-Technologies/gensim/pull/2411))
+
+### :books: Tutorial and doc improvements
+
+* Add link to bindr (__[rogueleaderr](https://github.com/rogueleaderr)__, [#2387](https://github.com/RaRe-Technologies/gensim/pull/2387))
+
+### :+1: Improvements
+
+* Undo the hash2index optimization (__[@mpenkov](https://github.com/mpenkov)__, [#2370](https://github.com/RaRe-Technologies/gensim/pull/2370))
+
+### :warning: Changes in FastText behavior
+
+#### Out-of-vocab word handling
+
+To achieve consistency with the reference implementation from Facebook,
+a `FastText` model will now always report any word, out-of-vocabulary or
+not, as being in the model, and always return some vector for any word
+looked up. Specifically:
+
+1. `'any_word' in ft_model` will always return `True`.  Previously, it 
+returned `True` only if the full word was in the vocabulary. (To test if a 
+full word is in the known vocabulary, you can consult the `wv.vocab` 
+property: `'any_word' in ft_model.wv.vocab` will return `False` if the full 
+word wasn't learned during model training.)
+2. `ft_model['any_word']` will always return a vector.  Previously, it 
+raised `KeyError` for OOV words when the model had no vectors 
+for **any** ngrams of the word.
+3. If no ngrams from the term are present in the model,
+or when no ngrams could be extracted from the term, a zero vector
+(the origin) will be returned. Previously, a vector of NaN (not a number)
+was returned as a consequence of a divide-by-zero problem.
+4. Models may use more memory, or take longer for word-vector
+lookup, especially after training on smaller corpora where the previous
+non-compliant behavior discarded some ngrams from consideration.
+
+#### Loading models in Facebook .bin format
+
+The `gensim.models.FastText.load_fasttext_format` function (deprecated) now loads the entire model contained in the .bin file, including the shallow neural network that enables training continuation.
+Loading this NN requires more CPU and RAM than previously required.
+
+Since this function is deprecated, consider using one of its alternatives (see below).
+
+Furthermore, you must now pass the full path to the file to load, **including the file extension.**
+Previously, if you specified a model path that ended with anything other than .bin, the code automatically appended .bin to the path before loading the model.
+This behavior was [confusing](https://github.com/RaRe-Technologies/gensim/issues/2407), so we removed it.
+
+### :warning: Deprecations (will be removed in the next major release)
+
+Remove:
+
+- `gensim.models.FastText.load_fasttext_format`: use `load_facebook_vectors` to load embeddings only (faster, less CPU/memory usage, does not support training continuation) or `load_facebook_model` to load the full model (slower, more CPU/memory intensive, supports training continuation)
+
 ## 3.7.1, 2019-01-31
 
 ### :+1: Improvements
 
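Note on the CHANGELOG.md hunk above: the migration away from `load_fasttext_format` could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from this PR. The helper names `check_facebook_model_path` and `load_facebook` are hypothetical; only `load_facebook_model` and `load_facebook_vectors` come from the changelog. The path check reflects the new rule that `.bin` is no longer appended automatically.

```python
import os


def check_facebook_model_path(path):
    """Hypothetical helper: validate a path before passing it to a loader.

    Per the changelog, the full path including the file extension must be
    given; '.bin' is no longer appended automatically.
    """
    if not os.path.isfile(path):
        hint = ""
        # Suggest the old implicit behavior as a fix, but do not apply it.
        if not path.endswith(".bin") and os.path.isfile(path + ".bin"):
            hint = " (did you mean %r?)" % (path + ".bin")
        raise FileNotFoundError("no such model file: %r%s" % (path, hint))
    return path


def load_facebook(path, continue_training=False):
    """Hypothetical wrapper: pick the loader that matches the intended use."""
    # Imported lazily so the path helper above works without gensim installed.
    from gensim.models.fasttext import load_facebook_model, load_facebook_vectors

    path = check_facebook_model_path(path)
    if continue_training:
        # Full model: slower and heavier, but supports training continuation.
        return load_facebook_model(path)
    # Embeddings only: faster and lighter, no training continuation.
    return load_facebook_vectors(path)
```

With the full model, `'any_word' in ft_model.wv.vocab` remains the way to test whether a full word was learned during training, as the changelog describes.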
@@ -1,48 +1,29 @@
 <!--
-If your issue is a usage or a general question, please submit it here instead:
-- Mailing List: https://groups.google.com/forum/#!forum/gensim
-For more information, see Recipes&FAQ: https://github.com/RaRe-Technologies/gensim/wiki/Recipes-&-FAQ
--->
-
-<!-- Instructions For Filing a Bug: https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md -->
+**IMPORTANT**:
 
-#### Description
-TODO: change commented example
-<!-- Example: Vocabulary size is not what I expected when training Word2Vec. -->
-
-#### Steps/Code/Corpus to Reproduce
-<!--
-Example:
-```
-from gensim.models import word2vec
+- Use the [Gensim mailing list](https://groups.google.com/forum/#!forum/gensim) to ask general or usage questions. GitHub issues are only for bug reports.
+- Check [Recipes&FAQ](https://github.com/RaRe-Technologies/gensim/wiki/Recipes-&-FAQ) first for common answers.
 
-sentences = ['human', 'machine']
-model = word2vec.Word2Vec(sentences)
-print(model.syn0.shape) 
-```
-If the code is too long, feel free to put it in a public gist and link
-it in the issue: https://gist.github.com
+GitHub bug reports that do not include relevant information and context will be closed without an answer. Thanks!
 -->
 
-#### Expected Results
-<!-- Example: Expected shape of (100,2).-->
+#### Problem description
 
-#### Actual Results
-<!-- Example: Actual shape of (100,5). 
+What are you trying to achieve? What is the expected result? What are you seeing instead?
 
-Please paste or specifically describe the actual output or traceback. -->
+#### Steps/code/corpus to reproduce
+
+Include full tracebacks, logs and datasets if necessary. Please keep the examples minimal ("minimal reproducible example").
 
 #### Versions
-<!--
-Please run the following snippet and paste the output below.
+
+Please provide the output of:
+
+```python
 import platform; print(platform.platform())
 import sys; print("Python", sys.version)
 import numpy; print("NumPy", numpy.__version__)
 import scipy; print("SciPy", scipy.__version__)
 import gensim; print("gensim", gensim.__version__)
 from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
--->
-
-
-<!-- Thanks for contributing! -->
-
+```
@@ -6,6 +6,7 @@ include COPYING.LESSER
 include ez_setup.py
 
 include gensim/models/voidptr.h
+include gensim/models/stdint_wrapper.h
 include gensim/models/fast_line_sentence.h
 
 include gensim/models/word2vec_inner.c