Commit 351bdef

Merge branch 'release-3.0.0'
2 parents 8b8669d + af646c4 commit 351bdef

182 files changed: +23312 -4747 lines changed

.travis.yml

Lines changed: 19 additions & 20 deletions
@@ -1,23 +1,22 @@
 sudo: false
+
+cache:
+  apt: true
+  directories:
+    - $HOME/.cache/pip
+    - $HOME/.ccache
+
 dist: trusty
 language: python
-python:
-  - "2.7"
-  - "3.5"
-  - "3.6"
-before_install:
-  - wget 'http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh' -O miniconda.sh
-  - chmod +x miniconda.sh
-  - ./miniconda.sh -b
-  - export PATH=/home/travis/miniconda2/bin:$PATH
-  - conda update --yes conda
-install:
-  - conda create --yes -n gensim-test python=$TRAVIS_PYTHON_VERSION pip atlas numpy==1.11.3 scipy==0.18.1
-  - source activate gensim-test
-  - python setup.py install
-  - pip install .[test]
-script:
-  - pip freeze
-  - python setup.py test
-  - pip install flake8
-  - continuous_integration/travis/flake8_diff.sh
+
+
+matrix:
+  include:
+    - env: PYTHON_VERSION="2.7" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="yes"
+    - env: PYTHON_VERSION="2.7" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="no"
+    - env: PYTHON_VERSION="3.5" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="no"
+    - env: PYTHON_VERSION="3.6" NUMPY_VERSION="1.11.3" SCIPY_VERSION="0.18.1" ONLY_CODESTYLE="no"
+
+
+install: source continuous_integration/travis/install.sh
+script: bash continuous_integration/travis/run.sh

CHANGELOG.md

Lines changed: 49 additions & 0 deletions
@@ -1,5 +1,54 @@
 Changes
 ===========
+## 3.0.0, 2017-09-27
+
+
+:star2: New features:
+* Add unsupervised FastText to Gensim (@chinmayapancholi13, [#1525](https://github.com/RaRe-Technologies/gensim/pull/1525))
+* Add sklearn API for gensim models (@chinmayapancholi13, [#1462](https://github.com/RaRe-Technologies/gensim/pull/1462))
+* Add callback metrics for LdaModel and integration with Visdom (@parulsethi, [#1399](https://github.com/RaRe-Technologies/gensim/pull/1399))
+* Add TranslationMatrix model (@robotcator, [#1434](https://github.com/RaRe-Technologies/gensim/pull/1434))
+* Add word2vec-based coherence. Fix #1380 (@macks22, [#1530](https://github.com/RaRe-Technologies/gensim/pull/1530))
+
+
+:+1: Improvements:
+* Add 'diagonal' parameter for LdaModel.diff (@parulsethi, [#1448](https://github.com/RaRe-Technologies/gensim/pull/1448))
+* Add 'score' function for SklLdaModel (@chinmayapancholi13, [#1445](https://github.com/RaRe-Technologies/gensim/pull/1445))
+* Update sklearn API for gensim models (@chinmayapancholi13, [#1473](https://github.com/RaRe-Technologies/gensim/pull/1473)) [:warning: breaks backward compatibility]
+* Add CoherenceModel to LdaModel.top_topics. Fix #1128 (@macks22, [#1427](https://github.com/RaRe-Technologies/gensim/pull/1427))
+* Add dendrogram viz for topics and JS metric (@parulsethi, [#1484](https://github.com/RaRe-Technologies/gensim/pull/1484))
+* Add topic network viz (@parulsethi, [#1536](https://github.com/RaRe-Technologies/gensim/pull/1536))
+* Replace viewitems to iteritems. Fix #1495 (@HodorTheCoder, [#1508](https://github.com/RaRe-Technologies/gensim/pull/1508))
+* Fix Travis config and add style-checking for Ipython Notebooks. Fix #1518, #1520 (@menshikh-iv, [#1522](https://github.com/RaRe-Technologies/gensim/pull/1522))
+* Remove mutable args from definitions. Fix #1561 (@zsef123, [#1562](https://github.com/RaRe-Technologies/gensim/pull/1562))
+* Add Appveyour for all PRs. Fix #1565 (@menshikh-iv, [#1565](https://github.com/RaRe-Technologies/gensim/pull/1565))
+* Refactor code by PEP8. Partially fix #1521 (@zsef123, [#1550](https://github.com/RaRe-Technologies/gensim/pull/1550))
+* Refactor code by PEP8 with additional limitations. Fix #1521 (@menshikh-iv, [#1569](https://github.com/RaRe-Technologies/gensim/pull/1569))
+* Update FastTextKeyedVectors.\_\_contains\_\_ (@ELind77, [#1499](https://github.com/RaRe-Technologies/gensim/pull/1499))
+* Update WikiCorpus tokenization. Fix #1534 (@roopalgarg, [#1537](https://github.com/RaRe-Technologies/gensim/pull/1537))
+
+
+:red_circle: Bug fixes:
+* Remove round in LdaSeqModel.print_topic. Fix #1480 (@menshikh-iv, [#1547](https://github.com/RaRe-Technologies/gensim/pull/1547))
+* Fix TextCorpus.samle_text (@menshikh-iv, [#1548](https://github.com/RaRe-Technologies/gensim/pull/1548))
+* Fix Mallet wrapper and tests for HDPTransform (@menshikh-iv, [#1555](https://github.com/RaRe-Technologies/gensim/pull/1555))
+* Fix incorrect initialization ShardedCorpus with a generator. Fix #1511 (@karkkainenk1, [#1512](https://github.com/RaRe-Technologies/gensim/pull/1512))
+* Add verification when summarize_corpus returns null. Fix #1531 (@fbarrios, [#1570](https://github.com/RaRe-Technologies/gensim/pull/1570))
+* Fix doctag unicode problem. Fix 1543 (@englhardt, [#1544](https://github.com/RaRe-Technologies/gensim/pull/1544))
+* Fix Translation Matrix (@robotcator, [#1594](https://github.com/RaRe-Technologies/gensim/pull/1594))
+* Add trainable flag to KeyedVectors.get_embedding_layer. Fix #1557 (@zsef123, [#1558](https://github.com/RaRe-Technologies/gensim/pull/1558))
+
+
+:books: Tutorial and doc improvements:
+* Update exception text in TextCorpus.samle_text. Partial fix #308 (@vlejd, [#1444](https://github.com/RaRe-Technologies/gensim/pull/1444))
+* Remove extra filter_token from tutorial (@VorontsovIE, [#1502](https://github.com/RaRe-Technologies/gensim/pull/1502))
+* Update Doc2Vec-IMDB notebook (@pahdo, [#1476](https://github.com/RaRe-Technologies/gensim/pull/1476))
+* Add Google Tag Manager for site (@yardos, [#1556](https://github.com/RaRe-Technologies/gensim/pull/1556))
+* Update docstring explaining lack of multistream support in WikiCopus. Fix #1496 (@polm and @menshikh-iv, [#1515](https://github.com/RaRe-Technologies/gensim/pull/1515))
+* Fix PathLineSentences docstring (@gojomo)
+* Fix typos from Translation Matrix notebook (@robotcator, [#1598](https://github.com/RaRe-Technologies/gensim/pull/1598))
+
+
 ## 2.3.0, 2017-07-25
 
 

README.md

Lines changed: 2 additions & 0 deletions
@@ -137,6 +137,8 @@ Adopters
 | Amazon | <img src="http://g-ec2.images-amazon.com/images/G/01/social/api-share/amazon_logo_500500._V323939215_.png" width="100"> | [amazon.com](http://www.amazon.com/) | Document similarity|
 | SiteGround Hosting | <img src="https://www.siteground.com/img/knox/logos/siteground.png" width="100"> | [siteground.com](https://www.siteground.com/) | An ensemble search engine which uses different embeddings models and similarities, including word2vec, WMD, and LDA. |
 | Juju | <img src="https://d5k1a84rm5hwo.cloudfront.net/img/juju_home_logo.png" width="100"> | [www.juju.com](http://www.juju.com/) | Provide non-obvious related job suggestions. |
+| NLPub | <img src="https://nlpub.org/images/thumb/a/aa/NLPub.svg/240px-NLPub.svg.png" width="100"> | [nlpub.org](https://nlpub.org/) | Distributional semantic models including word2vec. |
+|Capital One | <img src="https://s3.amazonaws.com/fjds/member/original/1245173/C1_Core_NG_RGB_R_%281%29.PNG?1456169388" width="200"> | [www.capitalone.com](https://www.capitalone.com/) | Topic modeling for customer complaints exploration. |
 
 -------
 

appveyor.yml

Lines changed: 0 additions & 1 deletion
@@ -50,7 +50,6 @@ install:
   - "python -c \"import struct; print(struct.calcsize('P') * 8)\""
 
   # Install the build and runtime dependencies of the project.
-  # Install the build and runtime dependencies of the project.
   - "%CMD_IN_ENV% pip install --timeout=60 --trusted-host 28daf2247a33ed269873-7b1aad3fab3cc330e1fd9d109892382a.r6.cf2.rackcdn.com -r continuous_integration/appveyor/requirements.txt"
   - "%CMD_IN_ENV% python setup.py bdist_wheel bdist_wininst"
   - ps: "ls dist"

continuous_integration/travis/flake8_diff.sh

Lines changed: 38 additions & 22 deletions
@@ -19,18 +19,18 @@ set -e
 set -o pipefail
 
 PROJECT=RaRe-Technologies/gensim
-PROJECT_URL=https://github.com/$PROJECT.git
+PROJECT_URL=https://github.com/${PROJECT}.git
 
 # Find the remote with the project name (upstream in most cases)
-REMOTE=$(git remote -v | grep $PROJECT | cut -f1 | head -1 || echo '')
+REMOTE=$(git remote -v | grep ${PROJECT} | cut -f1 | head -1 || echo '')
 
 # Add a temporary remote if needed. For example this is necessary when
 # Travis is configured to run in a fork. In this case 'origin' is the
 # fork and not the reference repo we want to diff against.
 if [[ -z "$REMOTE" ]]; then
 TMP_REMOTE=tmp_reference_upstream
-REMOTE=$TMP_REMOTE
-git remote add $REMOTE $PROJECT_URL
+REMOTE=${TMP_REMOTE}
+git remote add ${REMOTE} ${PROJECT_URL}
 fi
 
 echo "Remotes:"
@@ -56,15 +56,15 @@ if [[ "$TRAVIS" == "true" ]]; then
 echo "New branch, no commit range from Travis so passing this test by convention"
 exit 0
 fi
-COMMIT_RANGE=$TRAVIS_COMMIT_RANGE
+COMMIT_RANGE=${TRAVIS_COMMIT_RANGE}
 fi
 else
 # We want to fetch the code as it is in the PR branch and not
 # the result of the merge into develop. This way line numbers
 # reported by Travis will match with the local code.
-LOCAL_BRANCH_REF=travis_pr_$TRAVIS_PULL_REQUEST
+LOCAL_BRANCH_REF=travis_pr_${TRAVIS_PULL_REQUEST}
 # In Travis the PR target is always origin
-git fetch origin pull/$TRAVIS_PULL_REQUEST/head:refs/$LOCAL_BRANCH_REF
+git fetch origin pull/${TRAVIS_PULL_REQUEST}/head:refs/${LOCAL_BRANCH_REF}
 fi
 fi
 
@@ -76,49 +76,55 @@ if [[ -z "$COMMIT_RANGE" ]]; then
 fi
 echo -e "\nLast 2 commits in $LOCAL_BRANCH_REF:"
 echo '--------------------------------------------------------------------------------'
-git log -2 $LOCAL_BRANCH_REF
+git log -2 ${LOCAL_BRANCH_REF}
 
 REMOTE_MASTER_REF="$REMOTE/develop"
 # Make sure that $REMOTE_MASTER_REF is a valid reference
 echo -e "\nFetching $REMOTE_MASTER_REF"
 echo '--------------------------------------------------------------------------------'
-git fetch $REMOTE develop:refs/remotes/$REMOTE_MASTER_REF
-LOCAL_BRANCH_SHORT_HASH=$(git rev-parse --short $LOCAL_BRANCH_REF)
-REMOTE_MASTER_SHORT_HASH=$(git rev-parse --short $REMOTE_MASTER_REF)
+git fetch ${REMOTE} develop:refs/remotes/${REMOTE_MASTER_REF}
+LOCAL_BRANCH_SHORT_HASH=$(git rev-parse --short ${LOCAL_BRANCH_REF})
+REMOTE_MASTER_SHORT_HASH=$(git rev-parse --short ${REMOTE_MASTER_REF})
 
-COMMIT=$(git merge-base $LOCAL_BRANCH_REF $REMOTE_MASTER_REF) || \
-echo "No common ancestor found for $(git show $LOCAL_BRANCH_REF -q) and $(git show $REMOTE_MASTER_REF -q)"
+COMMIT=$(git merge-base ${LOCAL_BRANCH_REF} ${REMOTE_MASTER_REF}) || \
+echo "No common ancestor found for $(git show ${LOCAL_BRANCH_REF} -q) and $(git show ${REMOTE_MASTER_REF} -q)"
 
 if [ -z "$COMMIT" ]; then
 exit 1
 fi
 
-COMMIT_SHORT_HASH=$(git rev-parse --short $COMMIT)
+COMMIT_SHORT_HASH=$(git rev-parse --short ${COMMIT})
 
 echo -e "\nCommon ancestor between $LOCAL_BRANCH_REF ($LOCAL_BRANCH_SHORT_HASH)"\
 "and $REMOTE_MASTER_REF ($REMOTE_MASTER_SHORT_HASH) is $COMMIT_SHORT_HASH:"
 echo '--------------------------------------------------------------------------------'
-git show --no-patch $COMMIT_SHORT_HASH
+git show --no-patch ${COMMIT_SHORT_HASH}
 
 COMMIT_RANGE="$COMMIT_SHORT_HASH..$LOCAL_BRANCH_SHORT_HASH"
 
 if [[ -n "$TMP_REMOTE" ]]; then
-git remote remove $TMP_REMOTE
+git remote remove ${TMP_REMOTE}
 fi
 
 else
 echo "Got the commit range from Travis: $COMMIT_RANGE"
 fi
 
 echo -e '\nRunning flake8 on the diff in the range' "$COMMIT_RANGE" \
-"($(git rev-list $COMMIT_RANGE | wc -l) commit(s)):"
+"($(git rev-list ${COMMIT_RANGE} | wc -l) commit(s)):"
 echo '--------------------------------------------------------------------------------'
 
 # We ignore files from sklearn/externals.
 # Excluding vec files since they contain non-utf8 content and flake8 raises exception for non-utf8 input
 # We need the following command to exit with 0 hence the echo in case
 # there is no match
-MODIFIED_FILES="$(git diff --name-only $COMMIT_RANGE -- . ':(exclude)*.vec' || echo "no_match")"
+MODIFIED_PY_FILES="$(git diff --name-only ${COMMIT_RANGE} | grep '[a-zA-Z0-9]*.py$' || echo "no_match")"
+MODIFIED_IPYNB_FILES="$(git diff --name-only ${COMMIT_RANGE} | grep '[a-zA-Z0-9]*.ipynb$' || echo "no_match")"
+
+
+echo "*.py files: " ${MODIFIED_PY_FILES}
+echo "*.ipynb files: " ${MODIFIED_IPYNB_FILES}
+
 
 check_files() {
 files="$1"
@@ -127,13 +133,23 @@ check_files() {
 if [ -n "$files" ]; then
 # Conservative approach: diff without context (--unified=0) so that code
 # that was not changed does not create failures
-git diff --unified=0 $COMMIT_RANGE -- $files | flake8 --diff --show-source $options
+git diff --unified=0 ${COMMIT_RANGE} -- ${files} | flake8 --diff --show-source ${options}
 fi
 }
 
-if [[ "$MODIFIED_FILES" == "no_match" ]]; then
-echo "No file has been modified"
+if [[ "$MODIFIED_PY_FILES" == "no_match" ]]; then
+echo "No .py files has been modified"
 else
-check_files "$(echo "$MODIFIED_FILES" )" "--ignore=E501,E731,E12,W503 --exclude=*.sh,*.md,*.yml,*.rst,*.ipynb,*.txt,*.csv,*.vec,Dockerfile*,*.c,*.pyx,*.inc"
+check_files "$(echo "$MODIFIED_PY_FILES" )" "--ignore=E501,E731,E12,W503"
 fi
 echo -e "No problem detected by flake8\n"
+
+if [[ "$MODIFIED_IPYNB_FILES" == "no_match" ]]; then
+echo "No .ipynb file has been modified"
+else
+for fname in ${MODIFIED_IPYNB_FILES}
+do
+echo "File: $fname"
+jupyter nbconvert --to script --stdout ${fname} | flake8 - --show-source --ignore=E501,E731,E12,W503,E402 --builtins=get_ipython || true
+done
+fi
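
A note on the file filtering in this script. The pattern `'[a-zA-Z0-9]*.py$'` leaves the dot unescaped, so it also matches any path that ends in an arbitrary character followed by `py` (a file named `copy`, for instance). The `for fname in ${MODIFIED_IPYNB_FILES}` loop splits on whitespace, which could break on notebook paths that contain spaces, such as the `gensim Quick Start.ipynb` notebook referenced elsewhere in this commit. Below is a minimal sketch of a stricter variant. The helper names `filter_by_ext` and `each_file` are hypothetical and not part of the commit.

```shell
# Hypothetical helpers, shown only as a sketch; they are not part of the commit.

# Print only the paths read from stdin that end in ".<ext>".
# The dot is escaped, so "copy" does not match for ext "py".
filter_by_ext() {
    # grep exits with 1 when nothing matches; "|| true" keeps `set -e` scripts alive.
    grep -E "\.$1\$" || true
}

# Run the given command once per path read from stdin (one path per line),
# so that paths containing spaces are passed through as a single argument.
each_file() {
    while IFS= read -r fname; do
        if [ -n "$fname" ]; then
            "$@" "$fname"
        fi
    done
}
```

With these in place, the notebook loop could read `git diff --name-only "${COMMIT_RANGE}" | filter_by_ext ipynb | each_file echo "File:"`, with `echo "File:"` swapped for whatever per-file check is wanted.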

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+set -e
+
+deactivate
+wget 'http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh' -O miniconda.sh
+chmod +x miniconda.sh && ./miniconda.sh -b
+export PATH=/home/travis/miniconda2/bin:$PATH
+conda update --yes conda
+
+
+conda create --yes -n gensim-test python=${PYTHON_VERSION} pip atlas flake8 jupyter numpy==${NUMPY_VERSION} scipy==${SCIPY_VERSION} && source activate gensim-test
+pip install . && pip install .[test]

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+set -e
+
+pip freeze
+
+if [[ "$ONLY_CODESTYLE" == "yes" ]]; then
+continuous_integration/travis/flake8_diff.sh
+else
+python setup.py test
+fi
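
To check which step a given matrix entry would trigger without running the test suite, the branch in this script can be mirrored as a small pure function. This is only a sketch: `ci_step_for` is a hypothetical name, and the two commands it prints are the ones from the diff above.

```shell
# Hypothetical helper: print the command the run script would execute
# for a given ONLY_CODESTYLE value, without actually running it.
ci_step_for() {
    if [ "$1" = "yes" ]; then
        echo "continuous_integration/travis/flake8_diff.sh"
    else
        echo "python setup.py test"
    fi
}
```

Note that, as in the script itself, any value other than `yes` (including an empty one) selects the test run.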

docker/start_jupyter_notebook.sh

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ PORT=$1
 NOTEBOOK_DIR=/gensim/docs/notebooks
 DEFAULT_URL=/notebooks/gensim%20Quick%20Start.ipynb
 
-jupyter notebook --no-browser --ip=* --port=$PORT --allow-root --notebook-dir=$NOTEBOOK_DIR --NotebookApp.token=\"\" --NotebookApp.default_url=$DEFAULT_URL
+jupyter notebook --no-browser --ip=* --port=${PORT} --allow-root --notebook-dir=${NOTEBOOK_DIR} --NotebookApp.token=\"\" --NotebookApp.default_url=${DEFAULT_URL}

docs/notebooks/Coherence.gif

71.2 KB

docs/notebooks/Convergence.gif

52.5 KB
