Commit 1b8fdaf

Merge branch 'release-2.1.0'
2 parents 5031f8b + 0d4952d commit 1b8fdaf

25 files changed, +1596 -396 lines changed

.travis.yml

Lines changed: 4 additions & 1 deletion
@@ -21,4 +21,7 @@ install:
 - pip install scikit-learn
 - pip install Morfessor==2.0.2a4
 - python setup.py install
-script: python setup.py test
+script:
+- python setup.py test
+- pip install flake8
+- continuous_integration/travis/flake8_diff.sh

CHANGELOG.md

Lines changed: 27 additions & 1 deletion
@@ -6,6 +6,33 @@ Unreleased:
 
 ===========
 
+2.1.0, 2017-05-12
+
+:star2: New features:
+* Add modified save_word2vec_format for Doc2Vec, to save document vectors. (@parulsethi, [#1256](https://github.com/RaRe-Technologies/gensim/pull/1256))
+
+
+:+1: Improvements:
+* Add automatic code style check limited only to the code modified in PR (@tmylk, [#1287](https://github.com/RaRe-Technologies/gensim/pull/1287))
+* Replace `logger.warn` by `logger.warning` (@chinmayapancholi13, [#1295](https://github.com/RaRe-Technologies/gensim/pull/1295))
+* Docs word2vec docstring improvement, deprecation labels (@shubhvachher, [#1274](https://github.com/RaRe-Technologies/gensim/pull/1274))
+* Stop passing 'sentences' as parameter to Doc2Vec. Fix #511 (@gogokaradjov, [#1306](https://github.com/RaRe-Technologies/gensim/pull/1306))
+
+
+:red_circle: Bug fixes:
+* Allow indexing with np.int64 in doc2vec. Fix #1231 (@bogdanteleaga, [#1254](https://github.com/RaRe-Technologies/gensim/pull/1254))
+* Update Doc2Vec docstring. Fix #1302 (@datapythonista, [#1307](https://github.com/RaRe-Technologies/gensim/pull/1307))
+* Ignore rst and ipynb file in Travis flake8 validations (@datapythonista, [#1309](https://github.com/RaRe-Technologies/gensim/pull/1309))
+
+
+:books: Tutorial and doc improvements:
+* Update Tensorboard Doc2Vec notebook (@parulsethi, [#1286](https://github.com/RaRe-Technologies/gensim/pull/1286))
+* Update Doc2Vec IMDB Notebook, replace codesc to smart_open (@robotcator, [#1278](https://github.com/RaRe-Technologies/gensim/pull/1278))
+* Add explanation of `size` to Word2Vec Notebook (@jbcoe, [#1305](https://github.com/RaRe-Technologies/gensim/pull/1305))
+* Add extra param to WordRank notebook. Fix #1276 (@parulsethi, [#1300](https://github.com/RaRe-Technologies/gensim/pull/1300))
+* Update warning message in WordRank (@parulsethi, [#1299](https://github.com/RaRe-Technologies/gensim/pull/1299))
+
+
 2.0.0, 2017-04-10
 
 Breaking changes:
@@ -17,7 +44,6 @@ See the [method documentation](https://github.com/RaRe-Technologies/gensim/blob/
 * Explicit epochs and corpus size in word2vec train(). (@gojomo, @robotcator, [#1139](https://github.com/RaRe-Technologies/gensim/pull/1139), [#1237](https://github.com/RaRe-Technologies/gensim/pull/1237))
 
 New features:
-
 * Add output word prediction in word2vec. Only for negative sampling scheme. See [ipynb]( https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb) (@chinmayapancholi13,[#1209](https://github.com/RaRe-Technologies/gensim/pull/1209))
 * scikit_learn wrapper for LSI Model in Gensim (@chinmayapancholi13,[#1244](https://github.com/RaRe-Technologies/gensim/pull/1244))
 * Add the 'keep_tokens' parameter to 'filter_extremes'. (@toliwa,[#1210](https://github.com/RaRe-Technologies/gensim/pull/1210))

README.md

Lines changed: 1 addition & 0 deletions
@@ -121,6 +121,7 @@ Adopters
 | Name | Logo | URL | Description |
 |----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | RaRe Technologies | <img src="http://rare-technologies.com/wp-content/uploads/2016/02/rare_image_only.png" width="100"> | [rare-technologies.com](http://rare-technologies.com) | Machine learning & NLP consulting and training. Creators and maintainers of Gensim. |
+| Mindseye | <img src="http://www.mindseyesolutions.com/wp-content/uploads/2015/12/Mindseye_logo_website.jpg" width="100"> | [mindseye.com](http://www.mindseyesolutions.com/) | Similarities in legal documents |
 | Talentpair | ![Talentpair](https://avatars3.githubusercontent.com/u/8418395?v=3&s=100) | [talentpair.com](http://talentpair.com) | Data science driving high-touch recruiting |
 | Tailwind | <img src="http://blog.tailwindapp.com/wp-content/uploads/2013/10/Tailwind-Square-Logo-Blue-White-300x300.png" width="100"> | [Tailwindapp.com](https://www.tailwindapp.com/)| Post interesting and relevant content to Pinterest |
 | Issuu | <img src="https://static.isu.pub/fe/issuu-brandpages/s3/155/press/assets/brand_package_zip/issuu%20logos/png/issuu-logo-stacked-colour.png" width="100"> | [Issuu.com](https://issuu.com/)| Gensim’s LDA module lies at the very core of the analysis we perform on each uploaded publication to figure out what it’s all about.

continuous_integration/travis/flake8_diff.sh

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+#!/bin/bash
+# This is a modified script from scikit-learn project.
+
+# This script is used in Travis to check that PRs do not add obvious
+# flake8 violations. It relies on two things:
+# - find common ancestor between branch and
+# gensim remote
+# - run flake8 --diff on the diff between the branch and the common
+# ancestor
+#
+# Additional features:
+# - the line numbers in Travis match the local branch on the PR
+# author machine.
+# - ./continuous_integration/travis/flake8_diff.sh can be run locally for quick
+# turn-around
+
+set -e
+# pipefail is necessary to propagate exit codes
+set -o pipefail
+
+PROJECT=RaRe-Technologies/gensim
+PROJECT_URL=https://github.com/$PROJECT.git
+
+# Find the remote with the project name (upstream in most cases)
+REMOTE=$(git remote -v | grep $PROJECT | cut -f1 | head -1 || echo '')
+
+# Add a temporary remote if needed. For example this is necessary when
+# Travis is configured to run in a fork. In this case 'origin' is the
+# fork and not the reference repo we want to diff against.
+if [[ -z "$REMOTE" ]]; then
+    TMP_REMOTE=tmp_reference_upstream
+    REMOTE=$TMP_REMOTE
+    git remote add $REMOTE $PROJECT_URL
+fi
+
+echo "Remotes:"
+echo '--------------------------------------------------------------------------------'
+git remote --verbose
+
+# Travis does the git clone with a limited depth (50 at the time of
+# writing). This may not be enough to find the common ancestor with
+# $REMOTE/develop so we unshallow the git checkout
+if [[ -a .git/shallow ]]; then
+    echo -e '\nTrying to unshallow the repo:'
+    echo '--------------------------------------------------------------------------------'
+    git fetch --unshallow
+fi
+
+if [[ "$TRAVIS" == "true" ]]; then
+    if [[ "$TRAVIS_PULL_REQUEST" == "false" ]]
+    then
+        # In main repo, using TRAVIS_COMMIT_RANGE to test the commits
+        # that were pushed into a branch
+        if [[ "$PROJECT" == "$TRAVIS_REPO_SLUG" ]]; then
+            if [[ -z "$TRAVIS_COMMIT_RANGE" ]]; then
+                echo "New branch, no commit range from Travis so passing this test by convention"
+                exit 0
+            fi
+            COMMIT_RANGE=$TRAVIS_COMMIT_RANGE
+        fi
+    else
+        # We want to fetch the code as it is in the PR branch and not
+        # the result of the merge into develop. This way line numbers
+        # reported by Travis will match with the local code.
+        LOCAL_BRANCH_REF=travis_pr_$TRAVIS_PULL_REQUEST
+        # In Travis the PR target is always origin
+        git fetch origin pull/$TRAVIS_PULL_REQUEST/head:refs/$LOCAL_BRANCH_REF
+    fi
+fi
+
+# If not using the commit range from Travis we need to find the common
+# ancestor between $LOCAL_BRANCH_REF and $REMOTE/develop
+if [[ -z "$COMMIT_RANGE" ]]; then
+    if [[ -z "$LOCAL_BRANCH_REF" ]]; then
+        LOCAL_BRANCH_REF=$(git rev-parse --abbrev-ref HEAD)
+    fi
+    echo -e "\nLast 2 commits in $LOCAL_BRANCH_REF:"
+    echo '--------------------------------------------------------------------------------'
+    git log -2 $LOCAL_BRANCH_REF
+
+    REMOTE_MASTER_REF="$REMOTE/develop"
+    # Make sure that $REMOTE_MASTER_REF is a valid reference
+    echo -e "\nFetching $REMOTE_MASTER_REF"
+    echo '--------------------------------------------------------------------------------'
+    git fetch $REMOTE develop:refs/remotes/$REMOTE_MASTER_REF
+    LOCAL_BRANCH_SHORT_HASH=$(git rev-parse --short $LOCAL_BRANCH_REF)
+    REMOTE_MASTER_SHORT_HASH=$(git rev-parse --short $REMOTE_MASTER_REF)
+
+    COMMIT=$(git merge-base $LOCAL_BRANCH_REF $REMOTE_MASTER_REF) || \
+        echo "No common ancestor found for $(git show $LOCAL_BRANCH_REF -q) and $(git show $REMOTE_MASTER_REF -q)"
+
+    if [ -z "$COMMIT" ]; then
+        exit 1
+    fi
+
+    COMMIT_SHORT_HASH=$(git rev-parse --short $COMMIT)
+
+    echo -e "\nCommon ancestor between $LOCAL_BRANCH_REF ($LOCAL_BRANCH_SHORT_HASH)"\
+         "and $REMOTE_MASTER_REF ($REMOTE_MASTER_SHORT_HASH) is $COMMIT_SHORT_HASH:"
+    echo '--------------------------------------------------------------------------------'
+    git show --no-patch $COMMIT_SHORT_HASH
+
+    COMMIT_RANGE="$COMMIT_SHORT_HASH..$LOCAL_BRANCH_SHORT_HASH"
+
+    if [[ -n "$TMP_REMOTE" ]]; then
+        git remote remove $TMP_REMOTE
+    fi
+
+else
+    echo "Got the commit range from Travis: $COMMIT_RANGE"
+fi
+
+echo -e '\nRunning flake8 on the diff in the range' "$COMMIT_RANGE" \
+    "($(git rev-list $COMMIT_RANGE | wc -l) commit(s)):"
+echo '--------------------------------------------------------------------------------'
+
+# We ignore files from sklearn/externals.
+# We need the following command to exit with 0 hence the echo in case
+# there is no match
+MODIFIED_FILES="$(git diff --name-only $COMMIT_RANGE || echo "no_match")"
+
+check_files() {
+    files="$1"
+    shift
+    options="$*"
+    if [ -n "$files" ]; then
+        # Conservative approach: diff without context (--unified=0) so that code
+        # that was not changed does not create failures
+        git diff --unified=0 $COMMIT_RANGE -- $files | flake8 --diff --show-source $options
+    fi
+}
+
+if [[ "$MODIFIED_FILES" == "no_match" ]]; then
+    echo "No file has been modified"
+else
+    check_files "$(echo "$MODIFIED_FILES" )" "--ignore=E501,E731,E12,W503 --exclude=*.sh,*.md,*.yml,*.rst,*.ipynb"
+fi
+echo -e "No problem detected by flake8\n"
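
The script above pairs `set -e` and `set -o pipefail` with `|| echo ...` fallbacks on pipelines that are allowed to find nothing, such as the remote lookup. The sketch below tries to isolate that idiom. It is illustrative only and not part of the commit: `list_remotes`, the URLs, and the `match` and `result` variables are hypothetical stand-ins, and no git repository is needed to run it.

```shell
#!/bin/bash
# Illustrative sketch only: names and URLs here are hypothetical.
set -e
# With pipefail, a pipeline fails if any stage fails, not just the last one.
set -o pipefail

# Stand-in for `git remote -v`: one remote that does not match the project.
list_remotes() {
    printf 'origin\thttps://example.com/fork/repo.git (fetch)\n'
    printf 'origin\thttps://example.com/fork/repo.git (push)\n'
}

# grep exits 1 when nothing matches. Under pipefail that becomes the status
# of the whole pipeline, and under `set -e` a plain assignment from such a
# command substitution would abort the script. The `|| echo ''` fallback
# turns "no match" into an empty string so the script can continue.
match=$(list_remotes | grep 'upstream/project' | cut -f1 | head -1 || echo '')

if [[ -z "$match" ]]; then
    result="no matching remote"
else
    result="found $match"
fi
echo "$result"
```

Removing the `|| echo ''` part should make the sketch stop at the assignment, which is the situation the fallback in the script appears designed to avoid.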

docs/notebooks/Tensorboard.png

341 KB
