Commit 78ec77f

Release 0.81 version (#3864)
* Release 0.81 version
* Update NEWS.md
1 parent c22e90d commit 78ec77f

7 files changed

+173
-16
lines changed

NEWS.md

Lines changed: 160 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -3,9 +3,166 @@ XGBoost Change Log
33

44
This file records the changes in xgboost library in reverse chronological order.
55

6-
## Master (2018.09.30)
7-
* BREAKING CHANGES
8-
- External memory page files have changed, breaking backwards compatibility for temporary storage used during external memory training. This only affects external memory users upgrading their xgboost version - we recommend clearing all *.page files before resuming training. Model serialization is unaffected.
6+
## v0.81 (2018.11.04)
7+
### New feature: feature interaction constraints
8+
* Users are now able to control which features (independent variables) are allowed to interact by specifying feature interaction constraints (#3466).
9+
* [Tutorial](https://xgboost.readthedocs.io/en/release_0.81/tutorials/feature_interaction_constraint.html) is available, as well as [R](https://github.com/dmlc/xgboost/blob/9254c58e4dfff6a59dc0829a2ceb02e45ed17cd0/R-package/demo/interaction_constraints.R) and [Python](https://github.com/dmlc/xgboost/blob/9254c58e4dfff6a59dc0829a2ceb02e45ed17cd0/tests/python/test_interaction_constraints.py) examples.
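As a rough illustration of the constraint format, the sketch below builds a parameter dictionary in Python. It assumes the constraint is passed under the `interaction_constraints` key as a string holding a nested list of feature indices, as in the linked tutorial. The feature groups and the `build_params` helper are hypothetical, and the training call is left as a comment because it needs xgboost and a dataset.

```python
import json

# Hypothetical feature groups: features 0 and 2 may interact with each other,
# as may features 1, 3 and 4; no interaction is allowed across the two groups.
constraint_groups = [[0, 2], [1, 3, 4]]


def build_params(groups):
    """Return a parameter dict carrying the constraint as a nested-list string."""
    if any(index < 0 for group in groups for index in group):
        raise ValueError("feature indices must be non-negative")
    return {"interaction_constraints": json.dumps(groups)}


params = build_params(constraint_groups)
print(params["interaction_constraints"])

# With xgboost installed and a DMatrix named dtrain (both assumed here),
# the dict would be passed to training, for example:
# booster = xgboost.train(params, dtrain, num_boost_round=10)
```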
10+
11+
### New feature: learning to rank using scikit-learn interface
12+
* Learning to rank task is now available for the scikit-learn interface of the Python package (#3560, #3848). It is now possible to integrate the XGBoost ranking model into the scikit-learn learning pipeline.
13+
* An example of using the `XGBRanker` class can be found at [demo/rank/rank_sklearn.py](https://github.com/dmlc/xgboost/blob/24a268a2e3cb17302db3d72da8f04016b7d352d9/demo/rank/rank_sklearn.py).
14+
15+
### New feature: R interface for SHAP interactions
16+
* SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. Previously, this feature was only available from the Python package; now it is available from the R package as well (#3636).
17+
18+
### New feature: GPU predictor now uses multiple GPUs to predict
19+
* GPU predictor is now able to utilize multiple GPUs at once to accelerate prediction (#3738)
20+
21+
### New feature: Scale distributed XGBoost to large-scale clusters
22+
* Fix OS file descriptor limit assertion error on large cluster (#3835, dmlc/rabit#73) by replacing `select()` based AllReduce/Broadcast with `poll()` based implementation.
23+
* Mitigate tracker "thundering herd" issue on large cluster. Add exponential backoff retry when workers connect to tracker.
24+
* With this change, we were able to scale to 1.5k executors on a 12 billion row dataset after some tweaks here and there.
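The retry idea behind the tracker change can be sketched in a few lines of Python. This is not the tracker's or rabit's actual code: the function name, the default delays, and the random jitter are all assumptions made for illustration, and the `sleep` argument exists only so the demo can record delays instead of waiting.

```python
import random
import time


def connect_with_backoff(connect, max_retries=5, base_delay=0.5,
                         max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up after the final retry
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter (an assumption, not from the release notes) spreads out
            # workers that failed at the same moment.
            sleep(delay + random.uniform(0, delay / 2))


# Demo: a fake tracker connection that fails twice before succeeding.
attempts = {"count": 0}
recorded = []


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("tracker busy")
    return "connected"


result = connect_with_backoff(flaky_connect, sleep=recorded.append)
```

Because each worker waits longer after every failed attempt, a burst of simultaneous connection attempts thins out over time rather than repeating in lockstep.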
25+
26+
### Major bug fix: learning to rank with XGBoost4J-Spark
27+
* Previously, `repartitionForData` would shuffle data and lose the ordering necessary for the ranking task.
28+
* To fix this issue, data points within each RDD partition are explicitly grouped by their group (query session) IDs (#3654). Empty RDD partitions are also handled carefully (#3750).
29+
30+
### Major bug fix: early stopping fixed in XGBoost4J-Spark
31+
* The earlier implementation of early stopping had incorrect semantics and didn't let users specify the direction of optimization (maximize / minimize).
32+
* A parameter `maximize_evaluation_metrics` is defined so as to tell whether a metric should be maximized or minimized as part of early stopping criteria (#3808). Also early stopping now has correct semantics.
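Why the direction matters can be seen in a small standalone Python sketch. It does not reproduce the XGBoost4J-Spark implementation: the `best_iteration` function and its `patience` argument are hypothetical, and only the idea of a maximize/minimize switch comes from the notes above.

```python
def best_iteration(scores, maximize, patience):
    """Return (best_index, stop_index) for per-round metric values.

    Training stops once `patience` rounds pass without an improvement
    in the chosen direction.
    """
    best_idx = 0
    for i in range(1, len(scores)):
        if maximize:
            improved = scores[i] > scores[best_idx]
        else:
            improved = scores[i] < scores[best_idx]
        if improved:
            best_idx = i
        elif i - best_idx >= patience:
            return best_idx, i
    return best_idx, len(scores) - 1


auc_per_round = [0.70, 0.75, 0.74, 0.73, 0.72]  # higher is better

right_direction = best_iteration(auc_per_round, maximize=True, patience=2)
wrong_direction = best_iteration(auc_per_round, maximize=False, patience=2)
```

With the correct direction the best round is the second one; treating the same metric as something to minimize stops earlier and keeps the first, worse round.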
33+
34+
### API changes
35+
* Column sampling by level (`colsample_bylevel`) is now functional for `hist` algorithm (#3635, #3862)
36+
* Add `disable_default_eval_metric` parameter to disable default metric (#3606)
37+
* Experimental AVX support for gradient computation is removed (#3752)
38+
* XGBoost4J-Spark
39+
- Add `rank:ndcg` and `rank:map` to supported objectives (#3697)
40+
* Python package
41+
- Add `callbacks` argument to `fit()` function of scikit-learn API (#3682)
42+
- Add `XGBRanker` to scikit-learn interface (#3560, #3848)
43+
- Add `validate_features` argument to `predict()` function of scikit-learn API (#3653)
44+
- Allow scikit-learn grid search over parameters specified as keyword arguments (#3791)
45+
- Add `coef_` and `intercept_` as properties of scikit-learn wrapper (#3855). Some scikit-learn functions expect these properties.
46+
47+
### Performance improvements
48+
* Address very high GPU memory usage for large data (#3635)
49+
* Fix performance regression within `EvaluateSplits()` of `gpu_hist` algorithm. (#3680)
50+
51+
### Bug-fixes
52+
* Fix a problem in GPU quantile sketch with tiny instance weights. (#3628)
53+
* Fix copy constructor for `HostDeviceVectorImpl` to prevent dangling pointers (#3657)
54+
* Fix a bug in partitioned file loading (#3673)
55+
* Fix an uninitialized pointer in `gpu_hist` (#3703)
56+
* Reshard data among GPUs when the number of GPUs is changed (#3721)
57+
* Add back `max_delta_step` to split evaluation (#3668)
58+
* Do not round up integer thresholds for integer features in JSON dump (#3717)
59+
* Use `dmlc::TemporaryDirectory` to handle temporaries in cross-platform way (#3783)
60+
* Fix accuracy problem with `gpu_hist` when `min_child_weight` and `lambda` are set to 0 (#3793)
61+
* Make sure that `tree_method` parameter is recognized and not silently ignored (#3849)
62+
* XGBoost4J-Spark
63+
- Make sure `thresholds` are considered when executing `predict()` method (#3577)
64+
- Avoid losing precision when computing probabilities by converting to `Double` early (#3576)
65+
- `getTreeLimit()` should return `Int` (#3602)
66+
- Fix checkpoint serialization on HDFS (#3614)
67+
- Throw `ControlThrowable` instead of `InterruptedException` so that it is properly re-thrown (#3632)
68+
- Remove extraneous output to stdout (#3665)
69+
- Allow specification of task type for custom objectives and evaluations (#3646)
70+
- Fix distributed updater check (#3739)
71+
- Fix issue when spark job execution thread cannot return before we execute `first()` (#3758)
72+
* Python package
73+
- Fix accessing `DMatrix.handle` before it is set (#3599)
74+
- `XGBClassifier.predict()` should return margin scores when `output_margin` is set to true (#3651)
75+
- Early stopping callback should maximize metric of form `NDCG@n-` (#3685)
76+
- Preserve feature names when slicing `DMatrix` (#3766)
77+
* R package
78+
- Replace `nround` with `nrounds` to match actual parameter (#3592)
79+
- Amend `xgb.createFolds` to handle classes of a single element (#3630)
80+
- Fix buggy random generator and make `colsample_bytree` functional (#3781)
81+
82+
### Maintenance: testing, continuous integration, build system
83+
* Add sanitizers tests to Travis CI (#3557)
84+
* Add NumPy, Matplotlib, Graphviz as requirements for doc build (#3669)
85+
* Comply with CRAN submission policy (#3660, #3728)
86+
* Remove copy-paste error in JVM test suite (#3692)
87+
* Disable flaky tests in `R-package/tests/testthat/test_update.R` (#3723)
88+
* Make Python tests compatible with scikit-learn 0.20 release (#3731)
89+
* Separate out restricted and unrestricted tasks, so that pull requests don't build downloadable artifacts (#3736)
90+
* Add multi-GPU unit test environment (#3741)
91+
* Allow plug-ins to be built by CMake (#3752)
92+
* Test wheel compatibility on CPU containers for pull requests (#3762)
93+
* Fix broken doc build due to Matplotlib 3.0 release (#3764)
94+
* Produce `xgboost.so` for XGBoost-R on Mac OSX, so that `make install` works (#3767)
95+
* Retry Jenkins CI tests up to 3 times to improve reliability (#3769, #3769, #3775, #3776, #3777)
96+
* Add basic unit tests for `gpu_hist` algorithm (#3785)
97+
* Fix Python environment for distributed unit tests (#3806)
98+
* Test wheels on CUDA 10.0 container for compatibility (#3838)
99+
* Fix JVM doc build (#3853)
100+
101+
### Maintenance: Refactor C++ code for legibility and maintainability
102+
* Merge generic device helper functions into `GPUSet` class (#3626)
103+
* Re-factor column sampling logic into `ColumnSampler` class (#3635, #3637)
104+
* Replace `std::vector` with `HostDeviceVector` in `MetaInfo` and `SparsePage` (#3446)
105+
* Simplify `DMatrix` class (#3395)
106+
* De-duplicate CPU/GPU code using `Transform` class (#3643, #3751)
107+
* Remove obsoleted `QuantileHistMaker` class (#3761)
108+
* Remove obsoleted `NoConstraint` class (#3792)
109+
110+
### Other Features
111+
* C++20-compliant Span class for safe pointer indexing (#3548, #3588)
112+
* Add helper functions to manipulate multiple GPU devices (#3693)
113+
* XGBoost4J-Spark
114+
- Allow specifying the host IP in the `xgboost-tracker.properties` file (#3833). This comes in handy when the `hosts` file doesn't correctly define localhost.
115+
116+
### Usability Improvements
117+
* Add reference to GitHub repository in `pom.xml` of JVM packages (#3589)
118+
* Add R demo of multi-class classification (#3695)
119+
* Document JSON dump functionality (#3600, #3603)
120+
* Document CUDA requirement and lack of external memory for GPU algorithms (#3624)
121+
* Document LambdaMART objectives, both pairwise and listwise (#3672)
122+
* Document `aucpr` evaluation metric (#3687)
123+
* Document gblinear parameters: `feature_selector` and `top_k` (#3780)
124+
* Add instructions for using MinGW-built XGBoost with Python. (#3774)
125+
* Remove nonexistent parameter `use_buffer` from documentation (#3610)
126+
* Update Python API doc to include all classes and members (#3619, #3682)
127+
* Fix typos and broken links in documentation (#3618, #3640, #3676, #3713, #3759, #3784, #3843, #3852)
128+
* Binary classification demo should produce LIBSVM with 0-based indexing (#3652)
129+
* Process data once for Python and CLI examples of learning to rank (#3666)
130+
* Include full text of Apache 2.0 license in the repository (#3698)
131+
* Save predictor parameters in model file (#3856)
132+
* JVM packages
133+
- Let users specify feature names when calling `getModelDump` and `getFeatureScore` (#3733)
134+
- Warn the user about the lack of over-the-wire encryption (#3667)
135+
- Fix errors in examples (#3719)
136+
- Document choice of trackers (#3831)
137+
- Document that vanilla Apache Spark is required (#3854)
138+
* Python package
139+
- Document that custom objective can't contain colon (:) (#3601)
140+
- Show a better error message for failed library loading (#3690)
141+
- Document that feature importance is unavailable for non-tree learners (#3765)
142+
- Document behavior of `get_fscore()` for zero-importance features (#3763)
143+
- Recommend pickling as the way to save `XGBClassifier` / `XGBRegressor` / `XGBRanker` (#3829)
144+
* R package
145+
- Enlarge variable importance plot to make it more visible (#3820)
146+
147+
### BREAKING CHANGES
148+
* External memory page files have changed, breaking backwards compatibility for temporary storage used during external memory training. This only affects external memory users upgrading their xgboost version - we recommend clearing all `*.page` files before resuming training. Model serialization is unaffected.
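One way to clear stale page files before resuming training is sketched below in Python. The helper name, the cache directory, and the file names in the demo are hypothetical; only the `*.page` pattern comes from the note above.

```python
from pathlib import Path
import tempfile


def clear_page_files(cache_dir):
    """Delete every *.page file directly inside cache_dir; return their names."""
    removed = []
    for page in sorted(Path(cache_dir).glob("*.page")):
        page.unlink()
        removed.append(page.name)
    return removed


# Demo in a throwaway directory so that no real cache is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dtrain.cache.row.page", "model.bin"):
        (Path(tmp) / name).write_bytes(b"")
    removed = clear_page_files(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

The saved model file is left alone, in line with the note that model serialization is unaffected.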
149+
150+
### Known issues
151+
* Quantile sketcher fails to produce any quantile for some edge cases (#2943)
152+
* The `hist` algorithm leaks memory when used with learning rate decay callback (#3579)
153+
* Using a custom evaluation function together with early stopping causes an assertion failure in XGBoost4J-Spark (#3595)
154+
* Early stopping doesn't work with `gblinear` learner (#3789)
155+
* Label and weight vectors are not resharded when the number of GPUs changes (#3794). To get around this issue, delete the `DMatrix` object and re-load.
156+
* The `DMatrix` Python objects are initialized with incorrect values when given array slices (#3841)
157+
* The `gpu_id` parameter is broken and not yet properly supported (#3850)
158+
159+
### Acknowledgement
160+
**Contributors** (in no particular order): Hyunsu Cho (@hcho3), Jiaming Yuan (@trivialfis), Nan Zhu (@CodingCat), Rory Mitchell (@RAMitchell), Andy Adinets (@canonizer), Vadim Khotilovich (@khotilov), Sergei Lebedev (@superbobry)
161+
162+
**First-time Contributors** (in no particular order): Matthew Tovbin (@tovbinm), Jakob Richter (@jakob-r), Grace Lam (@grace-lam), Grant W Schneider (@grantschneider), Andrew Thia (@BlueTea88), Sergei Chipiga (@schipiga), Joseph Bradley (@jkbradley), Chen Qin (@chenqin), Jerry Lin (@linjer), Dmitriy Rybalko (@rdtft), Michael Mui (@mmui), Takahiro Kojima (@515hikaru), Bruce Zhao (@BruceZhaoR), Wei Tian (@weitian), Saumya Bhatnagar (@Sam1301), Juzer Shakir (@JuzerShakir), Zhao Hang (@cleghom), Jonathan Friedman (@jontonsoup), Bruno Tremblay (@meztez), @Shiki-H, @mrgutkun, @gorogm, @htgeis, @jakehoare, @zengxy, @KOLANICH
163+
164+
**First-time Reviewers** (in no particular order): Nikita Titov (@StrikerRUS), Xiangrui Meng (@mengxr), Nirmal Borah (@Nirmal-Neel)
165+
9166

10167
## v0.80 (2018.08.13)
11168
* **JVM packages received a major upgrade**: To consolidate the APIs and improve the user experience, we refactored the design of XGBoost4J-Spark in a significant manner. (#3387)

jvm-packages/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
66

77
<groupId>ml.dmlc</groupId>
88
<artifactId>xgboost-jvm</artifactId>
9-
<version>0.81-SNAPSHOT</version>
9+
<version>0.81</version>
1010
<packaging>pom</packaging>
1111
<name>XGBoost JVM Package</name>
1212
<description>JVM Package for XGBoost</description>

jvm-packages/xgboost4j-example/pom.xml

Lines changed: 4 additions & 4 deletions
@@ -6,10 +6,10 @@
66
<parent>
77
<groupId>ml.dmlc</groupId>
88
<artifactId>xgboost-jvm</artifactId>
9-
<version>0.81-SNAPSHOT</version>
9+
<version>0.81</version>
1010
</parent>
1111
<artifactId>xgboost4j-example</artifactId>
12-
<version>0.81-SNAPSHOT</version>
12+
<version>0.81</version>
1313
<packaging>jar</packaging>
1414
<build>
1515
<plugins>
@@ -26,7 +26,7 @@
2626
<dependency>
2727
<groupId>ml.dmlc</groupId>
2828
<artifactId>xgboost4j-spark</artifactId>
29-
<version>0.81-SNAPSHOT</version>
29+
<version>0.81</version>
3030
</dependency>
3131
<dependency>
3232
<groupId>org.apache.spark</groupId>
@@ -37,7 +37,7 @@
3737
<dependency>
3838
<groupId>ml.dmlc</groupId>
3939
<artifactId>xgboost4j-flink</artifactId>
40-
<version>0.81-SNAPSHOT</version>
40+
<version>0.81</version>
4141
</dependency>
4242
<dependency>
4343
<groupId>org.apache.commons</groupId>

jvm-packages/xgboost4j-flink/pom.xml

Lines changed: 3 additions & 3 deletions
@@ -6,10 +6,10 @@
66
<parent>
77
<groupId>ml.dmlc</groupId>
88
<artifactId>xgboost-jvm</artifactId>
9-
<version>0.81-SNAPSHOT</version>
9+
<version>0.81</version>
1010
</parent>
1111
<artifactId>xgboost4j-flink</artifactId>
12-
<version>0.81-SNAPSHOT</version>
12+
<version>0.81</version>
1313
<build>
1414
<plugins>
1515
<plugin>
@@ -26,7 +26,7 @@
2626
<dependency>
2727
<groupId>ml.dmlc</groupId>
2828
<artifactId>xgboost4j</artifactId>
29-
<version>0.81-SNAPSHOT</version>
29+
<version>0.81</version>
3030
</dependency>
3131
<dependency>
3232
<groupId>org.apache.commons</groupId>

jvm-packages/xgboost4j-spark/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
66
<parent>
77
<groupId>ml.dmlc</groupId>
88
<artifactId>xgboost-jvm</artifactId>
9-
<version>0.81-SNAPSHOT</version>
9+
<version>0.81</version>
1010
</parent>
1111
<artifactId>xgboost4j-spark</artifactId>
1212
<build>
@@ -24,7 +24,7 @@
2424
<dependency>
2525
<groupId>ml.dmlc</groupId>
2626
<artifactId>xgboost4j</artifactId>
27-
<version>0.81-SNAPSHOT</version>
27+
<version>0.81</version>
2828
</dependency>
2929
<dependency>
3030
<groupId>org.apache.spark</groupId>

jvm-packages/xgboost4j/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,10 +6,10 @@
66
<parent>
77
<groupId>ml.dmlc</groupId>
88
<artifactId>xgboost-jvm</artifactId>
9-
<version>0.81-SNAPSHOT</version>
9+
<version>0.81</version>
1010
</parent>
1111
<artifactId>xgboost4j</artifactId>
12-
<version>0.81-SNAPSHOT</version>
12+
<version>0.81</version>
1313
<packaging>jar</packaging>
1414

1515
<dependencies>

python-package/xgboost/VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
0.80
1+
0.81
