# XGBoost Change Log
This file records the changes in the XGBoost library in reverse chronological order.
## v0.81 (2018.11.04)

### New feature: feature interaction constraints
* Users are now able to control which features (independent variables) are allowed to interact by specifying feature interaction constraints (#3466).
* A [tutorial](https://xgboost.readthedocs.io/en/release_0.81/tutorials/feature_interaction_constraint.html) is available, as well as [R](https://github.com/dmlc/xgboost/blob/9254c58e4dfff6a59dc0829a2ceb02e45ed17cd0/R-package/demo/interaction_constraints.R) and [Python](https://github.com/dmlc/xgboost/blob/9254c58e4dfff6a59dc0829a2ceb02e45ed17cd0/tests/python/test_interaction_constraints.py) examples.
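As a minimal sketch (the feature indices and other parameter values here are made up for illustration), constraints are passed as a string listing groups of feature indices; splits along any one branch are then restricted to features drawn from a single group:

```python
import json

# Hypothetical constraint: features 0-1 form one interaction group,
# features 2-4 another; features from different groups may not interact.
params = {
    "tree_method": "hist",
    "max_depth": 4,
    "interaction_constraints": "[[0, 1], [2, 3, 4]]",
}

# The constraint string is JSON: a list of feature-index groups.
groups = json.loads(params["interaction_constraints"])
assert groups == [[0, 1], [2, 3, 4]]
```

A params dict like this would be handed to training as usual; see the linked tutorial for the exact semantics.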

### New feature: learning to rank using scikit-learn interface
* Learning to rank task is now available for the scikit-learn interface of the Python package (#3560, #3848). It is now possible to integrate the XGBoost ranking model into the scikit-learn learning pipeline.
* An example of using the `XGBRanker` class can be found at [demo/rank/rank_sklearn.py](https://github.com/dmlc/xgboost/blob/24a268a2e3cb17302db3d72da8f04016b7d352d9/demo/rank/rank_sklearn.py).
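Ranking data must be accompanied by query group sizes. A small sketch (the query IDs below are made up) of deriving the per-query group counts that a ranker's `fit(X, y, group=...)` call expects, given a query-ID column sorted so that rows of the same query are contiguous:

```python
from itertools import groupby

# Hypothetical query IDs; rows belonging to the same query are contiguous.
qids = [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Count consecutive rows per query to build the group-size array.
group_sizes = [sum(1 for _ in rows) for _, rows in groupby(qids)]
assert group_sizes == [3, 2, 4]
```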

### New feature: R interface for SHAP interactions
* SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. Previously, this feature was only available from the Python package; now it is available from the R package as well (#3636).

### New feature: GPU predictor now uses multiple GPUs to predict
* The GPU predictor is now able to utilize multiple GPUs at once to accelerate prediction (#3738).

### New feature: Scale distributed XGBoost to large-scale clusters
* Fix OS file descriptor limit assertion error on large clusters (#3835, dmlc/rabit#73) by replacing the `select()`-based AllReduce/Broadcast with a `poll()`-based implementation.
* Mitigate the tracker "thundering herd" issue on large clusters by adding exponential backoff retry when workers connect to the tracker.
* With this change, we were able to scale to 1.5k executors on a 12 billion row dataset after some tweaks here and there.
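The backoff idea can be sketched as follows. This is an illustrative Python model only (the actual tracker/worker code differs in detail): delays grow exponentially up to a cap, and full jitter spreads reconnecting workers apart so they don't stampede the tracker in lockstep.

```python
import random

def backoff_delays(retries, base=0.1, cap=30.0, seed=42):
    """Jittered, exponentially growing waits (in seconds) between attempts."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(retries):
        upper = min(cap, base * (2 ** attempt))  # 0.1, 0.2, 0.4, ... capped
        delays.append(rng.uniform(0.0, upper))   # full jitter de-synchronizes workers
    return delays

delays = backoff_delays(6)
assert len(delays) == 6
assert all(0.0 <= d <= 30.0 for d in delays)
```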

### Major bug fix: learning to rank with XGBoost4J-Spark
* Previously, `repartitionForData` would shuffle data and lose the ordering necessary for the ranking task.
* To fix this issue, data points within each RDD partition are now explicitly grouped by their group (query session) IDs (#3654). Empty RDD partitions are also handled carefully (#3750).
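The fix can be pictured with a small sketch (pure Python with made-up data; the actual implementation operates on Spark RDD partitions): rows that a shuffle has interleaved are regrouped so each query session is contiguous again, as the ranking objective requires.

```python
from collections import defaultdict

# Hypothetical partition contents: (query_id, row) pairs interleaved by a shuffle.
partition = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]

# Bucket rows by query ID, preserving each query's internal row order.
by_query = defaultdict(list)
for qid, row in partition:
    by_query[qid].append(row)

# Emit one contiguous run of rows per query session.
regrouped = [(qid, row) for qid in sorted(by_query) for row in by_query[qid]]
assert regrouped == [(1, "b"), (1, "d"), (2, "a"), (2, "c")]
```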

### Major bug fix: early stopping fixed in XGBoost4J-Spark
* The earlier implementation of early stopping had incorrect semantics and didn't let users specify the direction of optimization (maximize / minimize).
* A new parameter `maximize_evaluation_metrics` tells whether a metric should be maximized or minimized as part of the early stopping criteria (#3808). Early stopping now has correct semantics.
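A minimal sketch of the corrected semantics (illustrative only, not the XGBoost4J-Spark implementation): whether "no improvement for k rounds" means "no increase" or "no decrease" depends on the direction flag.

```python
def should_stop(history, rounds, maximize):
    """True if the metric has not improved over the best earlier value
    in any of the last `rounds` evaluations, respecting direction."""
    if len(history) <= rounds:
        return False
    best = max if maximize else min
    best_value = best(history[:-rounds])
    recent = history[-rounds:]
    if maximize:
        return all(v <= best_value for v in recent)  # no increase -> stop
    return all(v >= best_value for v in recent)      # no decrease -> stop

# AUC-like metric (maximize): plateaus, so stop.
assert should_stop([0.70, 0.75, 0.76, 0.76, 0.75, 0.74], rounds=3, maximize=True)
# RMSE-like metric (minimize): still improving, so don't stop.
assert not should_stop([1.0, 0.9, 0.8, 0.7], rounds=3, maximize=False)
```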

### API changes
* Column sampling by level (`colsample_bylevel`) is now functional for the `hist` algorithm (#3635, #3862)
* Add `disable_default_eval_metric` parameter to disable the default metric (#3606)
* Experimental AVX support for gradient computation is removed (#3752)
* XGBoost4J-Spark
  - Add `rank:ndcg` and `rank:map` to supported objectives (#3697)
* Python package
  - Add `callbacks` argument to the `fit()` function of the scikit-learn API (#3682)
  - Add `XGBRanker` to the scikit-learn interface (#3560, #3848)
  - Add `validate_features` argument to the `predict()` function of the scikit-learn API (#3653)
  - Allow scikit-learn grid search over parameters specified as keyword arguments (#3791)
  - Add `coef_` and `intercept_` as properties of the scikit-learn wrapper (#3855). Some scikit-learn functions expect these properties.

### Performance improvements
* Address very high GPU memory usage for large data (#3635)
* Fix performance regression within `EvaluateSplits()` of the `gpu_hist` algorithm (#3680)

### Bug-fixes
* Fix a problem in GPU quantile sketch with tiny instance weights (#3628)
* Fix copy constructor for `HostDeviceVectorImpl` to prevent dangling pointers (#3657)
* Fix a bug in partitioned file loading (#3673)
* Fix an uninitialized pointer in `gpu_hist` (#3703)
* Re-share data among GPUs when the number of GPUs is changed (#3721)
* Add back `max_delta_step` to split evaluation (#3668)
* Do not round up integer thresholds for integer features in JSON dump (#3717)
* Use `dmlc::TemporaryDirectory` to handle temporaries in a cross-platform way (#3783)
* Fix accuracy problem with `gpu_hist` when `min_child_weight` and `lambda` are set to 0 (#3793)
* Make sure that the `tree_method` parameter is recognized and not silently ignored (#3849)
* XGBoost4J-Spark
  - Make sure `thresholds` are considered when executing the `predict()` method (#3577)
  - Avoid losing precision when computing probabilities by converting to `Double` early (#3576)
  - `getTreeLimit()` should return `Int` (#3602)
  - Fix checkpoint serialization on HDFS (#3614)
  - Throw `ControlThrowable` instead of `InterruptedException` so that it is properly re-thrown (#3632)
  - Remove extraneous output to stdout (#3665)
  - Allow specification of task type for custom objectives and evaluations (#3646)
  - Fix distributed updater check (#3739)
  - Fix issue where the Spark job execution thread cannot return before we execute `first()` (#3758)
* Python package
  - Fix accessing `DMatrix.handle` before it is set (#3599)
  - `XGBClassifier.predict()` should return margin scores when `output_margin` is set to true (#3651)
  - Early stopping callback should maximize metric of form `NDCG@n-` (#3685)
  - Preserve feature names when slicing `DMatrix` (#3766)
* R package
  - Replace `nround` with `nrounds` to match the actual parameter (#3592)
  - Amend `xgb.createFolds` to handle classes of a single element (#3630)
  - Fix buggy random generator and make `colsample_bytree` functional (#3781)

### Maintenance: testing, continuous integration, build system
* Add sanitizer tests to Travis CI (#3557)
* Add NumPy, Matplotlib, Graphviz as requirements for doc build (#3669)
* Comply with CRAN submission policy (#3660, #3728)
* Remove copy-paste error in JVM test suite (#3692)
* Disable flaky tests in `R-package/tests/testthat/test_update.R` (#3723)
* Make Python tests compatible with scikit-learn 0.20 release (#3731)
* Separate out restricted and unrestricted tasks, so that pull requests don't build downloadable artifacts (#3736)
* Add multi-GPU unit test environment (#3741)
* Allow plug-ins to be built by CMake (#3752)
* Test wheel compatibility on CPU containers for pull requests (#3762)
* Fix broken doc build due to Matplotlib 3.0 release (#3764)
* Produce `xgboost.so` for XGBoost-R on Mac OSX, so that `make install` works (#3767)
* Retry Jenkins CI tests up to 3 times to improve reliability (#3769, #3775, #3776, #3777)
* Add basic unit tests for `gpu_hist` algorithm (#3785)
* Fix Python environment for distributed unit tests (#3806)
* Test wheels on CUDA 10.0 container for compatibility (#3838)
* Fix JVM doc build (#3853)

### Maintenance: Refactor C++ code for legibility and maintainability
* Merge generic device helper functions into `GPUSet` class (#3626)
* Refactor column sampling logic into `ColumnSampler` class (#3635, #3637)
* Replace `std::vector` with `HostDeviceVector` in `MetaInfo` and `SparsePage` (#3446)
* Simplify `DMatrix` class (#3395)
* De-duplicate CPU/GPU code using `Transform` class (#3643, #3751)
* Remove obsolete `QuantileHistMaker` class (#3761)
* Remove obsolete `NoConstraint` class (#3792)

### Other Features
* C++20-compliant `Span` class for safe pointer indexing (#3548, #3588)
* Add helper functions to manipulate multiple GPU devices (#3693)
* XGBoost4J-Spark
  - Allow specifying the host IP from the `xgboost-tracker.properties` file (#3833). This comes in handy when the `hosts` file doesn't correctly define localhost.

### Usability Improvements
* Add reference to GitHub repository in `pom.xml` of JVM packages (#3589)
* Add R demo of multi-class classification (#3695)
* Document JSON dump functionality (#3600, #3603)
* Document CUDA requirement and lack of external memory for GPU algorithms (#3624)
* Document LambdaMART objectives, both pairwise and listwise (#3672)
* Document `aucpr` evaluation metric (#3687)
* Document gblinear parameters: `feature_selector` and `top_k` (#3780)
* Add instructions for using MinGW-built XGBoost with Python (#3774)
* Remove nonexistent parameter `use_buffer` from documentation (#3610)
* Update Python API doc to include all classes and members (#3619, #3682)
* Fix typos and broken links in documentation (#3618, #3640, #3676, #3713, #3759, #3784, #3843, #3852)
* Binary classification demo should produce LIBSVM with 0-based indexing (#3652)
* Process data once for Python and CLI examples of learning to rank (#3666)
* Include full text of Apache 2.0 license in the repository (#3698)
* Save predictor parameters in model file (#3856)
* JVM packages
  - Let users specify feature names when calling `getModelDump` and `getFeatureScore` (#3733)
  - Warn the user about the lack of over-the-wire encryption (#3667)
  - Fix errors in examples (#3719)
  - Document choice of trackers (#3831)
  - Document that vanilla Apache Spark is required (#3854)
* Python package
  - Document that a custom objective can't contain a colon (:) (#3601)
  - Show a better error message for failed library loading (#3690)
  - Document that feature importance is unavailable for non-tree learners (#3765)
  - Document behavior of `get_fscore()` for zero-importance features (#3763)
  - Recommend pickling as the way to save `XGBClassifier` / `XGBRegressor` / `XGBRanker` (#3829)
* R package
  - Enlarge variable importance plot to make it more visible (#3820)

### BREAKING CHANGES
* External memory page files have changed, breaking backward compatibility for temporary storage used during external memory training. This only affects external memory users upgrading their XGBoost version; we recommend clearing all `*.page` files before resuming training. Model serialization is unaffected.
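For example, a one-liner to clear stale page files (a sketch, assuming the cache files live in the current directory; adjust the path and depth to wherever your external-memory cache prefix points):

```shell
# Delete external-memory cache page files so the upgraded version
# regenerates them on the next training run.
find . -maxdepth 1 -name '*.page' -type f -delete
```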

### Known issues
* Quantile sketcher fails to produce any quantile for some edge cases (#2943)
* The `hist` algorithm leaks memory when used with learning rate decay callback (#3579)
* Using a custom evaluation function together with early stopping causes an assertion failure in XGBoost4J-Spark (#3595)
* Early stopping doesn't work with the `gblinear` learner (#3789)
* Label and weight vectors are not reshared upon a change in the number of GPUs (#3794). To get around this issue, delete the `DMatrix` object and re-load it.
* The `DMatrix` Python objects are initialized with incorrect values when given array slices (#3841)
* The `gpu_id` parameter is broken and not yet properly supported (#3850)

### Acknowledgement
**Contributors** (in no particular order): Hyunsu Cho (@hcho3), Jiaming Yuan (@trivialfis), Nan Zhu (@CodingCat), Rory Mitchell (@RAMitchell), Andy Adinets (@canonizer), Vadim Khotilovich (@khotilov), Sergei Lebedev (@superbobry)

**First-time Contributors** (in no particular order): Matthew Tovbin (@tovbinm), Jakob Richter (@jakob-r), Grace Lam (@grace-lam), Grant W Schneider (@grantschneider), Andrew Thia (@BlueTea88), Sergei Chipiga (@schipiga), Joseph Bradley (@jkbradley), Chen Qin (@chenqin), Jerry Lin (@linjer), Dmitriy Rybalko (@rdtft), Michael Mui (@mmui), Takahiro Kojima (@515hikaru), Bruce Zhao (@BruceZhaoR), Wei Tian (@weitian), Saumya Bhatnagar (@Sam1301), Juzer Shakir (@JuzerShakir), Zhao Hang (@cleghom), Jonathan Friedman (@jontonsoup), Bruno Tremblay (@meztez), @Shiki-H, @mrgutkun, @gorogm, @htgeis, @jakehoare, @zengxy, @KOLANICH

**First-time Reviewers** (in no particular order): Nikita Titov (@StrikerRUS), Xiangrui Meng (@mengxr), Nirmal Borah (@Nirmal-Neel)
## v0.80 (2018.08.13)
* **JVM packages received a major upgrade**: To consolidate the APIs and improve the user experience, we refactored the design of XGBoost4J-Spark in a significant manner. (#3387)