
Commit 8070ffa

improve EBM default parameters
1 parent b17f9b5 commit 8070ffa

3 files changed: +52 -50 lines changed
(README.md, docs/interpret/hyperparameters.md, python/interpret-core/interpret/glassbox/_ebm/_ebm.py)

README.md

Lines changed: 2 additions & 0 deletions

@@ -623,6 +623,7 @@ We also build on top of many great packages. Please check them out!
 # Papers that use or compare EBMs
 
 - [Challenging the Performance-Interpretability Trade-off: An Evaluation of Interpretable Machine Learning Models](https://arxiv.org/pdf/2409.14429)
+- [GAMFORMER: In-context Learning for Generalized Additive Models](https://arxiv.org/pdf/2410.04560v1)
 - [Data Science with LLMs and Interpretable Models](https://arxiv.org/pdf/2402.14474v1.pdf)
 - [DimVis: Interpreting Visual Clusters in Dimensionality Reduction With Explainable Boosting Machine](https://arxiv.org/pdf/2402.06885.pdf)
 - [Distill knowledge of additive tree models into generalized linear models](https://detralytics.com/wp-content/uploads/2023/10/Detra-Note_Additive-tree-ensembles.pdf)
@@ -688,6 +689,7 @@ We also build on top of many great packages. Please check them out!
 - [Explainable Boosting Machines for Slope Failure Spatial Predictive Modeling](https://www.mdpi.com/2072-4292/13/24/4991/htm)
 - [Micromodels for Efficient, Explainable, and Reusable Systems: A Case Study on Mental Health](https://arxiv.org/pdf/2109.13770.pdf)
 - [Identifying main and interaction effects of risk factors to predict intensive care admission in patients hospitalized with COVID-19](https://www.medrxiv.org/content/10.1101/2020.06.30.20143651v1.full.pdf)
+- [Leveraging interpretable machine learning in intensive care](https://link.springer.com/article/10.1007/s10479-024-06226-8#Tab10)
 - [Development of prediction models for one-year brain tumour survival using machine learning: a comparison of accuracy and interpretability](https://www.pure.ed.ac.uk/ws/portalfiles/portal/343114800/1_s2.0_S0169260723001487_main.pdf)
 - [Using Interpretable Machine Learning to Predict Maternal and Fetal Outcomes](https://arxiv.org/pdf/2207.05322.pdf)
 - [Calibrate: Interactive Analysis of Probabilistic Model Output](https://arxiv.org/pdf/2207.13770.pdf)

docs/interpret/hyperparameters.md

Lines changed: 21 additions & 21 deletions
@@ -6,12 +6,19 @@ The parameters below are ordered by tuning importance, with the most important h
 
 
 ## smoothing_rounds
-default: 200
+default: 100
 
-hyperparameters: [0, 50, 100, 200, 500, 1000, 2000, 4000]
+hyperparameters: [0, 50, 100, 200, 500, 1000]
 
 guidance: This is an important hyperparameter to tune. The optimal smoothing_rounds value will vary depending on the dataset's characteristics. Adjust based on the prevalence of smooth feature response curves.
 
+## learning_rate
+default: 0.01
+
+hyperparameters: [0.2, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025]
+
+guidance: This is an important hyperparameter to tune. The conventional wisdom is that a lower learning rate is generally better, but we have found the relationship to be more complex. In general, regression seems to prefer a higher learning rate, binary classification seems to prefer a lower learning rate, and multiclass is in-between.
+
 ## interactions
 default: 0.9
 
@@ -45,18 +52,18 @@ hyperparameters: [8, 16, 32, 64, 128, 256]
 guidance: For max_interaction_bins, more is not necessarily better, unlike with max_bins. A good value on many datasets seems to be 32, but it's worth trying higher and lower values.
 
 ## greedy_ratio
-default: 1.5
+default: 12.0
 
-hyperparameters: [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 4.0]
+hyperparameters: [0.0, 1.0, 2.0, 5.0, 12.0, 20.0]
 
 guidance: greedy_ratio is a good candidate for hyperparameter tuning as the best value is dataset dependent.
 
 ## cyclic_progress
-default: 1.0
+default: 0.0
 
-hyperparameters: [0.0, 0.5, 1.0]
+hyperparameters: [0.0, 1.0]
 
-guidance: cyclic_progress is a good candidate for hyperparameter tuning as the best value is dataset dependent.
+guidance: Try both.
 
 ## outer_bags
 default: 14
@@ -74,31 +81,24 @@ hyperparameters: [0, 50, 100, 500]
 
 guidance: interaction_smoothing_rounds appears to have only a minor impact on model accuracy. 0 is often the most accurate choice, but the interaction shape plots will be smoother and easier to interpret with more interaction_smoothing_rounds.
 
-## learning_rate
-default: 0.01
-
-hyperparameters: [0.1, 0.025, 0.01, 0.005, 0.0025]
-
-guidance: A smaller learning_rate promotes finer model adjustments during fitting, but may require more iterations. Generally, we believe a smaller learning_rate should improve the model, but sometimes hyperparameter tuning seems to be needed to select the best value.
-
 ## max_leaves
-default: 3
+default: 2
 
 hyperparameters: [2, 3, 4]
 
-guidance: Generally, the default setting is effective, but it's worth checking if changing to either 2 or 4 can offer better accuracy on your specific data. The max_leaves parameter only applies to main effects.
+guidance: Generally, the default setting is effective, but it's worth checking if changing to either 3 or 4 can offer better accuracy on your specific data. The max_leaves parameter only applies to main effects.
 
 ## min_samples_leaf
-default: 2
+default: 4
 
-hyperparameters: [2, 3, 4]
+hyperparameters: [2, 3, 4, 5, 6]
 
 guidance: The default value usually works well, however experimenting with slightly higher values could potentially enhance generalization on certain datasets.
 
 ## min_hessian
-default: 0.0001
+default: 1e-5
 
-hyperparameters: [0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]
+hyperparameters: [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]
 
 guidance: The default min_hessian is a solid starting point.
 
@@ -112,7 +112,7 @@ hyperparameters: [1000000000]
 guidance: The max_rounds parameter serves as a limit to prevent excessive training on datasets where improvements taper off. Set this parameter sufficiently high to avoid premature early stopping. Consider increasing it if small yet consistent gains are observed in longer trainings.
 
 ## early_stopping_rounds
-default: 50
+default: 100
 
 guidance: We typically do not advise changing early_stopping_rounds. The default is appropriate for most cases, adequately capturing the optimal model without incurring unnecessary computational costs.
 
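The guidance above maps directly onto a search grid. Below is a minimal tuning sketch, not part of the commit: it assumes an interpret build containing these defaults, the dataset and CV settings are illustrative placeholders, and the grids are trimmed versions of the ranges documented above. EBMs follow the scikit-learn estimator API, so GridSearchCV can drive them directly.

```python
# Minimal tuning sketch based on the updated hyperparameters.md guidance.
# Assumptions: an interpret build with this commit's defaults; the dataset
# and CV settings are illustrative, and the grids are trimmed for brevity.
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "smoothing_rounds": [50, 100, 500],     # most important knob to tune
    "learning_rate": [0.05, 0.01, 0.0025],  # second most important
    "cyclic_progress": [0.0, 1.0],          # per the guidance: "Try both."
}

search = GridSearchCV(
    ExplainableBoostingClassifier(),  # unlisted parameters keep the new defaults
    param_grid,
    cv=3,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```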

python/interpret-core/interpret/glassbox/_ebm/_ebm.py

Lines changed: 29 additions & 29 deletions
@@ -396,7 +396,7 @@ def fit(self, X, y, sample_weight=None, bags=None, init_score=None):
         # with 64 bytes per tensor cell, a 2^20 tensor would be 1/16 gigabyte.
         max_cardinality = 1048576
         nominal_smoothing = True
-        # In the future we might replace min_samples_leaf=2 with min_samples_bin=3 so
+        # In the future we might replace min_samples_leaf=4 with min_samples_bin=3 so
         # that we don't need to have the count when boosting or for interaction
         # detection. Benchmarking indicates switching these would decrease the accuracy
         # slightly, but it might be worth the speedup. Unfortunately, with outer bags
@@ -2448,10 +2448,10 @@ class ExplainableBoostingClassifier(EBMModel, ClassifierMixin, ExplainerMixin):
         Number of inner bags. 0 turns off inner bagging.
     learning_rate : float, default=0.01
         Learning rate for boosting.
-    greedy_ratio : float, default=1.5
+    greedy_ratio : float, default=12.0
         The proportion of greedy boosting steps relative to cyclic boosting steps.
         A value of 0 disables greedy boosting, effectively turning it off.
-    cyclic_progress : bool or float, default=True
+    cyclic_progress : bool or float, default=False
         This parameter specifies the proportion of the boosting cycles that will
         actively contribute to improving the model's performance. It is expressed
         as a bool or float between 0 and 1, with the default set to True(1.0), meaning 100% of
@@ -2460,13 +2460,13 @@ class ExplainableBoostingClassifier(EBMModel, ClassifierMixin, ExplainerMixin):
         it will be used to update internal gain calculations related to how effective
         each feature is in predicting the target variable. Setting this parameter
         to a value less than 1.0 can be useful for preventing overfitting.
-    smoothing_rounds : int, default=200
+    smoothing_rounds : int, default=100
         Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs.
     interaction_smoothing_rounds : int, default=50
         Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting.
     max_rounds : int, default=25000
        Total number of boosting rounds with n_terms boosting steps per round.
-    early_stopping_rounds : int, default=50
+    early_stopping_rounds : int, default=100
         Number of rounds with no improvement to trigger early stopping. 0 turns off
         early stopping and boosting will occur for exactly max_rounds.
     early_stopping_tolerance : float, default=1e-5
@@ -2485,17 +2485,17 @@ class ExplainableBoostingClassifier(EBMModel, ClassifierMixin, ExplainerMixin):
         tradeoff for the ensemble of models --- not the individual models --- a small
         amount of overfitting of the individual models can improve the accuracy of
         the ensemble as a whole.
-    min_samples_leaf : int, default=2
+    min_samples_leaf : int, default=4
         Minimum number of samples allowed in the leaves.
-    min_hessian : float, default=1e-4
+    min_hessian : float, default=1e-5
         Minimum hessian required to consider a potential split valid.
     reg_alpha : float, default=0.0
         L1 regularization.
     reg_lambda : float, default=0.0
         L2 regularization.
     max_delta_step : float, default=0.0
         Used to limit the max output of tree leaves. <=0.0 means no constraint.
-    max_leaves : int, default=3
+    max_leaves : int, default=2
         Maximum number of leaves allowed in each tree.
     monotone_constraints: list of int, default=None
@@ -2641,20 +2641,20 @@ def __init__(
         inner_bags: Optional[int] = 0,
         # Boosting
         learning_rate: float = 0.01,
-        greedy_ratio: Optional[float] = 1.5,
-        cyclic_progress: Union[bool, float, int] = True,  # noqa: PYI041
-        smoothing_rounds: Optional[int] = 200,
+        greedy_ratio: Optional[float] = 12.0,
+        cyclic_progress: Union[bool, float, int] = False,  # noqa: PYI041
+        smoothing_rounds: Optional[int] = 100,
         interaction_smoothing_rounds: Optional[int] = 50,
         max_rounds: Optional[int] = 25000,
-        early_stopping_rounds: Optional[int] = 50,
+        early_stopping_rounds: Optional[int] = 100,
         early_stopping_tolerance: Optional[float] = 1e-5,
         # Trees
-        min_samples_leaf: Optional[int] = 2,
-        min_hessian: Optional[float] = 1e-4,
+        min_samples_leaf: Optional[int] = 4,
+        min_hessian: Optional[float] = 1e-5,
         reg_alpha: Optional[float] = 0.0,
         reg_lambda: Optional[float] = 0.0,
         max_delta_step: Optional[float] = 0.0,
-        max_leaves: int = 3,
+        max_leaves: int = 2,
         monotone_constraints: Optional[Sequence[int]] = None,
         objective: str = "log_loss",
         # Overall
@@ -2794,10 +2794,10 @@ class ExplainableBoostingRegressor(EBMModel, RegressorMixin, ExplainerMixin):
         Number of inner bags. 0 turns off inner bagging.
     learning_rate : float, default=0.01
         Learning rate for boosting.
-    greedy_ratio : float, default=1.5
+    greedy_ratio : float, default=12.0
         The proportion of greedy boosting steps relative to cyclic boosting steps.
         A value of 0 disables greedy boosting, effectively turning it off.
-    cyclic_progress : bool or float, default=True
+    cyclic_progress : bool or float, default=False
         This parameter specifies the proportion of the boosting cycles that will
         actively contribute to improving the model's performance. It is expressed
         as a bool or float between 0 and 1, with the default set to True(1.0), meaning 100% of
@@ -2806,13 +2806,13 @@ class ExplainableBoostingRegressor(EBMModel, RegressorMixin, ExplainerMixin):
         it will be used to update internal gain calculations related to how effective
         each feature is in predicting the target variable. Setting this parameter
         to a value less than 1.0 can be useful for preventing overfitting.
-    smoothing_rounds : int, default=200
+    smoothing_rounds : int, default=100
         Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs.
     interaction_smoothing_rounds : int, default=50
         Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting.
     max_rounds : int, default=25000
         Total number of boosting rounds with n_terms boosting steps per round.
-    early_stopping_rounds : int, default=50
+    early_stopping_rounds : int, default=100
         Number of rounds with no improvement to trigger early stopping. 0 turns off
         early stopping and boosting will occur for exactly max_rounds.
     early_stopping_tolerance : float, default=1e-5
@@ -2831,17 +2831,17 @@ class ExplainableBoostingRegressor(EBMModel, RegressorMixin, ExplainerMixin):
         tradeoff for the ensemble of models --- not the individual models --- a small
         amount of overfitting of the individual models can improve the accuracy of
         the ensemble as a whole.
-    min_samples_leaf : int, default=2
+    min_samples_leaf : int, default=4
         Minimum number of samples allowed in the leaves.
-    min_hessian : float, default=1e-4
+    min_hessian : float, default=1e-5
         Minimum hessian required to consider a potential split valid.
     reg_alpha : float, default=0.0
         L1 regularization.
     reg_lambda : float, default=0.0
         L2 regularization.
     max_delta_step : float, default=0.0
         Used to limit the max output of tree leaves. <=0.0 means no constraint.
-    max_leaves : int, default=3
+    max_leaves : int, default=2
         Maximum number of leaves allowed in each tree.
     monotone_constraints: list of int, default=None
@@ -2987,20 +2987,20 @@ def __init__(
         inner_bags: Optional[int] = 0,
         # Boosting
         learning_rate: float = 0.01,
-        greedy_ratio: Optional[float] = 1.5,
-        cyclic_progress: Union[bool, float, int] = True,  # noqa: PYI041
-        smoothing_rounds: Optional[int] = 200,
+        greedy_ratio: Optional[float] = 12.0,
+        cyclic_progress: Union[bool, float, int] = False,  # noqa: PYI041
+        smoothing_rounds: Optional[int] = 100,
         interaction_smoothing_rounds: Optional[int] = 50,
         max_rounds: Optional[int] = 25000,
-        early_stopping_rounds: Optional[int] = 50,
+        early_stopping_rounds: Optional[int] = 100,
         early_stopping_tolerance: Optional[float] = 1e-5,
         # Trees
-        min_samples_leaf: Optional[int] = 2,
-        min_hessian: Optional[float] = 1e-4,
+        min_samples_leaf: Optional[int] = 4,
+        min_hessian: Optional[float] = 1e-5,
         reg_alpha: Optional[float] = 0.0,
         reg_lambda: Optional[float] = 0.0,
         max_delta_step: Optional[float] = 0.0,
-        max_leaves: int = 3,
+        max_leaves: int = 2,
         monotone_constraints: Optional[Sequence[int]] = None,
         objective: str = "rmse",
         # Overall
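At the call site, the net effect of the Python changes is simply different constructor defaults. A short sketch, assuming a build that includes this commit, with all parameter names and values taken from the diff above:

```python
# Sketch: the retuned defaults vs. explicitly pinning the pre-commit values.
# Assumes an interpret build containing this commit.
from interpret.glassbox import ExplainableBoostingRegressor

# With no arguments, the new defaults apply: greedy_ratio=12.0,
# cyclic_progress=False, smoothing_rounds=100, early_stopping_rounds=100,
# min_samples_leaf=4, min_hessian=1e-5, max_leaves=2.
ebm_new = ExplainableBoostingRegressor()

# Passing the old defaults explicitly reproduces the previous behavior.
ebm_old = ExplainableBoostingRegressor(
    greedy_ratio=1.5,
    cyclic_progress=True,
    smoothing_rounds=200,
    early_stopping_rounds=50,
    min_samples_leaf=2,
    min_hessian=1e-4,
    max_leaves=3,
)

# scikit-learn convention: constructor arguments are stored as attributes.
print(ebm_new.greedy_ratio, ebm_old.greedy_ratio)  # 12.0 1.5
```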
