Commit 8717a15

Lera authored and jnothman committed
[MRG] Extended explanation of using class_weight in RandomForestClassifier (Issue scikit-learn#6646) (scikit-learn#8838)
* Extended explanation of using class_weight in RandomForestClassifier
* Extended explanation of using class_weight in DecisionTreeClassifier, ExtraTreesClassifier and compute_sample_weight()
* Rephrased description.
* Rephrased description (remove "indicator")
1 parent c2b0de5 commit 8717a15

3 files changed, +24 -0 lines changed


sklearn/ensemble/forest.py

Lines changed: 12 additions & 0 deletions
@@ -863,6 +863,12 @@ class RandomForestClassifier(ForestClassifier):
 multi-output problems, a list of dicts can be provided in the same
 order as the columns of y.
 
+Note that for multioutput (including multilabel) weights should be
+defined for each class of every column in its own dict. For example,
+for four-class multilabel classification weights should be
+[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
+[{1:1}, {2:5}, {3:1}, {4:1}].
+
 The "balanced" mode uses the values of y to automatically adjust
 weights inversely proportional to class frequencies in the input data
 as ``n_samples / (n_classes * np.bincount(y))``
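
As a reading aid for the note added in this hunk, here is a minimal sketch of passing one dict per label column to `RandomForestClassifier`. The toy `X` and `Y` arrays are made up for illustration and are not part of the commit; the weights are the ones quoted in the docstring.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data (invented for this sketch): 8 samples, 3 features, 4 binary labels.
rng = np.random.RandomState(0)
X = rng.rand(8, 3)
Y = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
])

# One dict per column of Y; each dict lists both classes (0 and 1) of its column.
class_weight = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]

clf = RandomForestClassifier(
    n_estimators=10, class_weight=class_weight, random_state=0
)
clf.fit(X, Y)
predictions = clf.predict(X)
print(predictions.shape)  # one prediction column per label: (8, 4)
```

The list has the same length as the number of columns in `Y`, which is the constraint the docstring describes.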
@@ -1306,6 +1312,12 @@ class ExtraTreesClassifier(ForestClassifier):
 multi-output problems, a list of dicts can be provided in the same
 order as the columns of y.
 
+Note that for multioutput (including multilabel) weights should be
+defined for each class of every column in its own dict. For example,
+for four-class multilabel classification weights should be
+[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
+[{1:1}, {2:5}, {3:1}, {4:1}].
+
 The "balanced" mode uses the values of y to automatically adjust
 weights inversely proportional to class frequencies in the input data
 as ``n_samples / (n_classes * np.bincount(y))``
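
The mistake the docstring warns about usually starts from a "label index to weight" mapping. A small sketch of converting such a mapping into the expected list of dicts follows; `expand_label_weights` is a hypothetical helper written for this note, not a scikit-learn function, and it assumes every column is binary with classes 0 and 1.

```python
def expand_label_weights(n_labels, positive_weights, default=1):
    """Build one {0: w0, 1: w1} dict per binary label column.

    `positive_weights` maps a column index to the weight of class 1 in
    that column; every unspecified weight falls back to `default`.
    (Hypothetical helper, assuming binary 0/1 label columns.)
    """
    return [
        {0: default, 1: positive_weights.get(column, default)}
        for column in range(n_labels)
    ]


# Up-weight the positive class of the second label (column index 1) only.
class_weight = expand_label_weights(4, {1: 5})
print(class_weight)
# [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]
```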

sklearn/tree/tree.py

Lines changed: 6 additions & 0 deletions
@@ -587,6 +587,12 @@ class DecisionTreeClassifier(BaseDecisionTree, ClassifierMixin):
 multi-output problems, a list of dicts can be provided in the same
 order as the columns of y.
 
+Note that for multioutput (including multilabel) weights should be
+defined for each class of every column in its own dict. For example,
+for four-class multilabel classification weights should be
+[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
+[{1:1}, {2:5}, {3:1}, {4:1}].
+
 The "balanced" mode uses the values of y to automatically adjust
 weights inversely proportional to class frequencies in the input data
 as ``n_samples / (n_classes * np.bincount(y))``
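
For the "balanced" formula in the context lines, a worked example may help. This is a plain-Python sketch that uses `collections.Counter` in place of `np.bincount`; the two agree only under the assumption that the labels are the integers 0..n_classes-1 and each occurs at least once. The label vector `y` is made up.

```python
from collections import Counter

# Imbalanced toy labels (invented): three samples of class 0, one of class 1.
y = [0, 0, 0, 1]

counts = Counter(y)        # plays the role of np.bincount(y)
n_samples = len(y)
n_classes = len(counts)

# n_samples / (n_classes * count), evaluated per class.
balanced = {
    label: n_samples / (n_classes * count)
    for label, count in sorted(counts.items())
}
print(balanced)  # class 0 gets 4 / (2 * 3), class 1 gets 4 / (2 * 1) = 2.0
```

The rarer class receives the larger weight, which is what "inversely proportional to class frequencies" means here.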

sklearn/utils/class_weight.py

Lines changed: 6 additions & 0 deletions
@@ -84,6 +84,12 @@ def compute_sample_weight(class_weight, y, indices=None):
 multi-output problems, a list of dicts can be provided in the same
 order as the columns of y.
 
+Note that for multioutput (including multilabel) weights should be
+defined for each class of every column in its own dict. For example,
+for four-class multilabel classification weights should be
+[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
+[{1:1}, {2:5}, {3:1}, {4:1}].
+
 The "balanced" mode uses the values of y to automatically adjust
 weights inversely proportional to class frequencies in the input data:
 ``n_samples / (n_classes * np.bincount(y))``.
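
To see what the per-column dicts do in `compute_sample_weight`, here is a minimal sketch on a two-column target. The array `y` and the weights are invented for illustration; the sketch relies on the documented behaviour that, for multi-output targets, the weights of each column are multiplied.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Two binary label columns covering every (0/1, 0/1) combination once.
y = np.array([
    [0, 0],
    [1, 0],
    [0, 1],
    [1, 1],
])

# One dict per column, each naming both classes of that column.
class_weight = [{0: 1, 1: 2}, {0: 1, 1: 5}]

sample_weight = compute_sample_weight(class_weight, y)
print(sample_weight.tolist())  # [1.0, 2.0, 5.0, 10.0]
```

Each sample's weight is the product of its per-column class weights, so the last row gets 2 * 5 = 10.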
