Output Probabilities
icsiboost does not natively produce anything close to calibrated probabilities. The question is studied in the paper Obtaining calibrated probabilities from boosting by Niculescu-Mizil and Caruana, which suggests three solutions:
- Logistic Correction
- Platt Calibration
- Isotonic Regression
The first one, Logistic Correction, consists in transforming the scores with the formula `1/(1+exp(-2*n*score))`, where `n` is the number of weak learners.
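For illustration, here is a minimal Python sketch of that correction applied to raw classifier scores; the scores and the number of boosting rounds below are hypothetical placeholders, not actual icsiboost output.

```python
import math

def logistic_correction(score, n_rounds):
    """Map a raw boosting score to a pseudo-probability via 1/(1+exp(-2*n*score))."""
    return 1.0 / (1.0 + math.exp(-2.0 * n_rounds * score))

# Hypothetical values: raw scores from a model trained for 100 rounds.
n_rounds = 100
raw_scores = [0.031, -0.012, 0.154]
print([logistic_correction(s, n_rounds) for s in raw_scores])
```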
This correction is implemented in icsiboost through the `--posteriors` option. For instance, on the adult dataset, it results in:
```
icsiboost -S adult -C --posteriors < adult.test | head
0 1 0.000676516588 0.999323483412
0 1 0.142914079015 0.857085920985
1 0 0.346835918704 0.653164081296
1 0 0.996016904305 0.003983095695
0 1 0.000004176785 0.999995823215
0 1 0.003001997215 0.996998002785
0 1 0.014896068044 0.985103931956
1 0 0.788795652673 0.211204347327
0 1 0.003583447587 0.996416552413
0 1 0.060653451950 0.939346548050
```
Note that while these scores are between 0 and 1, they are not guaranteed to sum to 1 over all classes when there are more than 2 classes, so you should normalize them for each example (see the sketch below).
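A minimal normalization sketch in Python, assuming you have already stripped the reference-label columns and kept only the per-class posteriors of one example:

```python
def normalize_posteriors(values):
    """Rescale one example's per-class posteriors so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else values

# Three hypothetical class posteriors from a single output line.
print(normalize_posteriors([0.80, 0.15, 0.30]))  # -> [0.64, 0.12, 0.24]
```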
Platt Calibration and Isotonic Regression work better in some cases (skewed label prior...). It is also possible to get good results by simply moving the decision boundary using a development set (for instance with the `--max-fmeasure <label>` and `--optimal-iterations` options, or with the `optimal_threshold.pl` script).
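If you want to try Platt Calibration yourself, one common practical approach is to fit a one-dimensional logistic regression on development-set scores; the sketch below uses scikit-learn and hypothetical score/label arrays, not actual icsiboost output.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical raw scores and gold labels from a development set.
dev_scores = np.array([-1.2, -0.4, 0.1, 0.8, 1.5]).reshape(-1, 1)
dev_labels = np.array([0, 0, 1, 1, 1])

# Platt scaling: fit a sigmoid a*score + b on the development data.
calibrator = LogisticRegression()
calibrator.fit(dev_scores, dev_labels)

# Calibrated probabilities for new (hypothetical) test scores.
test_scores = np.array([-0.9, 0.3, 1.1]).reshape(-1, 1)
print(calibrator.predict_proba(test_scores)[:, 1])
```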