
Output Probabilities


How do I get probabilities out of icsiboost?

icsiboost does not natively produce probabilities: its raw scores are not calibrated. The question is studied in the paper "Obtaining calibrated probabilities from boosting" by Niculescu-Mizil and Caruana.

The authors recommend three solutions:

  • Logistic Correction
  • Platt Calibration
  • Isotonic Regression

The first one, Logistic Correction, transforms each score with the formula 1/(1+exp(-2*n*score)), where n is the number of weak learners.
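A minimal sketch of this correction in Python (the function name logistic_correction is illustrative; it assumes you apply the formula to icsiboost's raw per-class scores yourself):

import math

def logistic_correction(score, num_learners):
    # Logistic Correction from Niculescu-Mizil and Caruana:
    # map a raw boosting score to a pseudo-probability with
    # 1 / (1 + exp(-2 * n * score)), n = number of weak learners.
    return 1.0 / (1.0 + math.exp(-2.0 * num_learners * score))

# Example: a raw score of 0.05 from a model with 100 weak learners.
print(logistic_correction(0.05, 100))  # ~0.99995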

It is implemented in icsiboost through the --posteriors option. For instance, on the adult dataset, it results in:

icsiboost -S adult -C --posteriors < adult.test | head
0 1 0.000676516588 0.999323483412
0 1 0.142914079015 0.857085920985
1 0 0.346835918704 0.653164081296
1 0 0.996016904305 0.003983095695
0 1 0.000004176785 0.999995823215
0 1 0.003001997215 0.996998002785
0 1 0.014896068044 0.985103931956
1 0 0.788795652673 0.211204347327
0 1 0.003583447587 0.996416552413
0 1 0.060653451950 0.939346548050

Note that while these scores are between 0 and 1, they are not guaranteed to sum to 1 over all classes (when you have more than 2 classes), so you should normalize them for each example.
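A minimal sketch of that per-example normalization (the scores list is a placeholder for one example's per-class values):

def normalize(scores):
    # Rescale one example's per-class scores so they sum to 1.
    total = sum(scores)
    return [s / total for s in scores]

print(normalize([0.9, 0.4, 0.1]))  # [0.6428..., 0.2857..., 0.0714...]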

Platt Calibration and Isotonic Regression work better in some cases (for example with a skewed label prior). It is also possible to get good results simply by moving the decision boundary using a development set, for instance with the --max-fmeasure <label> and --optimal-iterations options, or with the optimal_threshold.pl script.
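As a sketch, Platt Calibration can be fit on such a development set by treating the raw icsiboost score as the single feature of a logistic regression. This example uses scikit-learn, which is not part of icsiboost, and the dev_scores/dev_labels values are placeholders:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Raw icsiboost scores and gold labels from a development set
# (placeholder values; substitute your own data).
dev_scores = np.array([[-1.2], [-0.3], [0.1], [0.8], [1.5]])
dev_labels = np.array([0, 0, 1, 1, 1])

# Platt Calibration: fit a sigmoid mapping scores to probabilities.
platt = LogisticRegression()
platt.fit(dev_scores, dev_labels)

# Calibrated probability of the positive class for a new score.
print(platt.predict_proba([[0.5]])[0, 1])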
