Program Usage
This description applies to svn revision r100 and may differ in other revisions.
USAGE: icsiboost [options] -S <stem>
--version print version info
The version information contains the Subversion tags for Author, Date and Release, the compiler version, the build date and time, and the type of executable (32-bit/64-bit).
-S <stem> defines model/data/names stem
This option defines a common naming scheme for all input/output files. It is usually the experiment name and refers to:
- stem.names (definition of the feature column types)
- stem.data (training example file, one example per line)
- stem.dev (independent development example file, used for model selection/hyper-parameter optimization)
- stem.test (independent test example file, only used to compute the error rate)
- stem.shyp (the model file)
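To make the convention concrete, here is a minimal sketch (in Python, with a hypothetical stem name) of the file names icsiboost derives from -S:

```python
# Illustrative only: the file names icsiboost expects for a given stem,
# following the convention described above ("spam" is a hypothetical stem).
def stem_files(stem):
    return {
        "names": stem + ".names",  # feature column definitions
        "data":  stem + ".data",   # training examples, one per line
        "dev":   stem + ".dev",    # development examples (model selection)
        "test":  stem + ".test",   # test examples (error rate only)
        "shyp":  stem + ".shyp",   # the trained model
    }

print(stem_files("spam"))
```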
-n <iterations> number of boosting iterations (also limits test time classifiers, if model is not packed)
The number of iterations corresponds to the number of weak learners trained. The higher this number, the lower the training error (the classifier becomes very accurate on the training set), but also the higher the risk of overtraining (the test error reaches a minimum and then rises again).
-E <smoothing> set smoothing value (default=0.5)
--output-weights output training examples weights at each iteration
-V verbose mode
Verbose mode is useful for debugging an experiment with weird results; the bug may come from icsiboost or from the data. In verbose mode, you get more information about the type of the feature columns, the ignored columns and the weak learners at each iteration. This option is helpful in conjunction with --dryrun.
-C classification mode -- reads examples from <stdin>
In classification mode, a previously trained model is read from stem.shyp (and the column definitions from stem.names); examples are fed to icsiboost without a label through stdin, classified and output on stdout. By default, classification results are output one prediction per line. If there are N classes defined in the .names file, each line has N binary fields corresponding to the activation of the reference classes (if available), followed by the score for each class. The order is the same as the declaration of the classes in the .names file. When a score is above zero, the classifier has decided to output that class; when it is below zero, the class is not output. The absolute value of the score acts as a confidence measure of the prediction. See --posteriors to get probabilities. In multi-label classification, multiple classes can be activated/predicted on a single example.
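As a rough sketch of how this output could be consumed, assuming whitespace-separated fields and a hypothetical 3-class .names file, a script along these lines reads the reference indicators and scores and applies the score > 0 decision rule:

```python
# Sketch of parsing the default classification output described above:
# N reference indicators followed by N scores, in .names declaration order.
# Class names, field separator and the example line are assumptions.
classes = ["sports", "politics", "weather"]

def parse_output_line(line, classes):
    fields = line.split()
    n = len(classes)
    reference = [int(float(x)) for x in fields[:n]]       # reference class activations, if available
    scores = [float(x) for x in fields[n:2 * n]]          # one boosting score per class
    predicted = [c for c, s in zip(classes, scores) if s > 0]  # score > 0 means the class is output
    return reference, scores, predicted

print(parse_output_line("0 1 0 -0.84 1.27 -0.05", classes))
# -> ([0, 1, 0], [-0.84, 1.27, -0.05], ['politics'])
```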
-o long output in classification mode
The long output is only valid in classification mode and contains, for each example, the name/value of each feature column, the classification output and scores, and a cryptic indicator of erroneous predictions.
-N <text_expert> choose a text expert between fgram, ngram and sgram
When using textual features, the string of words is split on spaces to create a so-called bag-of-words. This means that only individual words are used to train the classifier, independently of their position. To extend the observation range of the classifier, you can use the -N and -W options and create bags-of-ngrams instead of bags-of-words. In that case, a sequence of words generates text experts that match up to n words. There are different types of experts:
- ngram: n-grams of length 1 to n (ex: the sheep eats grass the_sheep sheep_eats eats_grass the_sheep_eats sheep_eats_grass)
- sgram: bigrams with up to n words between them (the_eats sheep_grass)
- fgram: ngrams of length n only (no inferior orders) (the_sheep_eats sheep_eats_grass)
If you want to use position information, change your features to reflect it (ex: the_1 sheep_2 eats_3 grass_4). You can also build n-grams of n-grams by changing your features (ex: the_sheep_eats green_grass). Remember that spaces mean bag-of-whatever. If you want your word strings to be used as binary experts, change spaces to something else (ex: the-sheep-eats-green-grass).
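The following sketch mirrors the three expert definitions above on the sample sentence (it follows the descriptions here, not icsiboost's actual implementation, and the exact sgram windowing may differ):

```python
# Sketch of the three text expert types, applied to a whitespace-split feature value.
def ngram_expert(words, n):
    # n-grams of length 1 to n
    return ["_".join(words[i:i + k]) for k in range(1, n + 1)
            for i in range(len(words) - k + 1)]

def fgram_expert(words, n):
    # n-grams of length n only, no lower orders
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def sgram_expert(words, n):
    # bigrams whose two words are at most n - 1 positions apart
    # (whether adjacent bigrams are included may differ in icsiboost)
    return [words[i] + "_" + words[j]
            for i in range(len(words))
            for j in range(i + 1, min(i + n, len(words)))]

words = "the sheep eats grass".split()
print(ngram_expert(words, 3))  # the, sheep, ..., the_sheep, ..., the_sheep_eats, sheep_eats_grass
print(sgram_expert(words, 3))  # the_sheep, the_eats, sheep_eats, sheep_grass, eats_grass
print(fgram_expert(words, 3))  # the_sheep_eats, sheep_eats_grass
```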
-W <ngram_length> specify window length of text expert
See the -N option.
--dryrun only parse the names file and the data file to check for errors
Useful with -V. This option does not train/save a model.
--cutoff <freq> ignore nominal features occurring infrequently (shorten training time)
Infrequent words (or more generally textual features) are very unlikely to be relevant to the classification task. They may even hurt the generalization properties of the classifier if they get too much weight in the model. By specifying --cutoff n, you completely ignore all textual features that occur fewer than n times; for instance, --cutoff 2 ignores words occurring only once. Individual ngrams of the bag-of-ngram features are affected by this cutoff. This option can also help save memory and training time.
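Conceptually, the cutoff amounts to counting how often each textual feature occurs in the training data and dropping the rare ones, as in this sketch (the feature occurrences are hypothetical):

```python
# Sketch of the --cutoff behaviour: keep only textual features seen at least `cutoff` times.
from collections import Counter

def apply_cutoff(feature_occurrences, cutoff):
    counts = Counter(feature_occurrences)
    return {f for f, c in counts.items() if c >= cutoff}

occurrences = ["sheep", "grass", "sheep", "the_sheep", "grass", "wolf"]
print(apply_cutoff(occurrences, 2))   # --cutoff 2: {'sheep', 'grass'}; singletons are ignored
```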
--no-unk-ngrams ignore ngrams that contain the "unk" token
Your feature generation may already deal with infrequent words and replace them by "unk". This option prevents the creation of ngrams containing "unk". If unk is relatively frequent, it will have a stronger classification potential than other moderately frequent words, which can be beneficial in some applications or harmful if unk has a different distribution at test time (ex: when training on written text, you will probably limit the size of the vocabulary to match the ASR output from speech; but at test time, the ASR does not produce unknown words, by definition).
--jobs <threads> number of threaded weak learners
Weak learner training is parallelized using the given number of threads. This can save time since the search for a weak learner is exhaustive. The gain is less than a factor of N because of synchronization overhead and because error rate computation and weight updates are not parallelized. Loading examples from the data files is, of course, not parallelized either.
--do-not-pack-model do not pack model (this is the default behavior)
--pack-model pack model (for boostexter compatibility)
Boostexter used to pack its models, meaning that two occurrences of the same weak classifier (same threshold or same text expert) are averaged into one classifier. This saves resources but, depending on the classification problem, does not have a big impact. Not packing the model means that you can change the number of iterations at test time (using the -n option) to match the number of iterations where the development error is lowest (for classification error, you can use the --optimal-iterations switch to do that before saving the model; for other application-oriented performance measures like f-score or AUC, you have to compute it externally).
--posteriors output posterior probabilities instead of boosting scores
Use this switch to convert boosting scores to the posterior probability of each class given the observations. The posterior probabilities can be used as good confidence estimates. The formula uses logistic correction: P(C|X) = 1/(1 + exp(-2*n*score)), where n is the number of iterations and score is the classifier output. See http://citeseer.comp.nus.edu.sg/742675.html for more information and some ways to obtain better confidence estimates.
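As a small sketch, the conversion above can be applied directly to a raw score (the scores and the iteration count below are hypothetical):

```python
# Logistic correction from the formula above: P(C|X) = 1 / (1 + exp(-2 * n * score)).
import math

def posterior(score, n_iterations):
    return 1.0 / (1.0 + math.exp(-2.0 * n_iterations * score))

for s in (-0.05, 0.0, 0.02):           # hypothetical per-class boosting scores
    print(s, posterior(s, 100))        # model assumed to have 100 iterations
```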
--model <model> save/load the model to/from this file instead of <stem>.shyp
--train <file> bypass the <stem>.data filename to specify training examples
--names <file> use this column description file instead of <stem>.names
--dev <file> bypass the <stem>.dev filename to specify development examples
--test <file> bypass the <stem>.test filename to specify test examples
As an alternative to specifying the file stem with -S, you can override the file names of the model, the training data and the feature definitions. Development and test example files can also be specified this way.
--interruptible save model after each iteration in case of failure/interruption
--resume resume training from a previous model (can use another dataset for adaptation)
Saving the model after each iteration allows you to abort icsiboost without losing your training steps. If icsiboost is killed while saving the model, you can still use the previous iteration from stem.shyp.previous. This is also useful when training on large datasets and you need to get an idea of the performance using an external tool: you may specify a very large number of iterations and stop training when the classifier starts to overtrain (i.e. the development error has reached its minimum).

Note that --resume will replay the model on the training data (to recover the example weights) and then continue training until the requested number of iterations is reached. If you stopped at 90 iterations and resume with the number of iterations set to 100, only 10 additional rounds are produced, not 100. The --resume option can also be used to adapt the model to new domain-specific data (by changing the training set). This sometimes works but, generally, after replaying the model, wrongly classified examples have extremely high weights, leading to some fluctuation in the training/test error.
--ignore <columns> ignore a comma separated list of columns (synonym of "ignore" in names file)
--ignore-regex <regex> ignore columns that match a given regex
--only <columns> use only a comma separated list of columns (antonym of "ignore" in names file)
--only-regex <regex> use only columns that match a given regex
There are several ways of changing the features used for training without changing the input examples (for feature selection). You can either set the column type to "ignore" in the .names file, or override it on the command line using a list (--ignore column1,column2,column3) or a regular expression (--ignore-regex ^(prosody|speaker)). It is useful to check the actual columns that you are going to use with --dryrun -V. --only and --only-regex work the opposite of --ignore and --ignore-regex.
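A sketch of how such a regex-based filter behaves on a list of column names (the column names are hypothetical, and whether icsiboost anchors the pattern or searches anywhere in the name is not specified here):

```python
# Sketch of --ignore-regex / --only-regex column selection using the example pattern above.
import re

columns = ["words", "prosody_f0", "speaker_id", "length"]   # hypothetical column names
pattern = re.compile(r"^(prosody|speaker)")

used_with_ignore = [c for c in columns if not pattern.match(c)]   # --ignore-regex drops the matches
used_with_only = [c for c in columns if pattern.match(c)]         # --only-regex keeps only the matches
print(used_with_ignore)   # ['words', 'length']
print(used_with_only)     # ['prosody_f0', 'speaker_id']
```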
--optimal-iterations output the model at the iteration that minimizes dev error
This option keeps track, during training, of the iteration that minimizes classification error on the development set (or the test set if no development set is provided) and then outputs a model corresponding to this number of iterations. It is only useful if your primary performance measure is the classification error rate. You can set the number of iterations at test time using -n (if n is larger than the number of weak classifiers available in the model, it is capped at that number).
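A minimal sketch of that bookkeeping, with a hypothetical development error curve:

```python
# Keep the iteration with the lowest development error, and cap a requested
# test-time -n at the number of weak classifiers stored in the model.
dev_error = [0.40, 0.31, 0.27, 0.25, 0.26, 0.28]              # error after iterations 1..6 (hypothetical)
best_iteration = min(range(len(dev_error)), key=dev_error.__getitem__) + 1
print(best_iteration)                                          # -> 4

def effective_iterations(requested_n, classifiers_in_model):
    return min(requested_n, classifiers_in_model)

print(effective_iterations(100, len(dev_error)))               # -> 6
```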
--max-fmeasure <class> display maximum f-measure of specified class instead of error rate
--fmeasure-beta <float> specify weight of recall compared to precision in f-measure
These options compute the maximum f-measure instead of the classification error rate. The f-measure is the harmonic mean of precision and recall, both computed one-against-all for the class of interest. It is useful when dealing with classes of interest that have a low prior probability: frequent classes are usually better learned than low-frequency classes because more examples are available (and they contribute more to the error rate). This can be adjusted by selecting a classification threshold other than 0 on the classifier score for a particular class. The --max-fmeasure option replaces the error rate outputs by f-measure outputs (at the best threshold, found by trying all thresholds). It also affects the --optimal-iterations option so that it uses maximum f-measure instead of classification error rate. In addition to the optimal iteration, the corresponding decision threshold is output. To get that threshold in probability space, use --posteriors. --fmeasure-beta can be used to adjust the weight of recall vs. precision according to application concerns.
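The sketch below illustrates a maximum f-measure search for one class of interest; the standard F_beta definition is assumed, so icsiboost's exact weighting convention should be checked against its output:

```python
# Try every candidate threshold on the scores of the class of interest and keep
# the threshold that maximizes F_beta (beta > 1 weights recall more than precision).
def fbeta(precision, recall, beta):
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta * beta) * precision * recall / (beta * beta * precision + recall)

def max_fmeasure(scores, labels, beta=1.0):
    best_f, best_threshold = -1.0, None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p and l)
        fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
        fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = fbeta(precision, recall, beta)
        if f > best_f:
            best_f, best_threshold = f, threshold
    return best_f, best_threshold

# hypothetical scores and one-against-all labels for a low-frequency class
scores = [-1.2, -0.3, 0.1, 0.4, 0.9]
labels = [0, 1, 0, 1, 1]
print(max_fmeasure(scores, labels, beta=1.0))   # -> (0.857..., -0.3)
```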
--display-maxclass display the classification rate obtained by selecting only one class per example, the one that obtains the maximum score (boostexter-compatible)
By default, icsiboost computes error rates based on the sign of the classifier output. In multiclass problems, this means that the classifier can output more than one label (all classes that get a positive score), which may lead to an overestimation of the error rate if you are not in a multi-label setting. You can use the --display-maxclass option to only consider the class with the highest score (the way BoosTexter behaves when only one label was seen on all examples).
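The difference between the two decision rules can be sketched on a single example with hypothetical per-class scores:

```python
# Default rule: output every class with a positive score (multi-label).
# --display-maxclass rule: keep only the single best-scoring class.
classes = ["sports", "politics", "weather"]    # hypothetical classes, in .names order
scores = [0.7, 0.2, -0.4]

multilabel = [c for c, s in zip(classes, scores) if s > 0]
maxclass = classes[max(range(len(scores)), key=scores.__getitem__)]
print(multilabel)   # ['sports', 'politics']
print(maxclass)     # 'sports'
```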
--abstaining-stump use abstain-on-absence text stump (experimental)
--no-unknown-stump use abstain-on-unknown continuous stump (experimental)
--sequence generate column __SEQUENCE_PREVIOUS from previous prediction at test time (experimental)
--anti-prior set initial weights to focus on classes with a lower prior (experimental)
You should not use experimental options unless you know what you are doing. They are work-in-progress or aborted-because-it-did-not-work.