- Author: Vaux Gomes
- Contact: vauxgomes@gmail.com
- Version: 0.1
Here we have implementations of LAC, Adaboost, a version of Conf-rated Adaboost, and SLIPPER. To mine the association rules, one can use d-peeler, multidupehack, or lcm without having to adapt the code. However, adapting the code for other miners is not very complicated.
LAC (Lazy Associative Classification) is a rule-based, demand-driven, lazy machine learning algorithm. For each test instance, the algorithm projects the data onto the region where that instance lies. In effect, this decomposes the problem of fitting a single function that explains the whole data into many smaller problems. Indeed, for a given test set, it is possible that not all regions of the data points will be explored. LAC predicts the class of a test instance by averaging the confidence values of the induced rules and taking a majority vote among the rules' classes.
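To make the prediction step concrete, here is a minimal sketch of one plausible reading of that rule-combination scheme; the rule representation (antecedent set, class, confidence) and all names are illustrative, not the repository's actual API.

```python
from collections import defaultdict

# Hypothetical rule representation: (antecedent, class, confidence),
# where the antecedent is a frozenset of feature ids.
def lac_predict(test_features, rules):
    covered = set(test_features)
    # Keep only the rules whose antecedent is contained in the test instance
    applicable = [r for r in rules if r[0] <= covered]
    totals, counts = defaultdict(float), defaultdict(int)
    for antecedent, cls, confidence in applicable:
        totals[cls] += confidence
        counts[cls] += 1
    if not totals:
        return None
    # Vote: pick the class with the best average confidence among its rules
    return max(totals, key=lambda c: totals[c] / counts[c])
```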
Boosting is a method for improving the accuracy of machine learning algorithms; it combines classifiers by assigning them voting influence values (or simply, weights). Essentially, boosting builds an additive model by iteratively combining many classifiers, the so-called weak hypotheses, all generated by a base learner.
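As a reference point, the sketch below shows the round-by-round structure of discrete Adaboost for labels in {-1, +1}; `weak_learner` and the data layout are assumptions for illustration, not this module's actual interface.

```python
import math

def adaboost_fit(X, y, weak_learner, rounds):
    n = len(X)
    w = [1.0 / n] * n                   # uniform instance weights
    ensemble = []                       # (alpha, hypothesis) pairs
    for _ in range(rounds):
        h = weak_learner(X, y, w)       # weak hypothesis fit on weighted data
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        if err <= 0.0 or err >= 0.5:    # perfect, or no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)   # voting influence
        ensemble.append((alpha, h))
        # Re-weight: misclassified instances become more important
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]        # renormalize to a distribution
    return ensemble

def adaboost_predict(ensemble, x):
    # Additive model: a weighted vote of all weak hypotheses
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1
```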
The Conf-rated Adaboost algorithm is an adaptation of the original discrete Adaboost that allows classifiers to express a notion of certainty about their predictions.
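In the confidence-rated setting (following Schapire and Singer's formulation), each weak hypothesis outputs a real value whose sign is the predicted label and whose magnitude is the certainty. A sketch of the corresponding influence computation, with illustrative names:

```python
import math

def conf_rated_alpha(w, margins):
    # margins[i] = y_i * h(x_i), where h(x) lies in [-1, +1]:
    # the sign encodes the predicted label, the magnitude the certainty.
    r = sum(wi * mi for wi, mi in zip(w, margins))
    return 0.5 * math.log((1 + r) / (1 - r))   # undefined for |r| == 1
```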
SLIPPER is a rule-based algorithm that uses Adaboost's method to build sets of rules.
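For reference, in Cohen and Singer's SLIPPER each induced rule votes with a confidence derived from the weights of the positive (W+) and negative (W-) training instances it covers, smoothed so that a rule covering no negatives does not receive infinite influence. A minimal sketch of that formula (how this repository implements it may differ):

```python
import math

def slipper_rule_confidence(w_plus, w_minus, n_instances):
    smooth = 1.0 / (2.0 * n_instances)   # smoothing term from the paper
    return 0.5 * math.log((w_plus + smooth) / (w_minus + smooth))
```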
- -s: Training set files (required)
- -t: Testing set file (required)
- -i: Itemsets file (required)
- -b: Maximum number of rounds
- -z: Original sizes of each class
- -w: Pre-set weights
- -Z: ZERO Adaboost
- -A: Associative Classifier
- -D: Discrete Adaboost
- -C: Confidence-rated Adaboost
- -S: SLIPPER Classifier
- --free: Use free itemsets
- --rmode: Associative Classifier
- --seed: Seed for the random objects
There is a file called settings.py in the utils directory. Within that file there are a few variables that can be set (an illustrative sketch follows the list):
- RANDOM_SEED: Seed for controlling the random objects
- MIN_ROUNDS: Minimum number of rounds for the Adaboost algorithms
- MAX_ROUNDS: Maximum number of rounds for the Adaboost algorithms
- GAMMA: Gamma value for the Discrete Adaboost algorithms (see Adaboost)
- kICV: Number of internal cross validations of the SLIPPER algorithm
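For illustration, such a settings.py could look like the sketch below; the values shown are placeholders, not the repository's actual defaults.

```python
# utils/settings.py (illustrative values only)
RANDOM_SEED = 0     # seed for the random objects
MIN_ROUNDS = 5      # minimum number of rounds for the Adaboost algorithms
MAX_ROUNDS = 100    # maximum number of rounds for the Adaboost algorithms
GAMMA = 0.1         # gamma value for the Discrete Adaboost algorithms
kICV = 5            # number of internal cross validations in SLIPPER
```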
# Classes: TMP.class.0 TMP.class.1
# Itemsets: TMP.itemsets
# Test: TMP.testset
$ main.py -s TMP.class.* -t TMP.testset -i TMP.itemsets

# Calling Discrete Adaboost routine
$ main.py -s TMP.class.* -t TMP.testset -i TMP.itemsets -D

The runner script is used to mine rules for the given train and test sets and, after that, to call the booster module.
- -h: Displays help
- -s: Training files
- -t: Testing files
- -z: MINER: Minimum support size
- -m: MINER: Use Multidupehack to mine the itemsets (default: D-peeler)
- -l: MINER: Use LCM to mine the itemsets (default: D-peeler)
- -e: BOOSTER: Activates Eager Mode
- -Z: BOOSTER: Deactivates Zero classifier
- -A: BOOSTER: Deactivates Associative Classifier
- -D: BOOSTER: Activates Discrete Adaboost
- -C: BOOSTER: Activates Conf-rated Adaboost
- -S: BOOSTER: Activates SLIPPER Boost
- -o: BOOSTER: Activates use of original train size
- -j: BOOSTER: Activates use of Jaccard's index
- -b: BOOSTER: Maximum number of rounds
- -f: BOOSTER: Uses only free itemsets
Note: This code works only with LUCS-KDD files.
$ ./runner.sh -s train -t test

# Calling Discrete Adaboost in the eager mode using multidupehack
$ ./runner.sh -s train -t test -emD

Note: It is advisable to always call the option -f together with the option -m.
This script runs a battery of datasets using the runner script. See the array variable in the script.
- -h: Displays help
- -a: Use multi and binary class problems datasets
- -x: Files extension name*
- -m: MINER: miner options according to the runner script*
- -p: Path for result outputs
- -l: Progress log file (default: .batt)
- -n: Show notifications (default: false)
# Calling a battery using options C and D of the main module
$ ./battery.sh -x ext -m "-CD"

The input format consists of a series of lines, each in the following format:
<int_features> <class>
The LUCS-KDD format fits very well!
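Assuming each line holds whitespace-separated integer features followed by the class label, reading such a file is straightforward; a minimal sketch (the function name is illustrative):

```python
def read_lucs_kdd(path):
    instances = []
    with open(path) as f:
        for line in f:
            values = [int(v) for v in line.split()]
            if values:
                instances.append((values[:-1], values[-1]))  # features, class
    return instances
```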
The output consists of a header in the following format:
# Mode: <Lazy/Eager>
# Miner: <Multidupehack/D-peeler/Lcm>
# Train: <train file>
# Test: <test file>
#
# Date: Wed Nov 2 01:06:52 BRST 2016
# Host: ubt13z
followed by an empty line and a set of whitespace-separated lines representing the predictions of the algorithms. Each line has the following format:
<correct_class> ~<alg1_name> <pred1_alg1> ... <predN_alg1> ... ~<algM_name> <pred1_algM> ... <predN_algM>
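Given that format, a per-algorithm hit count can be recovered from each prediction line; a minimal sketch, assuming '~' only ever prefixes algorithm names:

```python
def parse_prediction_line(line):
    tokens = line.split()
    correct_class = tokens[0]
    hits = {}                      # algorithm name -> list of hit/miss flags
    current = None
    for token in tokens[1:]:
        if token.startswith('~'):  # start of a new algorithm's predictions
            current = token[1:]
            hits[current] = []
        elif current is not None:
            hits[current].append(token == correct_class)
    return correct_class, hits
```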