ImplementationNotes
For both classification and regression, the independent features of an instance are stored in a Mallet FeatureVector object. Attribute names generated by the FeatureExtraction class are mapped to indices in the feature vector through the data alphabet of the pipe.
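The name-to-index mapping can be pictured with a small stand-alone sketch. The class AlphabetSketch and its methods are hypothetical and only imitate the idea of a data alphabet. They are not Mallet's Alphabet implementation, and the attribute names used in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a data alphabet (NOT Mallet's Alphabet class):
// it assigns consecutive indices to attribute names in order of first use.
final class AlphabetSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final List<String> nameByIndex = new ArrayList<>();

    // Returns the index of a name, assigning the next free index if the name is new.
    int lookupIndex(String name) {
        Integer existing = indexByName.get(name);
        if (existing != null) {
            return existing;
        }
        int index = nameByIndex.size();
        indexByName.put(name, index);
        nameByIndex.add(name);
        return index;
    }

    // Returns the attribute name stored at the given index.
    String lookupName(int index) {
        return nameByIndex.get(index);
    }

    // Number of distinct names seen so far, i.e. the dimensionality of the vectors.
    int size() {
        return nameByIndex.size();
    }

    public static void main(String[] args) {
        AlphabetSketch alphabet = new AlphabetSketch();
        // Made-up attribute names, for illustration only.
        String[] names = {"attrA", "attrB", "attrA"};
        for (String name : names) {
            System.out.println(name + " -> " + alphabet.lookupIndex(name));
        }
    }
}
```

Because a repeated name returns the index it was first given, every instance that uses the same attribute writes to the same position in its feature vector.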
FeatureVector instances always use a sparse, non-binary representation. Values that are zero are not stored in the instance. Instead, the instance keeps track of how many locations are actually used and maps each location number to a feature index.
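The difference between a location and an index is easiest to see in a toy model. The following is a minimal stand-alone sketch, assuming that indices are kept in ascending order. SparseVectorSketch is a hypothetical class whose method names merely mirror the FeatureVector calls used on this page. It does not reproduce Mallet's implementation.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a sparse, non-binary vector (NOT Mallet's FeatureVector).
final class SparseVectorSketch {
    private final int[] indices;   // feature index stored at each location, ascending
    private final double[] values; // value stored at the same location

    SparseVectorSketch(Map<Integer, Double> valueByIndex) {
        // Keep only non-zero entries, sorted by feature index.
        TreeMap<Integer, Double> nonZero = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : valueByIndex.entrySet()) {
            if (e.getValue() != 0.0) {
                nonZero.put(e.getKey(), e.getValue());
            }
        }
        indices = new int[nonZero.size()];
        values = new double[nonZero.size()];
        int loc = 0;
        for (Map.Entry<Integer, Double> e : nonZero.entrySet()) {
            indices[loc] = e.getKey();
            values[loc] = e.getValue();
            loc++;
        }
    }

    // Number of stored (non-zero) entries.
    int numLocations() { return indices.length; }

    // Feature index held at a storage location.
    int indexAtLocation(int loc) { return indices[loc]; }

    // Value held at a storage location.
    double valueAtLocation(int loc) { return values[loc]; }

    // Value for a feature index; indices that are not stored read as 0.0.
    double value(int index) {
        int loc = Arrays.binarySearch(indices, index);
        return loc >= 0 ? values[loc] : 0.0;
    }

    public static void main(String[] args) {
        Map<Integer, Double> raw = new TreeMap<>();
        raw.put(7, 2.5);
        raw.put(2, 1.0);
        raw.put(4, 0.0); // zero, so it is not stored
        SparseVectorSketch v = new SparseVectorSketch(raw);
        for (int loc = 0; loc < v.numLocations(); loc++) {
            System.out.println("location " + loc + ": index " + v.indexAtLocation(loc)
                    + ", value " + v.valueAtLocation(loc));
        }
    }
}
```

In this model a vector over thousands of features with two non-zero values occupies only two locations, which is why iterating over locations is usually cheaper than iterating over every index.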
To get all the non-zero values of a feature vector and their indices (sparse representation):
FeatureVector fv = (FeatureVector)instance.getData();
// iterate over the stored (non-zero) locations only
for(int loc=0; loc<fv.numLocations(); loc++) {
  int index = fv.indexAtLocation(loc);
  double value = fv.valueAtLocation(loc);
}
To get all values of the vector, including the zeros:
int nrFeatures = pipe.getDataAlphabet().size();
FeatureVector fv = (FeatureVector)instance.getData();
for(int index=0; index<nrFeatures; index++) {
  double value = fv.value(index);
}
We distinguish two tasks, classification and regression. For classification, the target alphabet is an instance of LabelAlphabet; for regression, it is null.
Depending on the task, the target of each instance is:
- a String for ordinary classification
- an instance of NominalTargetWithCosts for classification where we have a cost vector for each instance
- a Double for regression
For classification, to get the actual String label of an instance:
LabelAlphabet la = (LabelAlphabet)pipe.getTargetAlphabet();
Object target = instance.getTarget();
Label l = la.getLabel(target);
// if this is ordinary classification, the entry for the label should be a String
String targetString = (String)l.getEntry();
// if this is classification with per-instance cost vectors, the entry for the label is a NominalTargetWithCosts instance:
NominalTargetWithCosts ntwc = (NominalTargetWithCosts)l.getEntry();
String targetString = ntwc.getClassLabel();
double[] costs = ntwc.getCosts();
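Since the label entry can be either of two types, code that has to handle both cases may prefer to check the type before casting. The following stand-alone sketch shows one way to do that. CostTargetSketch is a hypothetical stand-in for NominalTargetWithCosts that offers only the two accessors used above, and TargetEntrySketch.classLabelOf is a made-up helper, not part of Mallet or the LearningFramework.

```java
// Hypothetical stand-in for NominalTargetWithCosts, reduced to the two accessors used above.
final class CostTargetSketch {
    private final String classLabel;
    private final double[] costs;

    CostTargetSketch(String classLabel, double[] costs) {
        this.classLabel = classLabel;
        this.costs = costs.clone();
    }

    String getClassLabel() { return classLabel; }

    double[] getCosts() { return costs.clone(); }
}

// Made-up helper that extracts the class label from either kind of label entry.
final class TargetEntrySketch {
    static String classLabelOf(Object entry) {
        if (entry instanceof String) {
            // ordinary classification: the entry is the label itself
            return (String) entry;
        }
        if (entry instanceof CostTargetSketch) {
            // classification with per-instance cost vectors
            return ((CostTargetSketch) entry).getClassLabel();
        }
        String type = (entry == null) ? "null" : entry.getClass().getName();
        throw new IllegalArgumentException("Unexpected label entry type: " + type);
    }

    public static void main(String[] args) {
        // The label names and costs are illustrative values only.
        System.out.println(classLabelOf("labelA"));
        System.out.println(classLabelOf(new CostTargetSketch("labelB", new double[] {0.0, 1.5})));
    }
}
```

Failing loudly on an unexpected type makes a wrongly configured pipe visible at the point of use instead of surfacing later as a ClassCastException.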
Brought to you by the GATE team