ImplementationNotes
For classification and regression, the independent features are implemented as a Mallet FeatureVector object. Attribute names as generated by the FeatureExtraction class are mapped to indices in the feature vector using the data alphabet of the pipe.
FeatureVector instances always use a sparse, non-binary representation. This means that zero values are not actually stored in the instance; instead, the instance keeps track of how many locations are actually used and maps location numbers to feature indices.
To get all the non-zero values of a feature vector and their indices (sparse representation):
FeatureVector fv = (FeatureVector)instance.getData();
for(int loc=0;loc<fv.numLocations(); loc++) {
int index = fv.indexAtLocation(loc);
double value = fv.valueAtLocation(loc);
}
To get all values of the vector, including the zeros (dense traversal):
int nrFeatures = pipe.getDataAlphabet().size();
FeatureVector fv = (FeatureVector)instance.getData();
for(int index=0; index<nrFeatures; index++) {
double value = fv.value(index);
}
Notes:
- Sparse FeatureVector objects do not know the "true" size (dimensionality) of the vector; that size has to be taken from the data alphabet.
- FeatureVector.location(index) returns the location of the index-th dimension if non-zero and -1 for zero (non-stored) locations.
- FeatureVector.value(index) returns the value at that index, or 0.0 for any non-stored location (irrespective of the true size).
- FeatureVector.valueAtLocation(location) returns the value at that location, or throws an exception if the location does not exist.
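The relationship between locations and indices described in the notes above can be sketched with a small stand-in class. This is not Mallet's FeatureVector: the class name SparseVectorSketch and its two-array layout are assumptions made for illustration, and only the method names mirror the ones used in these notes.

```java
import java.util.Arrays;

/** Illustrative stand-in for a sparse vector; NOT Mallet's FeatureVector. */
public class SparseVectorSketch {
    private final int[] indices;   // sorted feature indices of the stored entries
    private final double[] values; // values[loc] belongs to indices[loc]

    public SparseVectorSketch(int[] sortedIndices, double[] values) {
        if (sortedIndices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        this.indices = sortedIndices.clone();
        this.values = values.clone();
    }

    /** Number of stored entries (the vector does not know its true size). */
    public int numLocations() { return indices.length; }

    /** Feature index stored at the given location. */
    public int indexAtLocation(int loc) { return indices[loc]; }

    /** Value stored at the given location; throws if the location does not exist. */
    public double valueAtLocation(int loc) { return values[loc]; }

    /** Location of the given feature index, or -1 if it is not stored. */
    public int location(int index) {
        int loc = Arrays.binarySearch(indices, index);
        return loc >= 0 ? loc : -1;
    }

    /** Value for a feature index; 0.0 for anything that is not stored. */
    public double value(int index) {
        int loc = location(index);
        return loc < 0 ? 0.0 : values[loc];
    }

    public static void main(String[] args) {
        // Two stored entries: feature 2 has value 1.5, feature 7 has value -3.0.
        SparseVectorSketch v = new SparseVectorSketch(new int[] {2, 7}, new double[] {1.5, -3.0});
        System.out.println(v.numLocations()); // prints 2
        System.out.println(v.location(7));    // prints 1
        System.out.println(v.location(3));    // prints -1
        System.out.println(v.value(1000));    // prints 0.0
    }
}
```

Because value(index) answers 0.0 for any index that is not stored, a dense traversal needs an external upper bound, which is why the loop over all values above takes the size from the data alphabet.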
We distinguish two tasks, classification and regression. For classification, the target alphabet will be an instance of LabelAlphabet; for regression, it will be null.
The target of each instance is:
- a String for ordinary classification
- an instance of NominalTargetWithCosts for classification where we have a cost vector for each instance
- a Double for regression
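The three target kinds listed above can be told apart with a simple type check. The following is a minimal sketch that uses no Mallet or LearningFramework classes: TargetKindSketch, describe, and CostTarget are hypothetical names, and CostTarget is only a stand-in for NominalTargetWithCosts, reduced to a class label and a cost vector. The boolean parameter stands for "the target alphabet is not null".

```java
import java.util.Arrays;

public class TargetKindSketch {

    /** Hypothetical stand-in for NominalTargetWithCosts (class label plus cost vector). */
    public static final class CostTarget {
        private final String classLabel;
        private final double[] costs;

        public CostTarget(String classLabel, double[] costs) {
            this.classLabel = classLabel;
            this.costs = costs.clone();
        }

        public String getClassLabel() { return classLabel; }

        public double[] getCosts() { return costs.clone(); }
    }

    /**
     * Describes a target value. hasTargetAlphabet is true for classification
     * and false for regression, where the target alphabet is null.
     */
    public static String describe(boolean hasTargetAlphabet, Object target) {
        if (!hasTargetAlphabet) {
            // Regression: the target is expected to be a Double.
            if (target instanceof Double) {
                return "regression: " + target;
            }
            throw new IllegalArgumentException("regression target must be a Double");
        }
        if (target instanceof String) {
            // Ordinary classification: the target is the class label itself.
            return "classification: " + target;
        }
        if (target instanceof CostTarget) {
            // Classification with a per-instance cost vector.
            CostTarget ct = (CostTarget) target;
            return "classification with costs: " + ct.getClassLabel()
                    + " " + Arrays.toString(ct.getCosts());
        }
        throw new IllegalArgumentException("unexpected classification target type");
    }

    public static void main(String[] args) {
        System.out.println(describe(true, "pos"));
        System.out.println(describe(false, 2.5));
        System.out.println(describe(true, new CostTarget("neg", new double[] {0.0, 1.0})));
    }
}
```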
For classification, to get the actual String label of an instance:
LabelAlphabet la = (LabelAlphabet)pipe.getTargetAlphabet();
Object target = instance.getTarget();
Label l = la.getLabel(target);
// if this is ordinary classification, the entry for the label should be a String
String targetString = (String)l.getEntry();
// alternatively, if this is classification with per-instance cost vectors,
// the entry for the label is a NominalTargetWithCosts instance:
NominalTargetWithCosts ntwc = (NominalTargetWithCosts)l.getEntry();
String classLabel = ntwc.getClassLabel();
double[] costs = ntwc.getCosts();
Brought to you by the GATE team