ImplementationNotes

Johann Petrak edited this page Aug 1, 2016 · 13 revisions

Implementation / Development Notes

Feature vectors

For classification and regression, the independent features are implemented as a Mallet FeatureVector object. Attribute names as generated by the FeatureExtraction class are mapped to indices in the feature vector using the data alphabet of the pipe.
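The name-to-index mapping can be pictured with a small stand-alone sketch. It does not use Mallet: `AlphabetSketch` and its methods are hypothetical names chosen for illustration, loosely modeled on the idea of an alphabet that assigns each new attribute name the next free index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a data alphabet: maps attribute names to indices. */
final class AlphabetSketch {
  private final Map<String, Integer> indexByName = new HashMap<>();
  private final List<String> nameByIndex = new ArrayList<>();

  /** Returns the index of the name, assigning the next free index if the name is new. */
  int lookupIndex(String name) {
    Integer index = indexByName.get(name);
    if (index == null) {
      index = nameByIndex.size();
      indexByName.put(name, index);
      nameByIndex.add(name);
    }
    return index;
  }

  /** Returns the attribute name that was assigned the given index. */
  String lookupName(int index) {
    return nameByIndex.get(index);
  }

  /** Number of distinct attribute names seen so far. */
  int size() {
    return nameByIndex.size();
  }
}
```

Looking up the same name twice yields the same index, so a feature keeps its position in every vector built against the same alphabet.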

FeatureVector instances always use a sparse, non-binary representation. Values that are zero are not stored in the instance; instead, the instance keeps track of how many locations are in use and maps each location number to a feature index.
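The difference between a location and an index is easiest to see in a stripped-down model. The following is a minimal sketch, not Mallet's implementation: `SparseVectorSketch` is a hypothetical class, and it assumes the stored indices are in ascending order so that a binary search can be used for lookups by index.

```java
import java.util.Arrays;

/** Hypothetical sparse vector: parallel arrays hold only the non-zero entries. */
final class SparseVectorSketch {
  private final int[] indices;   // feature index stored at each location (assumed ascending)
  private final double[] values; // value stored at each location

  SparseVectorSketch(int[] indices, double[] values) {
    if (indices.length != values.length) {
      throw new IllegalArgumentException("indices and values must have the same length");
    }
    this.indices = indices.clone();
    this.values = values.clone();
  }

  /** Number of entries that are actually stored. */
  int numLocations() {
    return indices.length;
  }

  /** Feature index of the entry stored at the given location. */
  int indexAtLocation(int loc) {
    return indices[loc];
  }

  /** Value of the entry stored at the given location. */
  double valueAtLocation(int loc) {
    return values[loc];
  }

  /** Value for a feature index; indices that are not stored are implicitly zero. */
  double value(int index) {
    int loc = Arrays.binarySearch(indices, index);
    return loc >= 0 ? values[loc] : 0.0;
  }
}
```

With stored indices 2 and 7, the vector has two locations (0 and 1), and asking for the value at any other index returns zero without that zero ever being stored.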

To get all the non-zero values of a feature vector and their indices (sparse representation):

FeatureVector fv = (FeatureVector)instance.getData();
for(int loc=0;loc<fv.numLocations(); loc++) {
  int index = fv.indexAtLocation(loc);
  double value = fv.valueAtLocation(loc);
}

To get all values of the vector, including the zeros:

int nrFeatures = pipe.getDataAlphabet().size();
FeatureVector fv = (FeatureVector)instance.getData();
for(int index=0; index<nrFeatures; index++) {
  double value = fv.value(index);
}

Targets

We distinguish two tasks, classification and regression. For classification, the target alphabet is an instance of LabelAlphabet; for regression, it is null.

Depending on the task, the target of each instance is:

  • a String for ordinary classification
  • an instance of NominalTargetWithCosts for classification where we have a cost vector for each instance
  • a Double for regression
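The three cases can be told apart by the runtime type of the target value. The sketch below is stand-alone and does not use the plugin's classes: `CostTargetSketch` is a hypothetical stand-in for NominalTargetWithCosts, and `TargetKindSketch.describe` is an illustrative helper, not part of any API.

```java
import java.util.Arrays;

/** Hypothetical stand-in for NominalTargetWithCosts: a class label plus a cost vector. */
final class CostTargetSketch {
  final String classLabel;
  final double[] costs;

  CostTargetSketch(String classLabel, double[] costs) {
    this.classLabel = classLabel;
    this.costs = costs.clone();
  }
}

final class TargetKindSketch {
  /** Describes a target value according to the three cases listed above. */
  static String describe(Object target) {
    if (target instanceof String) {
      return "classification: " + target;
    }
    if (target instanceof CostTargetSketch) {
      CostTargetSketch t = (CostTargetSketch) target;
      return "cost-sensitive classification: " + t.classLabel + " " + Arrays.toString(t.costs);
    }
    if (target instanceof Double) {
      return "regression: " + target;
    }
    throw new IllegalArgumentException("unsupported target type: " + target);
  }
}
```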

For classification, to get the actual String label of an instance:

LabelAlphabet la = (LabelAlphabet)pipe.getTargetAlphabet();
Object target = instance.getTarget();
Label l = la.getLabel(target);
Object entry = l.getEntry();
if(entry instanceof String) {
  // ordinary classification: the entry for the label is a String
  String targetString = (String)entry;
} else if(entry instanceof NominalTargetWithCosts) {
  // classification with per-instance cost vectors: the entry for the label is a NominalTargetWithCosts instance
  NominalTargetWithCosts ntwc = (NominalTargetWithCosts)entry;
  String targetString = ntwc.getClassLabel();
  double[] costs = ntwc.getCosts();
}