The program consists of two modules, an RDF_processor and a model.
It also contains two demonstration scripts: build_model.py, which creates a model from serialized hash arrays or RDF triples, and predict.py, which provides predictions from a stored model.
The model file contains a prebuilt model that can be used by the predict script.
Constructs and stores a set of <subjects> from ident_file with RDF:type == object using the Redland Python bindings.
The format of the RDF file should be:
<subject> RDF:type <object>.
ident_file: Name of the RDF Turtle file to be parsed.
object: URI of the parsing <object>.
Constructs and stores an array of <object> strings for FOAF:Name predicates, and a corresponding identifier array describing each <subject>'s presence in the stored set of subjects.
The format of the RDF file should be:
<subject> FOAF:Name <object>.
map_file: Name of the RDF Turtle file to be parsed and mapped.
balance: If True, balances the arrays by downsampling the more prevalent category.
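The balance option described above can be illustrated with a minimal sketch. It assumes the identifiers are 0/1 flags and uses plain Python lists; the function name `balance_by_downsampling` and the `seed` parameter are hypothetical and not part of the RDF_processor module.

```python
import random


def balance_by_downsampling(strings, identifiers, seed=0):
    """Downsample the more prevalent class so both classes have equal counts."""
    rng = random.Random(seed)
    positives = [i for i, flag in enumerate(identifiers) if flag]
    negatives = [i for i, flag in enumerate(identifiers) if not flag]
    target = min(len(positives), len(negatives))
    # Keep `target` randomly chosen rows from each class; the smaller class
    # is therefore kept whole and the larger one is cut down to match it.
    keep = sorted(rng.sample(positives, target) + rng.sample(negatives, target))
    return [strings[i] for i in keep], [identifiers[i] for i in keep]


# Hypothetical example data: four subjects in the stored set, one outside it.
names = ["Ada", "Alan", "Grace", "Acme Ltd", "Edsger"]
flags = [1, 1, 1, 0, 1]
balanced_names, balanced_flags = balance_by_downsampling(names, flags)
```

After balancing, both categories contribute the same number of rows, so a classifier cannot score well simply by always predicting the majority category.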
Tokenises the array of object strings and hashes the tokens using mmh3 to create and store a scipy DOK (dictionary-of-keys) sparse matrix.
mapping_size: The range of the hashes, [-mapping_size, +mapping_size].
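The hashing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's code: `zlib.crc32` stands in for mmh3 so the example needs only the standard library, a plain dict keyed by (row, column) stands in for the scipy DOK matrix, and the way a hash is folded into [-mapping_size, +mapping_size] and then shifted to a column index is a guess. All function names are hypothetical.

```python
import re
import zlib


def tokenise(text):
    """Split a string into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def signed_hash(token):
    """32-bit signed hash of a token (stand-in for an mmh3 hash)."""
    unsigned = zlib.crc32(token.encode("utf-8")) & 0xFFFFFFFF
    return unsigned - 0x100000000 if unsigned >= 0x80000000 else unsigned


def hash_features(strings, mapping_size):
    """Build a dict-of-keys count matrix with 2 * mapping_size + 1 columns."""
    width = 2 * mapping_size + 1
    matrix = {}
    for row, text in enumerate(strings):
        for token in tokenise(text):
            # Fold the hash into [-mapping_size, +mapping_size] ...
            bucket = signed_hash(token) % width - mapping_size
            # ... then shift it to a non-negative column index.
            key = (row, bucket + mapping_size)
            matrix[key] = matrix.get(key, 0) + 1
    return matrix


features = hash_features(["Ada Lovelace", "ada"], mapping_size=8)
```

Because the column depends only on the token, the same token always lands in the same column, and no vocabulary has to be stored alongside the model.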
Shuffles the subject, features and identifier arrays.
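Shuffling three parallel arrays only makes sense if the rows stay aligned. A minimal sketch of one way to do that, using plain lists and a single shared permutation (the name `shuffle_in_unison` and the `seed` parameter are hypothetical):

```python
import random


def shuffle_in_unison(subjects, features, identifiers, seed=None):
    """Apply one random permutation to all three sequences."""
    order = list(range(len(subjects)))
    random.Random(seed).shuffle(order)
    return (
        [subjects[i] for i in order],
        [features[i] for i in order],
        [identifiers[i] for i in order],
    )


# Hypothetical example rows: subject URI, feature tuple, identifier flag.
subjects = ["s1", "s2", "s3", "s4"]
features = [(1, 0), (0, 1), (1, 1), (0, 0)]
identifiers = [1, 0, 1, 0]
shuffled = shuffle_in_unison(subjects, features, identifiers, seed=42)
```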
Returns the current feature array.
Returns the array of identifiers.
Returns the array of subject strings.
size: Number of features in the array to be modelled.
batch_size: The maximum size of each batch.
alpha: The learning rate for batch SGD.
C: The L2 regularization term.
Fits dataset X to target Y by minimizing the logistic cost function using Mini-batch Gradient Descent with L2 regularization.
X: The array to be fitted, of shape (n_samples, n_features).
Y: The target array for X, of shape (n_samples).
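The fitting procedure described above can be sketched as follows. This is a simplified illustration rather than the model module itself: it uses plain Python lists instead of sparse arrays, the `epochs` and `seed` parameters are hypothetical additions, and it assumes that C multiplies the weights directly in the gradient and that the bias is not regularized.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(X, Y, batch_size=2, alpha=0.5, C=0.01, epochs=200, seed=0):
    """Mini-batch gradient descent on the L2-regularized logistic cost."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                z = bias + sum(w * x for w, x in zip(weights, X[i]))
                error = sigmoid(z) - Y[i]  # derivative of the log loss w.r.t. z
                for j, x in enumerate(X[i]):
                    grad_w[j] += error * x
                grad_b += error
            m = float(len(batch))
            for j in range(n_features):
                # Average batch gradient plus the L2 penalty term C * w.
                weights[j] -= alpha * (grad_w[j] / m + C * weights[j])
            bias -= alpha * grad_b / m
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the modelled probability is at least 0.5, else 0."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy data: the target equals the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 1, 1]
weights, bias = fit(X, Y)
predictions = [predict(weights, bias, x) for x in X]
```

Updating on small batches keeps the cost of each step independent of the dataset size, which suits large hashed feature arrays.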
Predicts the value of X using the fitted model.
X: Value to be predicted.
Returns the mean successful prediction rate for X against targets Y on the fitted model.
X: The array to be predicted.
Y: The targets to be compared against.
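The mean successful prediction rate is the fraction of samples whose predicted label matches the target. A minimal sketch, assuming the predictions have already been computed as 0/1 labels (the function name `mean_accuracy` is hypothetical):

```python
def mean_accuracy(predictions, targets):
    """Fraction of predictions that equal their targets."""
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / float(len(targets))


# Three of the four hypothetical predictions match their targets.
rate = mean_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```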
- Python 2.7
- numpy
- scipy
- mmh3
- Redland Python bindings
- cPickle