Johann Petrak edited this page May 29, 2016 · 28 revisions

GATE

GATE Learning Framework Plugin

[NOTE: the documentation is still a work in progress!]

The Learning Framework is GATE's most recent machine learning plugin. It is still under active development and subject to change, but it is stable enough to use. It offers a wider and more up-to-date range of ML algorithms than earlier machine learning plugins: it currently supports various Mallet classification algorithms, Mallet's CRF implementation, and LibSVM. In addition, Weka classification and regression algorithms can be used by running Weka externally through the weka-wrapper tool (see Using Weka).

It offers broadly the same functionality as the Batch Learning PR, with some differences. Besides providing a broader range of algorithms, it is likely to be faster to train and apply under most circumstances, and it includes export to sparse ARFF format. The interface design is also a little different: it offers more settings as runtime parameters and supports multiple trained models in a more user-friendly way.
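To give an idea of what a sparse ARFF file looks like, here is a minimal Python sketch that writes one. It is not the plugin's exporter: the function name `sparse_arff`, the attribute names, and the toy rows are all made up for illustration, and it assumes simple attribute names that need no quoting. In Weka's sparse format each data row lists only `index value` pairs inside braces, and attributes that are not listed are read as 0.

```python
from typing import Dict, List, Sequence, Tuple


def sparse_arff(relation: str,
                attributes: Sequence[Tuple[str, str]],
                rows: List[Dict[int, object]]) -> str:
    """Render a sparse ARFF document (hypothetical helper, not plugin code).

    attributes: (name, type) pairs, e.g. ("word_good", "numeric").
    rows: one dict per instance, mapping attribute index -> value;
          only the entries present in the dict are written out.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        # Sparse rows list "index value" pairs in ascending index order.
        cells = ", ".join(f"{i} {row[i]}" for i in sorted(row))
        lines.append("{" + cells + "}")
    return "\n".join(lines) + "\n"


# Toy bag-of-words data, invented for this example.
attrs = [
    ("word_the", "numeric"),
    ("word_good", "numeric"),
    ("word_bad", "numeric"),
    ("class", "{neg,pos}"),
]
rows = [
    {1: 1, 3: "pos"},         # "good" occurs once, class pos
    {0: 2, 2: 1, 3: "neg"},   # "the" twice, "bad" once, class neg
]
text = sparse_arff("sentiment", attrs, rows)
print(text)
```

The sparse form pays off for text features, where each instance has non-zero values for only a tiny fraction of the attributes.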

The Learning Framework implements different task modes:

  • Classification, which simply assigns a class to each instance annotation. For example, each sentence might be classified as having positive or negative sentiment, each word may get assigned a part-of-speech tag, or a document may be classified as being relevant to some topic or not. With classification, the parts of text are known in advance and assigned one out of several possible class labels.
  • Sequence tagging, also called Chunking, which finds mentions, such as locations or persons, within the text. Here the relevant parts of the text are not known in advance; the task is to find them.
  • Regression, which assigns a numerical target, and might be used to rank disambiguation candidates, for example. This is similar to classification in that the relevant parts of text (sentences, words, ...) are known in advance, but instead of a nominal class label, a numeric value is assigned to those parts.
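The difference between the three task modes can be illustrated with a small Python sketch of the data involved. It is only an illustration under stated assumptions: the `Span` class, the helper functions, the tag names, and all values are invented here, and the BIO encoding shown is one common way to turn span finding into per-token labelling, not necessarily what the plugin does internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    start: int   # index of the first token of the mention
    end: int     # index one past the last token
    label: str


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode mention spans as one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens
    for s in spans:
        tags[s.start] = "B-" + s.label
        for i in range(s.start + 1, s.end):
            tags[i] = "I-" + s.label
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-token BIO tags back into mention spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append(Span(start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


tokens = ["She", "moved", "to", "New", "York", "."]

# Classification: the instances (here: tokens) are known in advance and
# each one receives a nominal label, e.g. a part-of-speech tag.
pos_tags = ["PRON", "VERB", "ADP", "PROPN", "PROPN", "PUNCT"]

# Regression: the instances are also known in advance, but each one
# receives a number (made-up scores for two hypothetical candidates).
candidate_scores = {"candidate_a": 0.9, "candidate_b": 0.1}

# Sequence tagging: the mention boundaries are what has to be found.
mentions = [Span(3, 5, "Location")]
tags = spans_to_bio(len(tokens), mentions)
print(tags)
print(bio_to_spans(tags))
```

In the classification and regression examples the units to label are given as input, while in the sequence tagging example the spans themselves are the output.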

These task modes are provided in separate processing resources (PRs), with separate PRs for training and for application, plus evaluation PRs for classification and regression. The plugin also includes an export PR, which lets GATE prepare feature files from textual data so that they can be exported and used outside of GATE.

Get started here!

Processing Resources:

Other important documentation pages:

  • UsingWeka: how to use Weka with the LearningFramework plugin
  • FeatureSpecification: the feature specification file and what it can contain
  • FeatureConversion: more details on how machine learning features are created from the original document annotations
  • AlgorithmParameters: some general notes about algorithm parameters. Most parameters are documented on the wiki page of the PR where they can be used
  • SavedFiles: the files that get saved as a result of training or exporting
  • API: how to use the LearningFramework classes from Java/Scala/Groovy code
  • FAQs