ItsdbTreebanking_ItsdbModeling
If you have treebanked a profile and have Rob Malouf's [http://tadm.sourceforge.net/ TADM: Toolkit for Advanced Discriminative Modeling] installed (in particular the tadm program), then you can train a scoring model which PET (PetTop) can use. This program was formerly called estimate and was distributed as part of [http://bulba.sdsu.edu/malouf/software/maxent.tar.gz Maxent].
Select the treebanked profile (left-click), or several profiles (click their radio buttons), and then select Trees | Train from the menus. You will be prompted for the filename in which to store the scoring model; the convention is something like corpus-version.mem. You should have the grammar used for treebanking loaded into the LKB (LkbTop). Training is normally fairly fast.
You can compare the ranking of a given profile with a treebanked gold standard (assuming the same test-suite and grammar). The ranking can be changed by changing the scoring model in the parser.
To compare: select the gold standard (middle-click), then the profile to be scored as the current database (left-click), and make sure the current version of the grammar is loaded into the LKB.
Set Trees | Switches | Implicit Ranks and Trees | Switches | Result Equivalence, and then choose Trees | Score.
;; score results in .data. against ground truth in .gold.; operates in
;; several slightly distinct modes: (i) using the implicit parse ranking in
;; the order of `results' or (ii) using an explicit ranking from the `score'
;; relation. An orthogonal dimension of variation is (a) scoring by result
;; identifier (e.g. within the same profile or against one that is comprised
;; of identical results) vs. (b) scoring by derivation equivalence (e.g.
;; when comparing best-first parser output against a gold standard).
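The two dimensions described in that comment can be illustrated with a small sketch (hypothetical data and function names for illustration only; this is not the actual [incr tsdb()] implementation):

```python
# Toy illustration of the two scoring dimensions: implicit vs. explicit
# ranking, and matching by result identifier vs. derivation equivalence.
# Hypothetical data structures; not the real [incr tsdb()] code.

def score_item(candidates, gold, explicit=False, by_derivation=False):
    """Return True if the top-ranked candidate matches the gold result.

    candidates: dicts with 'result_id', 'derivation', 'score';
    the list order is taken as the implicit parse ranking.
    gold: dict describing the annotator-preferred result.
    """
    ranked = (sorted(candidates, key=lambda c: c["score"], reverse=True)
              if explicit else candidates)           # (ii) vs. (i)
    top = ranked[0]
    if by_derivation:                                # (b) derivation equivalence
        return top["derivation"] == gold["derivation"]
    return top["result_id"] == gold["result_id"]     # (a) result identifier

candidates = [
    {"result_id": 0, "derivation": "(S (NP ...) (VP ...))", "score": 0.2},
    {"result_id": 1, "derivation": "(S (NP ...) (VP (V ...)))", "score": 0.7},
]
gold = {"result_id": 1, "derivation": "(S (NP ...) (VP (V ...)))"}

print(score_item(candidates, gold))                 # implicit ranking: False
print(score_item(candidates, gold, explicit=True))  # explicit ranking: True
```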
To make scoring faster, first do a thinning normalize on the gold profile used for comparison. This implicitly thins the profile down to only those trees marked as good by the annotator, i.e. all dispreferred trees are removed. To get a 5-best comparison, adjust the Scoring Beam value.
The scoring model is referenced in cheap's grammar.set for PET:
;;; scoring mechanism (fairly embryonic, for now)
sm := "hinoki.mem".
The scoring model is referenced in the script and globals.lsp for the LKB:
script
;;; if you have [incr tsdb()], load a Maximum Entropy parse selection model
#+:tsdb
(tsdb::read-model (lkb-pathname (parent-directory) "hinoki.mem"))
globals.lsp
;;; use the parse selection model for selective unpacking
#+:tsdb
(setf *unpacking-scoring-hook* #'tsdb::mem-score-configuration)
You can calculate the baseline for a profile (the probability of a random parse being correct) as follows:
(tsdb::baseline "profile-name")
(0.18341358 1.4064516 104.56272 1395)
The four numbers are the baseline itself, the average number of selected (active) trees, what appears to be the average number of readings per item, and the number of items considered. The default condition is readings > 1 && t-active >= 1, that is, all ambiguous parses that have been reduced. You can add extra conditions, for example to only consider items where the results are resolved to a single parse:
(tsdb::baseline "profile-name" :condition "t-active = 1")
(0.20147601 1.0 64.88867 1015)
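The computation behind these numbers can be sketched as follows (a minimal re-implementation, assuming the baseline is the mean over items of the chance of randomly picking an annotator-approved tree; the real tsdb::baseline of course reads these counts from the profile relations):

```python
# Sketch of a random-choice baseline over treebanked items.
# Hypothetical data layout; not the actual tsdb::baseline implementation.

def baseline(items, condition=lambda it: it["readings"] > 1 and it["t_active"] >= 1):
    """items: dicts with 'readings' (number of parses) and 't_active'
    (trees marked good by the annotator). Returns a tuple mirroring the
    shape of the tsdb::baseline output above:
    (baseline, avg active trees, avg readings, item count)."""
    selected = [it for it in items if condition(it)]
    n = len(selected)
    base = sum(it["t_active"] / it["readings"] for it in selected) / n
    avg_active = sum(it["t_active"] for it in selected) / n
    avg_readings = sum(it["readings"] for it in selected) / n
    return (base, avg_active, avg_readings, n)

items = [
    {"readings": 2, "t_active": 1},  # random choice is correct half the time
    {"readings": 4, "t_active": 1},  # correct a quarter of the time
    {"readings": 1, "t_active": 1},  # unambiguous: excluded by default condition
]
print(baseline(items))  # (0.375, 1.0, 3.0, 2)
```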