ItsdbTreebanking_ItsdbModeling
If you have treebanked a profile and have Rob Malouf's [http://tadm.sourceforge.net/ TADM: Toolkit for Advanced Discriminative Modeling] installed (in particular the tadm program), then you can train a scoring model which PET (PetTop) can use. This program was formerly called estimate and was distributed as part of [http://bulba.sdsu.edu/malouf/software/maxent.tar.gz Maxent].
Select the treebanked profile (left-click), or several profiles (click their radio buttons), and then select Trees | Train from the menus. You will be prompted for the filename in which to store the scoring model; the convention is something like corpus-version.mem. You should have the grammar used for treebanking loaded into the LKB (LkbTop). Training is normally fairly fast.
You can compare the ranking of a given profile with a treebanked gold standard (assuming the same test-suite and grammar). The ranking can be changed by changing the scoring model in the parser.
To compare: select the gold standard (middle-click), then the profile to be scored as the current database (left-click), and make sure the current version of the grammar is loaded into the LKB.
Set Trees | Switches | Implicit Ranks and Trees | Switches | Result Equivalence, and then choose Trees | Score.
;; score results in .data. against ground truth in .gold.; operates in
;; several slightly distinct modes: (i) using the implicit parse ranking in
;; the order of `results' or (ii) using an explicit ranking from the `score'
;; relation. An orthogonal dimension of variation is (a) scoring by result
;; identifier (e.g. within the same profile or against one that is comprised
;; of identical results) vs. (b) scoring by derivation equivalence (e.g.
;; when comparing best-first parser output against a gold standard).
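The two dimensions described in that comment can be illustrated with a small sketch (hypothetical data and function names for illustration only; this is not the actual [incr tsdb()] implementation):

```python
# Toy illustration of the two scoring dimensions: implicit vs. explicit
# ranking, and matching by result identifier vs. derivation equivalence.
# Hypothetical data structures; not the real [incr tsdb()] code.

def score_item(candidates, gold, explicit=False, by_derivation=False):
    """Return True if the top-ranked candidate matches the gold result.

    candidates: dicts with 'result_id', 'derivation', 'score';
    the list order is taken as the implicit parse ranking.
    gold: dict describing the annotator-preferred result.
    """
    ranked = (sorted(candidates, key=lambda c: c["score"], reverse=True)
              if explicit else candidates)           # (ii) vs. (i)
    top = ranked[0]
    if by_derivation:                                # (b) derivation equivalence
        return top["derivation"] == gold["derivation"]
    return top["result_id"] == gold["result_id"]     # (a) result identifier

candidates = [
    {"result_id": 0, "derivation": "(S (NP ...) (VP ...))", "score": 0.2},
    {"result_id": 1, "derivation": "(S (NP ...) (VP (V ...)))", "score": 0.7},
]
gold = {"result_id": 1, "derivation": "(S (NP ...) (VP (V ...)))"}

print(score_item(candidates, gold))                 # implicit ranking: False
print(score_item(candidates, gold, explicit=True))  # explicit ranking: True
```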
To make scoring faster, first do a thinning normalize on the gold profile used for comparison. This implicitly thins the profile down to only those trees marked as good by the annotator, i.e. all dispreferred trees are removed. To get a 5-best comparison, adjust the Scoring Beam value.
The scoring model is referenced in cheap's grammar.set for PET:
;;; scoring mechanism (fairly embryonic, for now)
sm := "hinoki.mem".
The scoring model is referenced in the script and globals.lsp for the LKB:
script
;;; if you have [incr tsdb()], load a Maximum Entropy parse selection model
#+:tsdb
(tsdb::read-model (lkb-pathname (parent-directory) "hinoki.mem"))
globals.lsp
;;; use the parse selection model for selective unpacking
#+:tsdb
(setf *unpacking-scoring-hook* #'tsdb::mem-score-configuration)
You can calculate the baseline for a profile (the probability of a random parse being correct) as follows:
(tsdb::baseline "profile-name")
(0.18341358 1.4064516 104.56272 1395)
The four numbers are the baseline itself, the average number of selected (active) trees, what appears to be the average number of readings per item, and the number of items considered. The default condition is readings > 1 && t-active >= 1, that is, all ambiguous parses that have been reduced. You can add extra conditions, for example to only consider items where the results are resolved to a single parse:
(tsdb::baseline "profile-name" :condition "t-active = 1")
(0.20147601 1.0 64.88867 1015)
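The computation behind these numbers can be sketched as follows (a minimal re-implementation, assuming the baseline is the mean over items of the chance of randomly picking an annotator-approved tree; the real tsdb::baseline of course reads these counts from the profile relations):

```python
# Sketch of a random-choice baseline over treebanked items.
# Hypothetical data layout; not the actual tsdb::baseline implementation.

def baseline(items, condition=lambda it: it["readings"] > 1 and it["t_active"] >= 1):
    """items: dicts with 'readings' (number of parses) and 't_active'
    (trees marked good by the annotator). Returns a tuple mirroring the
    shape of the tsdb::baseline output above:
    (baseline, avg active trees, avg readings, item count)."""
    selected = [it for it in items if condition(it)]
    n = len(selected)
    base = sum(it["t_active"] / it["readings"] for it in selected) / n
    avg_active = sum(it["t_active"] for it in selected) / n
    avg_readings = sum(it["readings"] for it in selected) / n
    return (base, avg_active, avg_readings, n)

items = [
    {"readings": 2, "t_active": 1},  # random choice is correct half the time
    {"readings": 4, "t_active": 1},  # correct a quarter of the time
    {"readings": 1, "t_active": 1},  # unambiguous: excluded by default condition
]
print(baseline(items))  # (0.375, 1.0, 3.0, 2)
```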