
JacyPerformance

FrancisBond edited this page Jul 26, 2007 · 9 revisions

There are several ways to increase the performance of the grammar during parsing and generation.


Quick Check

Make sure the two quick-check files are kept up to date.

  • LKB: ${JACY}/lkb/checkpaths.lsp
  • PET: ${JACY}/pet/qc.tdl

PET

Make sure:

  • you are using compatible versions of flop and cheap
  • your grammar is up-to-date

See ${JACY}/utils/make-qc.bash

flop japanese.tdl 
mv pet/qc.tdl pet/qc.tdl.old
flop japanese
cut -d@ -f7 tsdb/skeletons/kinou1/item | cheap -limit=100000 -packing -compute-qc=pet/qc.tdl japanese;
flop japanese

  • Using a text file (you need to segment it)

grep -v '#' testsuites/mt-test-set-1.txt | chasen -F "%m " | \
cheap -limit=100000 -packing -compute-qc=pet/qc.tdl japanese

After you have made the quick-check file, you need to rebuild the grammar.

Note: This is slow, as quick-check is, of course, turned off while the paths are being computed. In general, you should use the mode you would normally use (e.g. with packing if you use packing).

LKB

(see Copestake (2002), indexed under checkpaths)

mv lkb/checkpaths.lsp lkb/checkpaths.lsp.old

Then, from within the *common-lisp* buffer:

(lkb::with-check-path-list-collection 
   "~/delphin/grammars/japanese/lkb/checkpaths" 
   (parse-sentences 
      "~/delphin/grammars/japanese/testsuites/hinoki-test-a.100" 
      "~/delphin/grammars/japanese/testsuites/hinoki-test-a.100.results"))

ToDo

  • This would be nice to automate
  • It would be nice to share the config between PET and the LKB (or convert)
  • It may be worth doing a grid search to optimize how many quick-check paths should actually be checked.

Key Arguments

You can gain some performance by setting the order in which the daughters of rules are checked (Oepen & Carroll 2002: pp 204--206). The order can be specified in the grammar or in the configuration files for the LKB and PET.

* In the grammar

  • You can use KEY-ARG and specify it per rule in the grammar.
binary_rule_left_to_right := rule &
  [ ARGS < [ KEY-ARG + ] , [ KEY-ARG bool ] > ].

* In the LKB (lkb/globals.lsp)

(defparameter *rule-keys*
  '((HEAD-ADJUNCT-RULE1 . 1)
    (COMPOUNDS-RULE . 1)
    (KARA-MADE-RULE . 2) 
    (HEAD_SUBJ_RULE . 2)
    (HEAD-SPECIFIER-RULE . 2)
    (HEAD-COMPLEMENT-RULE . 2) 
    (HEAD-COMPLEMENT2-RULE . 2)
    (HEAD-ADJUNCT-RULE2 . 2)))

* In PET (pet/japanese.set)

;; assoc (rules -> keyarg position) (alternative to KEY-ARG mechanism)
rule-keyargs := 
$HEAD-ADJUNCT-RULE1 1 
$HEAD-ADJUNCT-RULE2 2 
$HEAD-ADJUNCT-RULE3 1 
$RELATIVE-CLAUSE-RULE 1 
$COMPOUNDS-RULE 1 
$SENTENCE-TE-COORDINATION-RULE 1
$CONJ-RULE 1
$KARA-MADE-RULE 2 
$HEAD_SUBJ_RULE 2 
$HEAD-SPECIFIER-RULE 2 
$HEAD-COMPLEMENT-HF-RULE 2 
$HEAD-COMPLEMENT-HI-RULE 1 
$HEAD-COMPLEMENT-AFFIXBIND-RULE 2
$HEAD-COMPLEMENT2-RULE 2 
$HEAD-2OBL-COMPLEMENTS-RULE 2
$VN-LIGHT-RULE 2
$VEND-VEND-RULE 1
$VSTEM-VEND-RULE 2 
$VN-VEND-RULE 2
$PREFIX-ATTACH-RULE 1
$NP-QUEST-FRAG-RULE 2.

Key mode in cheap is set with:

  `-key=n' --- select key mode (0=key-driven, 1=l-r, 2=r-l, 3=head-driven)

default is 0.

You get the data by creating two profiles, one with -key=1 and one with -key=2, with -rulestats turned on. First enable [Process,switches:write rule relation] in itsdb. Use the mode you would normally use (e.g. with packing if you use packing).

Then run [Analyze:rule table] for both profiles. For each rule, choose as key the daughter that gives the fewest active edges (the passive edges should be the same in both profiles, modulo memory overflow errors).
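The comparison between the two profiles can be sketched as a small shell function. This is only an illustration under several assumptions: the rule table from each profile has been exported by hand to a text file with one "rule-name active-edges" pair per line (itsdb does not write this format itself), all rules are binary, and left-to-right mode corresponds to keying on daughter 1 while right-to-left corresponds to keying on daughter 2. The function name choose_keyargs and the file layout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick a key daughter per rule from two hand-made exports of the
# rule table, one per profile (assumed format: "RULE-NAME ACTIVE-EDGES").
choose_keyargs() {
  # $1: counts from the -key=1 (left-to-right) profile
  # $2: counts from the -key=2 (right-to-left) profile
  awk '
    NR == FNR { lr[$1] = $2; next }   # first file: remember l-r counts
    ($1 in lr) {
      # fewer (or equal) active edges left-to-right: key on daughter 1,
      # otherwise key on daughter 2 (binary rules assumed)
      key = (lr[$1] + 0 <= $2 + 0) ? 1 : 2
      printf "$%s %d\n", $1, key
    }
  ' "$1" "$2"
}
```

Calling choose_keyargs with the two exported files prints one "$rule position" line per rule found in both files, in the shape used by rule-keyargs in pet/japanese.set. Rules with more than two daughters, and ties, would need a closer look by hand.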

ToDo

  • This would be nice to automate
  • It would be nice to share the config between PET and the LKB (or convert)
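
As a possible starting point for the conversion mentioned in the ToDo list, the following sketch turns the LKB *rule-keys* list into a PET rule-keyargs setting. It assumes the list is written as "(NAME . N)" pairs inside a single defparameter form, as in the lkb/globals.lsp excerpt above, and that no Lisp comments containing parentheses appear inside that form. The function name lkb_keys_to_pet is hypothetical; nothing like it ships with the grammar.

```shell
#!/usr/bin/env bash
# Sketch: convert the LKB *rule-keys* alist to PET rule-keyargs syntax.
lkb_keys_to_pet() {
  # $1: Lisp file containing (defparameter *rule-keys* '(...))
  awk '
    /\*rule-keys\*/ { inside = 1 }
    inside {
      line = $0
      # collect every "(NAME . N)" pair on this line
      while (match(line, /\([A-Za-z0-9_-]+ +\. +[0-9]+\)/)) {
        pair = substr(line, RSTART + 1, RLENGTH - 2)  # drop the parentheses
        split(pair, f, / +\. +/)
        out[++n] = "$" f[1] " " f[2]
        line = substr(line, RSTART + RLENGTH)
      }
      # track parenthesis depth to find the end of the defparameter form
      tmp = $0
      opens = gsub(/\(/, "", tmp)
      closes = gsub(/\)/, "", tmp)
      depth += opens - closes
      if (depth <= 0) inside = 0
    }
    END {
      print "rule-keyargs :="
      # the last entry carries the terminating full stop
      for (i = 1; i <= n; i++) printf "%s%s\n", out[i], (i == n ? "." : "")
    }
  ' "$1"
}
```

Running lkb_keys_to_pet lkb/globals.lsp would print a setting that can be compared with, or pasted into, pet/japanese.set. Since the PET list above covers more rules than the LKB list, the output is better treated as a consistency check than as a replacement.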

Spanning Only Rules

In PET only, you can set rules to only apply over the entire span.

spanning-only-rules := $frg-np $frg-pp $frg-s-adv $frg-i-adv
                       $frg-pp-np $frg-i-adv-np $frg-pp-int 
                       $runon_s.

Making the rules spanning-only for PET reduces the number of tasks by 7.2% and speeds parsing up by 5.1%. This is with only a few fragment rules at the moment, compared to the ERG's almost 20.

Ann suggests that this could also be done using special start/end tokens...

Ambiguity Packing

See PetSelectiveUnpacking.

Restrictions on the application of morphological rules

(Deprecated)

Trigger Rules for Generation

Current Issues

  • The Idiom optimizations don't seem to be working
  • It would be nice to use supertagging