LkbPreprocessing
As part of the EU-funded [http://www.project-deepthought.net Deep-Thought] project, the LKB was interfaced to an external finite-state preprocessor, morphological analyzer, and tagger through an XML-based interface. Over time, this interface could serve to harmonize existing connections to external preprocessors (e.g. to [http://chasen.aist-nara.ac.jp ChaSen] in the Japanese grammar), and it should probably serve as the model for future integrations.
The Simple PreProcessor Protocol is documented on the LkbSppp page.
The MAF-based XML annotation scheme is documented on the LkbMaf page.
To use the built-in finite-state preprocessor, add one of the following lines to your grammar's script file:
(read-preprocessor (lkb-pathname (parent-directory) "preprocessor.fsr"))
(x-read-preprocessor (lkb-pathname (parent-directory) "preprocessor.fsr"))
(Note: these two calls are equivalent, except that characterization support (see below) is only available if you use x-read-preprocessor.)
Once a preprocessor has been loaded, preprocess-sentence-string automatically passes input to whichever preprocessor is active (an x-preprocessor, a plain preprocessor, or an SPPP preprocessor). If no preprocessor has been loaded, the LKB falls back on its default built-in preprocessor, which discards punctuation (subject to the setting of *bracketing-p*) and splits the remaining input into tokens at space characters.
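To make the fallback behaviour concrete, here is a minimal Common Lisp sketch of a tokenizer that does what the default preprocessor is described as doing: remove punctuation, then split on spaces. It is not the LKB's own code. The function sketch-default-preprocess and the punctuation set *sketch-punctuation* are hypothetical, and the effect of *bracketing-p* is not modelled.

```lisp
;;; Sketch of the default preprocessing behaviour described above:
;;; drop punctuation, then split on space characters.
;;; NOT the LKB's implementation; all names here are hypothetical.

(defparameter *sketch-punctuation* '(#\. #\, #\; #\: #\! #\? #\" #\( #\))
  "Hypothetical set of characters treated as punctuation.")

(defun sketch-default-preprocess (string)
  "Return a list of tokens: STRING with punctuation removed, split on spaces."
  (let ((clean (remove-if (lambda (c) (member c *sketch-punctuation*)) string))
        (tokens '())
        (start 0))                      ; offset where the current token began
    ;; Walk the cleaned string, cutting a token at every space
    ;; and at the end of the string.
    (loop for i from 0 to (length clean)
          do (when (or (= i (length clean))
                       (char= (char clean i) #\Space))
               ;; Skip empty tokens produced by runs of spaces.
               (when (> i start)
                 (push (subseq clean start i) tokens))
               (setf start (1+ i))))
    (nreverse tokens)))
```

For example, (sketch-default-preprocess "The dog barks.") returns ("The" "dog" "barks").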
Characterization support allows the deep processor to relate components of the linguistic analysis to the input text. To enable simple (character-point based) characterization support, set the following in your grammar's globals.lsp:
(setf *characterize-p* t)
You must also ensure that the features CFROM and CTO (with values of type lkb::*toptype*, i.e. the grammar's top type) are provided in the (R)MRS semantics component of your grammar. For example, from the ERG:
relation := relation_min &
[ PRED predsort,
LBL handle,
WLINK *list*,
CFROM *top*,
CTO *top* ].
These features store and propagate character pointers during linguistic processing. They are visible in the resulting semantic analysis, e.g. for the input:
The dog barks.
<rmrs cfrom='-1' cto='-1'>
<label vid='1'/>
<ep cfrom='0' cto='14'><gpred>prpstn_m_rel</gpred><label vid='1'/><var sort='e' vid='2' tense='present'/></ep>
<ep cfrom='0' cto='3'><realpred lemma='the' pos='q'/><label vid='6'/><var sort='x' vid='9' pers='3' num='sg'/></ep>
<ep cfrom='4' cto='7'><realpred lemma='dog' pos='n' sense='1'/><label vid='10'/><var sort='x' vid='9' pers='3' num='sg'/></ep>
<ep cfrom='8' cto='14'><realpred lemma='bark' pos='v' sense='1'/><label vid='11'/><var sort='e' vid='2' tense='present'/></ep>
<rarg><rargname>MARG</rargname><label vid='1'/><var sort='h' vid='4'/></rarg>
<rarg><rargname>RSTR</rargname><label vid='6'/><var sort='h' vid='8'/></rarg>
<rarg><rargname>BODY</rargname><label vid='6'/><var sort='h' vid='7'/></rarg>
<rarg><rargname>ARG1</rargname><label vid='11'/><var sort='x' vid='9' pers='3' num='sg'/></rarg>
<hcons hreln='qeq'><hi><var sort='h' vid='4'/></hi><lo><label vid='11'/></lo></hcons>
<hcons hreln='qeq'><hi><var sort='h' vid='8'/></hi><lo><label vid='10'/></lo></hcons>
</rmrs>
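The cfrom/cto values above are character offsets into the input string: cfrom is the position of a token's first character and cto the position just past its last, so the sentence-level prpstn_m_rel spans the whole input (0 to 14). The following Common Lisp sketch reproduces the values for the three words, assuming tokens are split at space characters and trailing punctuation stays attached to the preceding token. The function sketch-token-spans is hypothetical and is not the LKB's implementation.

```lisp
;;; Sketch of character-point characterization: split the raw input
;;; at spaces and record each token's CFROM (offset of its first
;;; character) and CTO (offset just past its last character).
;;; Hypothetical helper, not the LKB's own code.

(defun sketch-token-spans (string)
  "Return a list of (TOKEN CFROM CTO) for the space-separated tokens of STRING."
  (let ((spans '())
        (start nil))                     ; offset where the current token began
    (loop for i from 0 to (length string)
          do (let ((boundary (or (= i (length string))
                                 (char= (char string i) #\Space))))
               (cond ((and boundary start)
                      ;; End of a token: CTO is one past its last character.
                      (push (list (subseq string start i) start i) spans)
                      (setf start nil))
                     ((and (not boundary) (null start))
                      ;; First character of a new token: remember CFROM.
                      (setf start i)))))
    (nreverse spans)))
```

For example, (sketch-token-spans "The dog barks.") returns (("The" 0 3) ("dog" 4 7) ("barks." 8 14)), matching the cfrom/cto attributes on the the, dog and bark EPs above.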
For more sophisticated characterization support see LkbMaf.