WeSearch_LexicalFiltering

Background

Working with a lattice of lexical hypotheses and an (über)tagger, we seek to develop a filtering function that discards unlikely hypotheses. The formalisation of the lexical filtering process may be found [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/formalisation.pdf here].

TNT output for filtering of LE types

One such filter function maps the PTB tags output by the TNT tagger onto LE types. Mappings may be derived intuitively by inspecting a [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/tnt.le.confusion.pdf confusion matrix] detailing the choices of TNT with respect to LE types.
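As an illustration only, a tag-to-type filter of this kind could be sketched as below. The mapping table, the `(entry, le_type)` tuple representation of a hypothesis, and the function name `filter_hypotheses` are all assumptions made for this sketch; they are not taken from the formalisation or from the actual WeSearch code.

```python
# Hypothetical mapping from PTB tags to coarse LE types (illustrative only;
# a real mapping would be read off the confusion matrix).
PTB_TO_LE = {
    "NN": {"n"},
    "NNS": {"n"},
    "VB": {"v"},
    "VBD": {"v"},
    "JJ": {"aj"},
    "RB": {"av"},
    "IN": {"p"},
    "DT": {"d"},
}


def filter_hypotheses(hypotheses, ptb_tag, mapping=PTB_TO_LE):
    """Discard hypotheses whose LE type is not licensed by the PTB tag.

    hypotheses: list of (entry_name, le_type) tuples (assumed representation).
    Tags without a mapping leave the hypotheses untouched.
    """
    allowed = mapping.get(ptb_tag)
    if allowed is None:
        return list(hypotheses)
    kept = [h for h in hypotheses if h[1] in allowed]
    # Assumption: never filter a token down to zero hypotheses,
    # so that the filter cannot by itself cause a coverage loss.
    return kept or list(hypotheses)
```

For example, `filter_hypotheses([("bank_n1", "n"), ("bank_v1", "v")], "NN")` keeps only the noun hypothesis. The fallback to the unfiltered list is a design choice of this sketch that trades some efficiency for coverage.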

An alternative approach is to derive mappings programmatically from the preferred outcomes of lexical filtering (i.e., gains in parser efficiency versus losses in parser accuracy and coverage). These outcomes may be approximated by examining the relations between TNT precision, TNT recall and the lexical ambiguity of LE types.
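A minimal sketch of how candidate mappings might be scored from confusion counts follows. It assumes the confusion matrix is available as a dictionary keyed by `(ptb_tag, le_type)` pairs; the function names, the data layout, and the precision cut-off are hypothetical, and lexical ambiguity is not modelled here.

```python
from collections import defaultdict


def precision_recall(confusion):
    """Score each (tag, LE type) pair from confusion counts.

    confusion: dict mapping (ptb_tag, le_type) -> count (assumed layout).
    Returns a dict mapping (ptb_tag, le_type) -> (precision, recall), where
    precision = count / all tokens given that tag, and
    recall    = count / all tokens of that LE type.
    """
    tag_totals = defaultdict(int)
    type_totals = defaultdict(int)
    for (tag, le_type), n in confusion.items():
        tag_totals[tag] += n
        type_totals[le_type] += n
    return {
        (tag, le_type): (n / tag_totals[tag], n / type_totals[le_type])
        for (tag, le_type), n in confusion.items()
        if n > 0
    }


def select_mappings(scores, min_precision):
    """Keep the pairs whose precision reaches a chosen cut-off."""
    return sorted(pair for pair, (p, _) in scores.items() if p >= min_precision)
```

With toy counts such as `{("NN", "n"): 90, ("NN", "aj"): 10, ("JJ", "aj"): 30, ("JJ", "n"): 10}`, the pair `("NN", "n")` scores a precision and recall of 0.9 each, and a cut-off of 0.8 would select only that pair.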

Frequency of LE types in JH0 parse forest:


type	frequency
n	1,134,661
p	498,443
v	454,513
d	335,667
aj	332,243
c	182,759
av	145,290
cm	33,618
pp	34,496
pt	3,864

ROC plots of the TNT performance on the most frequent LE types:

[http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/roc.png]

A plot of the precision vs. lexical items filtered for each handled LE type:

[http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/filtering.png]

Effective threshold ranges for LE Types:


type	min	min-precision	min-filtering	max	max-precision	max-filtering
n	0.36	0.93737	0.36853	1.00	0.97246	0.29400
v	0.15	0.96925	0.09541	1.00	0.98677	0.06477
p	0.36	0.92586	0.25249	1.00	0.96771	0.18468
d	0.38	0.95638	0.50612	1.00	0.96796	0.46700
aj	0.18	0.74427	0.51437	1.00	0.85327	0.40463
av	0.28	0.83146	0.37179	1.00	0.91001	0.23647
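The page does not spell out how a threshold is applied. One reading consistent with the table (higher thresholds give higher precision but less filtering) is that the filter only fires when the tagger's probability for its prediction reaches the threshold for that LE type. The sketch below follows that assumption. The threshold values are the `min` column above; the function name and the hypothesis representation are hypothetical.

```python
# Lower bounds of the effective threshold ranges, copied from the table above.
MIN_THRESHOLDS = {
    "n": 0.36,
    "v": 0.15,
    "p": 0.36,
    "d": 0.38,
    "aj": 0.18,
    "av": 0.28,
}


def thresholded_filter(hypotheses, predicted_type, probability,
                       thresholds=MIN_THRESHOLDS):
    """Filter only when the tagger is confident enough (assumed semantics).

    hypotheses: list of (entry_name, le_type) tuples (assumed representation).
    predicted_type: LE type that the tagger's best tag maps to.
    probability: tagger probability for that tag.
    """
    threshold = thresholds.get(predicted_type)
    # Unhandled LE types and low-confidence predictions leave the lattice as-is.
    if threshold is None or probability < threshold:
        return list(hypotheses)
    kept = [h for h in hypotheses if h[1] == predicted_type]
    # Assumption: fall back to the full list rather than discard everything.
    return kept or list(hypotheses)
```

Under this reading, raising a threshold towards 1.00 makes the filter fire less often, which matches the lower filtering rates and higher precision in the `max` columns.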

Related Work

Dridan, R. (2009). [http://www.dridan.com/research/papers/dridan-phdthesis.pdf Using Lexical Statistics to Improve HPSG Parsing]. PhD Thesis, Saarland University.
Ninomiya, T., Matsuzaki, T., Miyao, Y., Tsuruoka, Y., and Tsujii, J. (2010). HPSG Parsing with a Supertagger. In Bunt, H., Merlo, P., and Nivre, J. (Eds.): [http://www.springerlink.com/content/978-90-481-9351-6 Trends in Parsing Technology: Dependency Parsing, Domain Adaptation, and Deep Parsing]. Springer, pp. 243-256.
