WeSearch_StarSem_MrsCrawlingEvaluation

Results on SST, as reported by the official shared task evaluation script.

The first six score columns report exact scope matches; the last six report token-level in/out-of-scope classification.

| Method | TP | FP | FN | P | R | F1 | TP | FP | FN | P | R | F1 |
|---|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Baseline | 289 | 6 | 598 | 97.97 | 32.58 | 48.90 | 6438 | 3411 | 491 | 65.37 | 92.91 | 76.74 |
| C&J rules | 636 | 0 | 251 | 100.0 | 71.70 | 83.52 | 6514 | 1207 | 415 | 84.37 | 94.01 | 88.93 |
| C&J ranker | 661 | 0 | 226 | 100.0 | 74.52 | 85.40 | 6512 | 983 | 417 | 86.88 | 93.98 | 90.29 |
| **Aug-12-2013** | | | | | | | | | | | | |
| MRS crawling | 350 | 2 | 537 | 99.43 | 39.46 | 56.50 | 4399 | 673 | 2530 | 86.73 | 63.49 | 73.31 |
| + baseline | 420 | 8 | 467 | 98.13 | 47.35 | 63.88 | 6014 | 1735 | 915 | 77.61 | 86.79 | 81.94 |
| + C&J rules | 503 | 2 | 384 | 99.60 | 56.71 | 72.27 | 5997 | 1061 | 932 | 84.97 | 86.55 | 85.75 |
| + C&J ranker | 516 | 2 | 371 | 99.61 | 58.17 | 73.45 | 6033 | 1022 | 896 | 85.51 | 87.07 | 86.28 |
| Oracle | 698 | 0 | 189 | 100.0 | 78.69 | 88.08 | | | | | | |
| **Aug-23-2013 [1212/pet]** | | | | | | | | | | | | |
| MRS crawling | 411 | 2 | 476 | 99.52 | 46.34 | 63.24 | 4576 | 668 | 2353 | 87.26 | 66.04 | 75.18 |
| + C&J ranker | 554 | 2 | 333 | 99.64 | 62.46 | 76.79 | 6049 | 953 | 880 | 86.39 | 87.30 | 86.84 |
| **Aug-23-2013 [1212/ace]** | | | | | | | | | | | | |
| MRS crawling | 418 | 4 | 469 | 99.05 | 47.13 | 63.87 | 4760 | 749 | 2169 | 86.40 | 68.70 | 76.54 |
| + C&J ranker | 545 | 4 | 342 | 99.27 | 61.44 | 75.90 | 6063 | 1012 | 866 | 85.70 | 87.50 | 86.59 |

Notes:

- Errors in exact scopes count just once, as a false negative. During the shared task there was some debate as to whether an error should generate both an FP and an FN, but counting it once seemed to be the organisers' preference.
- Rows labelled "+ X" report the effect of falling back on the predictions of method X for items where the MRS crawling rules make no prediction; see the sketch after this list.
- The 1212/ace profile differs from the 1212/pet profile in the processor used for parsing. The most important intentional difference is that ACE was run with a much larger resource limit (only six sentences hit it), so grammar coverage was slightly higher.
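
The first two notes are easy to mis-read, so here is a minimal sketch of both conventions. This is not the official shared task evaluation script: the dict-of-scopes representation and the function names (`prf`, `exact_scope_counts`, `back_off`) are assumptions made purely for illustration.

```python
# Minimal sketch only -- NOT the official evaluation script.
# Assumes gold/predicted scopes live in dicts keyed by item id, with
# an entry present only when a scope exists / a prediction was made.

def prf(tp, fp, fn):
    """Precision, recall, and F1 (as fractions) from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def exact_scope_counts(gold, predicted):
    """Count exact-scope TP/FP/FN, scoring a wrong scope once (FN only)."""
    tp = fp = fn = 0
    for item in gold.keys() | predicted.keys():
        g, p = gold.get(item), predicted.get(item)
        if g is None:
            fp += 1      # predicted a scope where gold has none
        elif p is None:
            fn += 1      # missed a gold scope entirely
        elif p == g:
            tp += 1      # exact match
        else:
            fn += 1      # wrong scope: a single FN, no accompanying FP
    return tp, fp, fn

def back_off(crawling, fallback):
    """Combine predictions as in the '+ X' rows: wherever the MRS
    crawling rules made no prediction, use the other method's output."""
    return {**fallback, **crawling}  # crawling's predictions take priority
```

As a sanity check, `prf(289, 6, 598)` reproduces the exact-scopes side of the Baseline row above: 97.97 / 32.58 / 48.90 (as percentages).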