WeSearch_StarSem_MrsCrawlingEvaluation

Results on SST, as reported by the official shared task evaluation script.

**Exact scopes**

| Method | TP | FP | FN | P | R | F1 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Baseline | 289 | 6 | 598 | 97.97 | 32.58 | 48.90 |
| C&J rules | 636 | 0 | 251 | 100.00 | 71.70 | 83.52 |
| C&J ranker | 661 | 0 | 226 | 100.00 | 74.52 | 85.40 |
| MRS crawling | 350 | 2 | 537 | 99.43 | 39.46 | 56.50 |
| + baseline | 420 | 8 | 467 | 98.13 | 47.35 | 63.88 |
| + C&J rules | 503 | 2 | 384 | 99.60 | 56.71 | 72.27 |
| + C&J ranker | 516 | 2 | 371 | 99.61 | 58.17 | 73.45 |
| Oracle | 698 | 0 | 189 | 100.00 | 78.69 | 88.08 |

**Token in/out-of-scope**

| Method | TP | FP | FN | P | R | F1 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Baseline | 6438 | 3411 | 491 | 65.37 | 92.91 | 76.74 |
| C&J rules | 6514 | 1207 | 415 | 84.37 | 94.01 | 88.93 |
| C&J ranker | 6512 | 983 | 417 | 86.88 | 93.98 | 90.29 |
| MRS crawling | 4399 | 673 | 2530 | 86.73 | 63.49 | 73.31 |
| + baseline | 6014 | 1735 | 915 | 77.61 | 86.79 | 81.94 |
| + C&J rules | 5997 | 1061 | 932 | 84.97 | 86.55 | 85.75 |
| + C&J ranker | 6033 | 1022 | 896 | 85.51 | 87.07 | 86.28 |
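
The P, R and F1 columns follow the standard definitions over the TP, FP and FN counts: P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R). As a quick sanity check (this is a minimal sketch, not the official evaluation script), the exact-scope Baseline row can be recomputed in Python:

```python
def prf(tp, fp, fn):
    """Standard precision, recall and F1 from raw counts, as percentages."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return 100 * p, 100 * r, 100 * f1

# Exact-scope counts for the Baseline row: TP=289, FP=6, FN=598.
p, r, f1 = prf(289, 6, 598)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=97.97 R=32.58 F1=48.90
```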

Notes:

- An error in exact scope is counted just once, as a false negative. During the shared task there was some debate over whether an error should generate both an FP and an FN, but counting a single FN appeared to be the organisers' preference.
- The rows following MRS crawling ('+ baseline', '+ C&J rules', '+ C&J ranker') use the predictions of the specified method for cases where the MRS crawling rules make no prediction; see the sketch after this list.
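
The '+' rows are thus a simple backoff combination. A minimal sketch, assuming per-sentence scope predictions in which the crawling rules mark abstentions with None (the function and data representation are hypothetical, for illustration only):

```python
def combine(crawling_preds, fallback_preds):
    """Use the MRS crawling prediction where one exists; otherwise
    back off to the other system's prediction for the same sentence."""
    return [c if c is not None else f
            for c, f in zip(crawling_preds, fallback_preds)]
```

Under this reading, the '+ C&J ranker' row corresponds to combine(crawling, cj_ranker).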