Open
Description
For each result item, Scollex should test whether the "child-parent" pair is also a "traditional" collocation, because in that case the syntactic relationship may be incorrect. Since it may or may not be incorrect, Scollex should neither remove such items nor mark them with flags like "incorrect". We should just add an informational flag saying that the value is also a "traditional" collocate.
It would probably be best to have this functionality built directly into Scollex rather than moving the responsibility to, e.g., WaG (imagine a tile loading data from Scollex and KonText (or MQuery) and combining them).
How it should work:
- the `import` function will have an option `-colloc-flags-with-span` (int value)
- if enabled, the vertical file processing will have two passes:
  - find all "traditional" collocations and store them in memory
  - run the current `import` to find syntactic collocations and, for each word pair, add the new attributes `coOccurrence bool` and `coOccurrenceScore float64` (we use the term "co-occurrence" instead of "collocation" to distinguish the collocations we are interested in here, i.e. the syntactic ones, from the "traditional" ones).
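
The second pass described above could be sketched as follows. This is only an illustration under assumptions: `WordPair`, `SyntacticColl` and `FlagCoOccurrences` are hypothetical names, not existing Scollex types, and the sketch assumes that a "traditional" collocation is keyed by an unordered pair of words. Only the attribute names `coOccurrence` and `coOccurrenceScore` come from this issue.

```go
package main

import "fmt"

// WordPair is a hypothetical key for an unordered pair of words.
type WordPair struct{ A, B string }

// NewWordPair normalizes the order so that (a, b) and (b, a) map to the same key.
func NewWordPair(a, b string) WordPair {
	if a > b {
		a, b = b, a
	}
	return WordPair{A: a, B: b}
}

// SyntacticColl is a hypothetical stand-in for a Scollex result item.
// Only the two co-occurrence attributes are taken from the proposal.
type SyntacticColl struct {
	Child             string  `json:"child"`
	Parent            string  `json:"parent"`
	CoOccurrence      bool    `json:"coOccurrence"`
	CoOccurrenceScore float64 `json:"coOccurrenceScore"`
}

// FlagCoOccurrences annotates each syntactic collocation with information
// about whether the same pair was also found as a "traditional" collocation
// in the first pass. Items are never removed or marked as incorrect.
func FlagCoOccurrences(items []SyntacticColl, traditional map[WordPair]float64) {
	for i := range items {
		score, found := traditional[NewWordPair(items[i].Child, items[i].Parent)]
		items[i].CoOccurrence = found
		items[i].CoOccurrenceScore = score // zero when the pair was not found
	}
}

func main() {
	// Output of the (hypothetical) first pass, held in memory.
	traditional := map[WordPair]float64{
		NewWordPair("strong", "tea"): 7.5,
	}
	items := []SyntacticColl{
		{Child: "strong", Parent: "tea"},
		{Child: "green", Parent: "idea"},
	}
	FlagCoOccurrences(items, traditional)
	for _, it := range items {
		fmt.Printf("%s-%s coOccurrence=%v score=%.2f\n",
			it.Child, it.Parent, it.CoOccurrence, it.CoOccurrenceScore)
	}
}
```

Keeping the flag purely informational matches the intent above: consumers such as WaG can decide themselves how to treat pairs that are both syntactic and "traditional" collocates.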
Implementation notes:
- to store freq info (Fxy, Fy, Fx), use `map` (see `FyTable`, `CounterTable` for inspiration, maybe it will even be possible to reuse them)
- there will be no need to keep `parentSumTable` and `childSumTable`, as the relationship in traditional colls is simpler (a word either is or is not in a defined span/window of the other word).
- the co-occurrence will be defined for two words iff the "other" word is in a span (`-colloc-flags-with-span`) of the "main" word (e.g. for a span of 3 we will look 3 words backwards and 3 forwards)
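
A minimal sketch of the first pass, assuming plain Go maps for the frequency tables. The names `Counts`, `CountCoOccurrences` and `LogDice` are hypothetical and are not the existing `FyTable` or `CounterTable`. The issue does not say which association measure should fill `coOccurrenceScore`; logDice is used here only as an example. The sketch also ignores sentence boundaries and other structures of the vertical file.

```go
package main

import (
	"fmt"
	"math"
)

// WordPair is a hypothetical key for an unordered pair of words.
type WordPair struct{ A, B string }

// NewWordPair normalizes the order so that (a, b) and (b, a) share one key.
func NewWordPair(a, b string) WordPair {
	if a > b {
		a, b = b, a
	}
	return WordPair{A: a, B: b}
}

// Counts holds the frequency info needed for a traditional collocation score.
type Counts struct {
	Single map[string]int   // Fx and Fy: frequency of each word
	Joint  map[WordPair]int // Fxy: co-occurrences within the span
}

// CountCoOccurrences walks the token stream once. For each token it looks
// only `span` positions forward; because pairs are unordered, this counts
// each pair of positions exactly once and is equivalent to looking `span`
// words backwards and `span` words forwards from the "main" word.
func CountCoOccurrences(tokens []string, span int) Counts {
	c := Counts{Single: make(map[string]int), Joint: make(map[WordPair]int)}
	for i, x := range tokens {
		c.Single[x]++
		for j := i + 1; j <= i+span && j < len(tokens); j++ {
			if tokens[j] == x {
				continue // a word paired with itself is skipped in this sketch
			}
			c.Joint[NewWordPair(x, tokens[j])]++
		}
	}
	return c
}

// LogDice computes 14 + log2(2*Fxy / (Fx + Fy)). The choice of this
// measure is an assumption made for illustration only.
func LogDice(fxy, fx, fy int) float64 {
	return 14 + math.Log2(2*float64(fxy)/float64(fx+fy))
}

func main() {
	tokens := []string{"a", "b", "c", "a", "b"}
	c := CountCoOccurrences(tokens, 1)
	p := NewWordPair("a", "b")
	fmt.Println(c.Joint[p], c.Single["a"], c.Single["b"])
	fmt.Println(LogDice(c.Joint[p], c.Single["a"], c.Single["b"]))
}
```

Since membership in the window is a simple yes/no relation, two maps are enough here, which is in line with the note that `parentSumTable` and `childSumTable` are not needed.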