Apply a "collocation correction" to synt. coll. results #9

@tomachalek

For each result item, Scollex should test whether the "child-parent" pair is also a "traditional" collocation, because in that case the syntactic relationship may be incorrect. Since it may or may not be incorrect, Scollex should neither remove such items nor mark them with flags like "incorrect". We should only add an informational flag saying that the value is also a "traditional" collocate.

It would probably be best to build this functionality directly into Scollex rather than moving the responsibility elsewhere, e.g. to WaG (imagine a tile loading data from both Scollex and KonText (or MQuery) and combining them).

How it should work:

  1. the import function will have an option -colloc-flags-with-span (int value)
  2. if enabled, the vertical file processing will have two passes:
     1. find all "traditional" collocations and store them in memory
     2. run the current import to find syntactic collocations and, for each word pair, add two new attributes: coOccurrence (bool) and coOccurrenceScore (float64). We use the term "co-occurrence" rather than "collocation" to keep the syntactic collocations we are interested in here distinct from the "traditional" ones.
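
The annotation step of the second pass could look roughly like the sketch below. It is only an illustration: the types `WordPair` and `SyntColl` and the function `ApplyCoOccurrence` are hypothetical names, not existing Scollex code, and the score value in `main` is made up. The sketch assumes the first pass has already produced a map from word pairs to co-occurrence scores, and that the pair key ignores word order because the span is symmetric.

```go
package main

import "fmt"

// WordPair is a hypothetical map key for two words (e.g. lemmas).
type WordPair struct {
	A, B string
}

// NewWordPair normalizes the order of the two words so that (x, y) and
// (y, x) produce the same key. This is an assumption of the sketch.
func NewWordPair(x, y string) WordPair {
	if x > y {
		x, y = y, x
	}
	return WordPair{A: x, B: y}
}

// SyntColl is a hypothetical, heavily simplified result item carrying
// the two new attributes proposed in this issue.
type SyntColl struct {
	Child             string
	Parent            string
	CoOccurrence      bool
	CoOccurrenceScore float64
}

// ApplyCoOccurrence annotates each item with the co-occurrence info
// collected in the first pass. Items are never removed or marked as
// incorrect; they only receive the informational attributes.
func ApplyCoOccurrence(items []SyntColl, scores map[WordPair]float64) {
	for i := range items {
		score, ok := scores[NewWordPair(items[i].Child, items[i].Parent)]
		items[i].CoOccurrence = ok
		items[i].CoOccurrenceScore = score // stays 0 when the pair was not found
	}
}

func main() {
	// Illustrative first-pass result; the score is an arbitrary example value.
	scores := map[WordPair]float64{NewWordPair("strong", "tea"): 7.5}
	items := []SyntColl{
		{Child: "strong", Parent: "tea"},
		{Child: "green", Parent: "idea"},
	}
	ApplyCoOccurrence(items, scores)
	for _, it := range items {
		fmt.Printf("%s -> %s: coOccurrence=%t coOccurrenceScore=%.2f\n",
			it.Child, it.Parent, it.CoOccurrence, it.CoOccurrenceScore)
	}
}
```

Keeping the annotation as a separate step over finished items means the existing import logic would not need to know how the first pass computed its scores.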

Implementation notes:

  1. to store the frequency info (Fxy, Fy, Fx), use a map (see FyTable and CounterTable for inspiration; it may even be possible to reuse them)
  2. there will be no need to keep parentSumTable and childSumTable, as the relationship in traditional collocations is simpler (a word either is or is not within a defined span/window of the other word).
  3. the co-occurrence will be defined for two words iff the "other" word is within a span (-colloc-flags-with-span) of the "main" word (e.g. for a span of 3 we will look 3 words backwards and 3 words forwards)
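
A minimal sketch of the first pass under these notes follows. All names (`WordPair`, `CoocCounts`, `CountCoOccurrences`, `LogDice`) are hypothetical and do not refer to existing Scollex code such as FyTable or CounterTable. Two further assumptions: the pair key ignores word order, so scanning only forward from each token visits the same token pairs as looking both backwards and forwards; and logDice is used as the score purely as an example, since the issue does not name a measure.

```go
package main

import (
	"fmt"
	"math"
)

// WordPair is a hypothetical, order-independent key for two words.
type WordPair struct{ A, B string }

// NewWordPair sorts the two words so (x, y) and (y, x) share one key.
func NewWordPair(x, y string) WordPair {
	if x > y {
		x, y = y, x
	}
	return WordPair{A: x, B: y}
}

// CoocCounts keeps the frequency info in plain maps. Fx serves as both
// Fx and Fy because the window is symmetric.
type CoocCounts struct {
	Fx  map[string]int
	Fxy map[WordPair]int
}

// CountCoOccurrences counts word frequencies and pairs of tokens that
// lie at most span positions apart. Each such pair of positions is
// counted once (forward scan with an order-independent key).
func CountCoOccurrences(tokens []string, span int) *CoocCounts {
	c := &CoocCounts{Fx: make(map[string]int), Fxy: make(map[WordPair]int)}
	for i, w := range tokens {
		c.Fx[w]++
		for j := i + 1; j <= i+span && j < len(tokens); j++ {
			c.Fxy[NewWordPair(w, tokens[j])]++
		}
	}
	return c
}

// LogDice returns 14 + log2(2*Fxy / (Fx + Fy)) and false when the two
// words never co-occurred within the span. The choice of logDice is an
// assumption of this sketch.
func (c *CoocCounts) LogDice(x, y string) (float64, bool) {
	fxy := c.Fxy[NewWordPair(x, y)]
	if fxy == 0 {
		return 0, false
	}
	return 14 + math.Log2(2*float64(fxy)/float64(c.Fx[x]+c.Fx[y])), true
}

func main() {
	tokens := []string{"a", "b", "c", "a", "b"}
	c := CountCoOccurrences(tokens, 1)
	score, ok := c.LogDice("b", "a")
	fmt.Println(c.Fxy[NewWordPair("a", "b")], score, ok)
}
```

The sketch treats the input as one flat token sequence; whether the window should stop at sentence or document boundaries in the vertical file is left open here.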
