context detection and personalization
Start with a black-/whitelist that the user can modify via the traffic lights in the bar at the bottom of the page.
Add a page classifier to determine whether a page is relevant beyond the black-/whitelist examples. Possible features for the classification:
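A minimal sketch of such a user-modifiable list follows. The class name `SiteList`, its methods, and the choice to key entries by host name are assumptions made for illustration; the notes do not specify how entries are stored.

```python
from urllib.parse import urlparse


class SiteList:
    """Default black-/whitelist that the user can override per host.

    Keying by host name is an assumption; the notes do not say whether
    entries are hosts, URL prefixes or full URLs.
    """

    def __init__(self, whitelist, blacklist):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)

    def mark(self, url, relevant):
        """Record a user decision, e.g. a click on one of the traffic lights."""
        host = urlparse(url).hostname
        if host is None:
            return
        if relevant:
            self.blacklist.discard(host)
            self.whitelist.add(host)
        else:
            self.whitelist.discard(host)
            self.blacklist.add(host)

    def lookup(self, url):
        """Return True (whitelisted), False (blacklisted) or None (unknown)."""
        host = urlparse(url).hostname
        if host in self.whitelist:
            return True
        if host in self.blacklist:
            return False
        return None
```

A result of `None` marks the pages that are not covered by the list and would be left to a classifier.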
- text of a sampled paragraph from the page
- character bigrams of the URL
- title of the page?
Learn from user interaction (black-/whitelisting the page via the traffic lights, or manual search triggers on that page).
Heuristic based on the length of DOM nodes.
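The URL-bigram feature and the learning from user interaction could be combined roughly as sketched here. The online logistic regression, the learning rate and all names (`url_bigrams`, `OnlinePageClassifier`) are assumptions for illustration; the notes do not fix a model.

```python
import math
from collections import Counter, defaultdict


def url_bigrams(url):
    """Count the character bigrams of the lower-cased URL."""
    s = url.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


class OnlinePageClassifier:
    """Logistic regression over URL bigrams, updated one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def probability(self, url):
        """Estimated probability that the page is relevant."""
        z = self.bias
        for bigram, count in url_bigrams(url).items():
            z += self.weights[bigram] * count
        z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, url, relevant):
        """One gradient step from a user signal (traffic light or manual search)."""
        error = (1.0 if relevant else 0.0) - self.probability(url)
        for bigram, count in url_bigrams(url).items():
            self.weights[bigram] += self.learning_rate * error * count
        self.bias += self.learning_rate * error
```

Each traffic-light click or manual search trigger would result in one `learn` call, so the model adapts without storing the browsing history itself.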
Determine the focused paragraph based on the current viewport, layout, scroll position and mouse position. Only paragraphs currently visible in the viewport are valid candidates; the other features serve as further indicators. For example, the paragraph at the top left is more likely to be the one being read, as is a paragraph at the current mouse position. On the other hand, when a user has scrolled to the bottom of a page, the last paragraph is more likely to be in focus.
For query generation, there are different strategies, which may be combined. The most promising strategy so far seems to be using named entities as query terms; the other strategies are considered fallbacks in case the server is not able to handle the load.
Obtain named entities via Stefan's service and construct a query from them via learning to rank (LTR). Features may be:
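The selection described above can be sketched as a simple scoring function. The `Rect` type, the function names and the weights (0.6, 0.2, 0.5) are arbitrary assumptions for illustration, not tuned values; all coordinates are assumed to be page coordinates in pixels.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned box in page coordinates (pixels)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height


def _clamp01(x):
    return min(max(x, 0.0), 1.0)


def focus_score(par, viewport, mouse=None, at_page_bottom=False):
    """Score a paragraph, or return None if it is outside the viewport."""
    visible = (par.bottom > viewport.top and par.top < viewport.bottom
               and par.right > viewport.left and par.left < viewport.right)
    if not visible:
        return None
    # Relative position of the paragraph's top-left corner inside the viewport.
    rel_y = _clamp01((par.top - viewport.top) / viewport.height)
    rel_x = _clamp01((par.left - viewport.left) / viewport.width)
    # Prefer the top of the viewport, unless the user scrolled to the page end.
    vertical = rel_y if at_page_bottom else 1.0 - rel_y
    score = 0.6 * vertical + 0.2 * (1.0 - rel_x)
    if mouse is not None:
        mx, my = mouse
        if par.left <= mx <= par.right and par.top <= my <= par.bottom:
            score += 0.5  # the mouse pointer rests on this paragraph
    return score


def focused_paragraph(paragraphs, viewport, mouse=None, at_page_bottom=False):
    """Return the index of the best-scoring visible paragraph, or None."""
    scored = [(focus_score(p, viewport, mouse, at_page_bottom), i)
              for i, p in enumerate(paragraphs)]
    visible = [(s, i) for s, i in scored if s is not None]
    if not visible:
        return None
    return max(visible, key=lambda pair: pair[0])[1]
```

In a browser, the rectangles would come from the layout of the paragraph elements; here they are plain values so that the scoring can be tried in isolation.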
- term frequency (# of occurrences)
- confidence provided by the service (not provided so far)
- class of entity (person, location, ...)
- TF-IDF measure of the entity's label collected from browsing history
- exact match of entity's label in text
- length
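The feature list above could feed a pointwise ranking model along the following lines. This is only a sketch: the weights in `WEIGHTS` are hand-set placeholders standing in for a learned LTR model, the `Entity` type is hypothetical and does not reflect the service's response format, and "length" is interpreted here as the number of tokens in the label, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical entity record; not the service's actual response format."""
    label: str
    entity_class: str                   # e.g. "person", "location"
    confidence: Optional[float] = None  # not provided by the service so far


# Placeholder weights; a real LTR model would learn these from user feedback.
WEIGHTS = {
    "term_frequency": 1.0,
    "confidence": 1.0,
    "is_person": 0.2,
    "is_location": 0.1,
    "history_tfidf": 2.0,
    "exact_match": 0.5,
    "length": 0.1,
}


def features(entity, paragraph, history_tfidf):
    """Map one entity to the feature values listed above."""
    text = paragraph.lower()
    label = entity.label.lower()
    return {
        "term_frequency": float(text.count(label)),
        "confidence": entity.confidence if entity.confidence is not None else 0.0,
        "is_person": 1.0 if entity.entity_class == "person" else 0.0,
        "is_location": 1.0 if entity.entity_class == "location" else 0.0,
        "history_tfidf": history_tfidf.get(label, 0.0),
        "exact_match": 1.0 if entity.label in paragraph else 0.0,
        "length": float(len(entity.label.split())),  # assumed: tokens in label
    }


def score(entity, paragraph, history_tfidf):
    """Linear combination of the feature values."""
    feats = features(entity, paragraph, history_tfidf)
    return sum(WEIGHTS[name] * value for name, value in feats.items())


def build_query(entities, paragraph, history_tfidf, k=3):
    """Return the labels of the k highest-scoring entities as query terms."""
    ranked = sorted(entities,
                    key=lambda e: score(e, paragraph, history_tfidf),
                    reverse=True)
    return [e.label for e in ranked[:k]]
```

A reduced feature set for another term source would amount to dropping the keys that do not apply, such as the confidence and class features.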
Extract noun phrases via NounPhraseJS, ranked via LTR as well, with a reduced feature set.
Top-K keywords obtained via TF-IDF over the browsing history (or another measure, e.g. TextRank, depending on the evaluation of the relevantico experiment).
NounPhraseJS might also be used to extract dates; there is no clear plan for this so far.
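A minimal sketch of the TF-IDF variant, assuming the browsing history is available as a list of page texts. The tokenizer, the function names and the choice to count the current paragraph as one extra document (so that every term has a document frequency of at least one) are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_k_keywords(paragraph, history, k=5):
    """Return the k terms of the paragraph with the highest TF-IDF score.

    `history` is a list of previously visited page texts.
    """
    documents = list(history) + [paragraph]
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))

    term_freq = Counter(tokenize(paragraph))
    scores = {
        term: count * math.log(len(documents) / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

Terms that occur in every document of the history receive a score of zero, so frequent function words drop out without a separate stop-word list.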
To reduce the load on the federated recommender, a query might not be triggered for each paragraph right from the start. Instead, a user could trigger a query manually, or a query could be triggered when the paragraph is deemed interesting to the user (Wiki-Edit experiment).
In this approach, a single paragraph provides the context within the current page. Further features of the whole page might be integrated. Personalisation comes into play in three places: determining the relevance of the current page, identifying an interesting paragraph, and ranking the query terms.