
context detection and personalization

Jörg Schlötterer edited this page Aug 13, 2015 · 2 revisions

Overview

Deciding whether a page is relevant

Start with a black-/whitelist, which can be modified by the user (traffic lights in the bar at the bottom of the page)

Add a page classifier to decide whether a page is relevant beyond the black-/whitelist examples. The classifier could be based on:

  • text of a sampled paragraph from the page
  • character bigrams of the URL
  • title of the page?

Learn from user interaction (black-/whitelisting the page via the traffic lights, or manual search triggers on that page).
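The decision order described above can be sketched roughly as follows. This is only an illustration under assumptions: keying the lists by hostname, letting the blacklist win over the whitelist, and the `classifier` callable are all hypothetical choices, not part of the plan above.

```python
from collections import Counter
from typing import Callable, Optional, Set
from urllib.parse import urlparse


def url_char_bigrams(url: str) -> Counter:
    """Count overlapping character bigrams of the lower-cased URL."""
    s = url.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def is_relevant(
    url: str,
    whitelist: Set[str],
    blacklist: Set[str],
    classifier: Optional[Callable[[Counter], bool]] = None,
    default: bool = False,
) -> bool:
    """List entries decide first; the classifier only handles unlisted hosts."""
    host = urlparse(url).hostname or ""  # assumption: lists are keyed by hostname
    if host in blacklist:  # assumption: blacklist takes precedence
        return False
    if host in whitelist:
        return True
    if classifier is not None:  # hypothetical model over URL bigram counts
        return classifier(url_char_bigrams(url))
    return default


def record_feedback(url: str, relevant: bool,
                    whitelist: Set[str], blacklist: Set[str]) -> None:
    """Apply a traffic-light click: move the host to the matching list."""
    host = urlparse(url).hostname or ""
    (whitelist if relevant else blacklist).add(host)
    (blacklist if relevant else whitelist).discard(host)
```

The same feedback events could also serve as training examples for the classifier; how that training would work is left open here.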

Paragraph Extraction

Heuristic based on the length of DOM nodes.
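A minimal sketch of such a length heuristic, assuming that "length" means the number of text characters directly inside a block element. The tag set, the `min_chars` threshold, and the use of Python's `html.parser` in place of a browser DOM are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import List

BLOCK_TAGS = {"p", "div", "article", "section", "li", "td", "blockquote"}  # assumed
SKIP_TAGS = {"script", "style"}


class TextBlockCollector(HTMLParser):
    """Attribute text to the innermost open block element."""

    def __init__(self) -> None:
        super().__init__()
        self.open_blocks: List[List[str]] = []  # text buffers of open blocks
        self.blocks: List[str] = []             # finished blocks, in closing order
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.open_blocks.append([])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.open_blocks:
            # Collapse whitespace runs so the length reflects visible text.
            text = " ".join("".join(self.open_blocks.pop()).split())
            if text:
                self.blocks.append(text)

    def handle_data(self, data):
        if self.skip_depth == 0 and self.open_blocks:
            self.open_blocks[-1].append(data)


def extract_paragraphs(html: str, min_chars: int = 100) -> List[str]:
    """Keep only text blocks with at least min_chars characters."""
    collector = TextBlockCollector()
    collector.feed(html)
    collector.close()
    return [b for b in collector.blocks if len(b) >= min_chars]
```

The sketch assumes well-formed markup with explicit closing tags; unclosed `<p>` elements would need extra handling.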

Focused Paragraph

Determine the focused paragraph based on the current viewport, layout, scroll position and mouse position. Only the paragraphs currently visible in the viewport are valid candidates; the other features serve as further indicators. For example, the paragraph at the top left is more likely to be viewed, as is a paragraph at the current mouse position. On the other hand, when a user has scrolled to the bottom of a page, the last paragraph is more likely to be in focus.
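One way to combine these indicators is a weighted score over the visible paragraphs. The sketch below is an assumption-laden illustration: the `Box` type, the weights, and the linear combination are hypothetical, and all coordinates are taken to be document coordinates in pixels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical weights; real values would have to be tuned or learned.
W_POSITION, W_MOUSE, W_BOTTOM = 1.0, 2.0, 2.0


@dataclass
class Box:
    left: float
    top: float
    width: float
    height: float


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def focused_paragraph(
    boxes: List[Box],
    viewport_top: float,
    viewport_width: float,
    viewport_height: float,
    mouse: Optional[Tuple[float, float]] = None,
    at_page_bottom: bool = False,
) -> Optional[int]:
    """Return the index of the most likely focused paragraph, or None."""
    viewport_bottom = viewport_top + viewport_height
    # Only paragraphs that intersect the viewport are candidates.
    visible = [i for i, b in enumerate(boxes)
               if b.top < viewport_bottom and b.top + b.height > viewport_top]
    if not visible:
        return None
    last_visible = max(visible, key=lambda i: boxes[i].top)

    def score(i: int) -> float:
        b = boxes[i]
        # Closer to the top-left corner of the viewport scores higher.
        vertical = 1.0 - _clamp01((b.top - viewport_top) / viewport_height)
        horizontal = 1.0 - _clamp01(b.left / viewport_width)
        s = W_POSITION * (vertical + horizontal) / 2.0
        if mouse is not None:
            mx, my = mouse
            if b.left <= mx <= b.left + b.width and b.top <= my <= b.top + b.height:
                s += W_MOUSE
        if at_page_bottom and i == last_visible:
            s += W_BOTTOM
        return s

    return max(visible, key=score)
```

With these weights, a mouse hover or the end-of-page bonus outweighs the position prior; whether that ordering is right would have to be checked against real usage data.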

Generate Query

There are several strategies for query generation, which may be combined. So far, the most promising strategy appears to be using named entities as query terms; the other strategies are considered fallbacks in case the server cannot handle the load.

Named Entities

Obtain named entities via Stefan's Service and construct a query from them via learning to rank (LTR). Features may be:

  • term frequency (# of occurrences)
  • confidence provided by the service (not available so far)
  • class of entity (person, location, ...)
  • TF-IDF measure of the entity's label collected from browsing history
  • exact match of entity's label in text
  • length
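The feature list above could be turned into a feature vector along the following lines. This is a sketch under assumptions: the `Entity` type, the class inventory, the reading of "length" as label length in characters, and the smoothed IDF are illustrative, and the hand-set linear weights merely stand in for whatever model LTR would actually learn.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

ENTITY_CLASSES = ["person", "location", "organization", "misc"]  # assumed inventory


@dataclass
class Entity:
    label: str
    entity_class: str
    confidence: Optional[float] = None  # not provided by the service so far


def entity_features(entity: Entity, paragraph: str,
                    history_df: Dict[str, int], history_size: int) -> List[float]:
    """Build the feature vector in the order of the list above."""
    label = entity.label.lower()
    tf = paragraph.lower().count(label)  # case-insensitive occurrences
    # Smoothed IDF of the label over the browsing history (an assumption).
    idf = math.log((history_size + 1) / (history_df.get(label, 0) + 1)) + 1.0
    features = [float(tf),
                entity.confidence if entity.confidence is not None else 0.0]
    features += [1.0 if entity.entity_class == c else 0.0 for c in ENTITY_CLASSES]
    features += [
        tf * idf,
        1.0 if entity.label in paragraph else 0.0,  # exact, case-sensitive match
        float(len(entity.label)),                   # assumption: label length
    ]
    return features


def rank_entities(entities: List[Entity], paragraph: str,
                  history_df: Dict[str, int], history_size: int,
                  weights: List[float]) -> List[Entity]:
    """Order entities by a linear score; weights are placeholders for a learned model."""
    def score(e: Entity) -> float:
        feats = entity_features(e, paragraph, history_df, history_size)
        return sum(w * f for w, f in zip(weights, feats))
    return sorted(entities, key=score, reverse=True)
```

The top-ranked labels would then be used as query terms; how many to keep is not specified here.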

Noun Phrases

Extract noun phrases via NounPhraseJS and rank them via LTR as well, using a reduced feature set.

Keywords

Top-K keywords obtained via TF-IDF over the browsing history (or another measure, e.g. TextRank, depending on the evaluation of the relevantico experiment).
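As a rough illustration of the TF-IDF variant, the sketch below scores the terms of the current paragraph against document frequencies from the browsing history. The tokenizer, the smoothed IDF formula, and the representation of the history as a list of page texts are assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a real system would likely also drop stop words."""
    return re.findall(r"\w+", text.lower())


def top_k_keywords(paragraph: str, history: List[str], k: int = 5) -> List[str]:
    """Return the k paragraph terms with the highest TF-IDF score."""
    n_docs = len(history)
    df = Counter()
    for doc in history:
        df.update(set(tokenize(doc)))  # each page counts once per term
    tf = Counter(tokenize(paragraph))
    scores = {
        term: count * (math.log((n_docs + 1) / (df[term] + 1)) + 1.0)
        for term, count in tf.items()
    }
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, with a history of three pages that all contain "the" and one that contains "cat", `top_k_keywords("the cat and the cat", history, k=2)` ranks "cat" first, since it is frequent in the paragraph but rare in the history.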

Temporal features?

NounPhraseJS might also be used to extract dates, but there is no clear plan for this so far.

Query Trigger

To reduce the load on the federated recommender, a query might not be triggered for every paragraph from the start. Instead, the user could trigger a query manually, or a query could be triggered when the paragraph is deemed interesting to the user (Wiki-Edit experiment).

Contextualisation/Personalisation

In this approach, a single paragraph provides the context within the current page. Further features of the whole page might be integrated. Personalisation comes into play when determining the relevance of the current page, identifying an interesting paragraph, and ranking the query terms.