Proposal:  Change scoring algorithm

## Problem

As discussed on the event, our current scoring algorithm has a few easily exploitable weaknesses. This includes the creation of shadow websites that increase the score of another site by linking to it multiple times and artificially increasing a score for a network, e.g., site A is referenced by Site B (linked indirectly by some site of the root set from network 1) and references Site C, D and E, which in turn link all back to A multiple times to increase Site A's score.

Further, you may obfuscate outward pointing links to never give up received points when being evaluated, e.g., a site using a `onclick` handler that triggers a call to `window.location.replace` instead of a proper `a` link (Note: This might also be only done for our scrapers and done properly for other scrapers etc.).

## Proposal

Change from direct trust system, i.e., number of links = trust of site, to limited-transitive trust system. The limited-transitive system is based on the same dataset but instead of evaluating a score directly we will build the graph by distributing points from the root set that will be distributed to all linked domains and can then be further proportionally distributed from them until a certain depth is reached (the `limited` part). The transitive part is provided by the way trust is distributed over the nodes, where when site `a` trust `b` and `b` trusts `c` leads to `a` giving trust to `c`. 

> Note: Proportionally distributed means that the curren trust score is given to each referenced site proportionally to the amount of times they were referenced, e.g., site `a` with a score of 4 has 6 outward references and site `b` is referenced 2 times by site `a` → site `b` has a score of `1.34`.

## Formal definition

> Both networks are handled the same way, and they only differ in the content of the root set of sites.

Let `d` be the fixed depth of the trust network, where `d \in N`

Given a finite root set of sites: `R = { example.org, … }` and the finite set of all sites: `S = { example.de, … }`, with `R < S`.


Build a relation from this set of sites by references to other sites with depth `d`: `L_r = R x S x … x S`. 

We can now assign a score of 10 to all domains of the root set. With those base points we can evaluate the trust from one node to another for the next level by deriving from the score for the referring node with the formula `f(r, s) = S_r * (L_r * l_s)`, where `r` = referrer node, `s` = referred node (who's score is being evaluated), `L_r` = total number of links from `r` to other nodes and `l_s` = number of links from `r` → `s`. This will be done for all nodes within the relation `L_r` recursively, with the evaluated sores for all references to a node being summed up as a total score for a node on the network.

> I'm aware of the high cost of evaluation though I think it's doable given a good implentationwith pre-evaluation of scores and a reasonable depth.




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Proposal: Change scoring algorithm #2

Problem

Proposal

Formal definition

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal: Change scoring algorithm #2

Description

Problem

Proposal

Formal definition

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions