-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Problem
As discussed on the event, our current scoring algorithm has a few easily exploitable weaknesses. This includes the creation of shadow websites that increase the score of another site by linking to it multiple times and artificially increasing a score for a network, e.g., site A is referenced by Site B (linked indirectly by some site of the root set from network 1) and references Site C, D and E, which in turn link all back to A multiple times to increase Site A's score.
Further, you may obfuscate outward pointing links to never give up received points when being evaluated, e.g., a site using a onclick
handler that triggers a call to window.location.replace
instead of a proper a
link (Note: This might also be only done for our scrapers and done properly for other scrapers etc.).
Proposal
Change from direct trust system, i.e., number of links = trust of site, to limited-transitive trust system. The limited-transitive system is based on the same dataset but instead of evaluating a score directly we will build the graph by distributing points from the root set that will be distributed to all linked domains and can then be further proportionally distributed from them until a certain depth is reached (the limited
part). The transitive part is provided by the way trust is distributed over the nodes, where when site a
trust b
and b
trusts c
leads to a
giving trust to c
.
Note: Proportionally distributed means that the curren trust score is given to each referenced site proportionally to the amount of times they were referenced, e.g., site
a
with a score of 4 has 6 outward references and siteb
is referenced 2 times by sitea
→ siteb
has a score of1.34
.
Formal definition
Both networks are handled the same way, and they only differ in the content of the root set of sites.
Let d
be the fixed depth of the trust network, where d \in N
Given a finite root set of sites: R = { example.org, … }
and the finite set of all sites: S = { example.de, … }
, with R < S
.
Build a relation from this set of sites by references to other sites with depth d
: L_r = R x S x … x S
.
We can now assign a score of 10 to all domains of the root set. With those base points we can evaluate the trust from one node to another for the next level by deriving from the score for the referring node with the formula f(r, s) = S_r * (L_r * l_s)
, where r
= referrer node, s
= referred node (who's score is being evaluated), L_r
= total number of links from r
to other nodes and l_s
= number of links from r
→ s
. This will be done for all nodes within the relation L_r
recursively, with the evaluated sores for all references to a node being summed up as a total score for a node on the network.
I'm aware of the high cost of evaluation though I think it's doable given a good implentationwith pre-evaluation of scores and a reasonable depth.