Skip to content

Proposal: Change scoring algorithm #2

@Chaostheorie

Description

@Chaostheorie

Problem

As discussed on the event, our current scoring algorithm has a few easily exploitable weaknesses. This includes the creation of shadow websites that increase the score of another site by linking to it multiple times and artificially increasing a score for a network, e.g., site A is referenced by Site B (linked indirectly by some site of the root set from network 1) and references Site C, D and E, which in turn link all back to A multiple times to increase Site A's score.

Further, you may obfuscate outward pointing links to never give up received points when being evaluated, e.g., a site using a onclick handler that triggers a call to window.location.replace instead of a proper a link (Note: This might also be only done for our scrapers and done properly for other scrapers etc.).

Proposal

Change from direct trust system, i.e., number of links = trust of site, to limited-transitive trust system. The limited-transitive system is based on the same dataset but instead of evaluating a score directly we will build the graph by distributing points from the root set that will be distributed to all linked domains and can then be further proportionally distributed from them until a certain depth is reached (the limited part). The transitive part is provided by the way trust is distributed over the nodes, where when site a trust b and b trusts c leads to a giving trust to c.

Note: Proportionally distributed means that the curren trust score is given to each referenced site proportionally to the amount of times they were referenced, e.g., site a with a score of 4 has 6 outward references and site b is referenced 2 times by site a → site b has a score of 1.34.

Formal definition

Both networks are handled the same way, and they only differ in the content of the root set of sites.

Let d be the fixed depth of the trust network, where d \in N

Given a finite root set of sites: R = { example.org, … } and the finite set of all sites: S = { example.de, … }, with R < S.

Build a relation from this set of sites by references to other sites with depth d: L_r = R x S x … x S.

We can now assign a score of 10 to all domains of the root set. With those base points we can evaluate the trust from one node to another for the next level by deriving from the score for the referring node with the formula f(r, s) = S_r * (L_r * l_s), where r = referrer node, s = referred node (who's score is being evaluated), L_r = total number of links from r to other nodes and l_s = number of links from rs. This will be done for all nodes within the relation L_r recursively, with the evaluated sores for all references to a node being summed up as a total score for a node on the network.

I'm aware of the high cost of evaluation though I think it's doable given a good implentationwith pre-evaluation of scores and a reasonable depth.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions