Assumptions regarding matches and match length #2395

Open
@tsaglam

JPlag has always assumed that a match (two matching token subsequences from two programs) has the same length on both sides. Note that this does not mean the matched code fragments have the same length, only that the number of tokens representing them is the same.

We want to move away from this assumption. Matches can have different lengths on each side, especially when defence mechanisms like subsequence match merging are used. In the past, we circumvented this issue by removing tokens. In the future, we want to change how we represent matches by introducing two lengths, one for each side of a match (see #2268).
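To illustrate why merging can produce unequal lengths, here is a minimal sketch. It assumes that merging joins two neighboring matches into one span that includes the unmatched tokens between them. The `Match` record mirrors the one quoted in this issue; `MergeGapSketch` and `mergedLengths` are hypothetical names and not part of JPlag.

```java
/** Mirrors the current record quoted in this issue: one length shared by both sides. */
record Match(int startOfFirst, int startOfSecond, int length) {}

final class MergeGapSketch {
    /**
     * Hypothetical helper: returns {lengthOfFirst, lengthOfSecond} of the span that
     * covers both neighboring matches, including the gap tokens between them.
     */
    static int[] mergedLengths(Match left, Match right) {
        int lengthOfFirst = right.startOfFirst() + right.length() - left.startOfFirst();
        int lengthOfSecond = right.startOfSecond() + right.length() - left.startOfSecond();
        return new int[] {lengthOfFirst, lengthOfSecond};
    }

    public static void main(String[] args) {
        Match left = new Match(0, 0, 10);
        // The gap between the matches is 2 tokens in the first submission, 5 in the second.
        Match right = new Match(12, 15, 8);
        int[] lengths = mergedLengths(left, right);
        System.out.println(lengths[0] + " vs. " + lengths[1]); // prints "20 vs. 23"
    }
}
```

With a single `length` field, the merged span of 20 and 23 tokens cannot be represented unless tokens are dropped from one side.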

This has almost no effect on users. The main difference is a change in the API, where a match is currently defined as:

/**
 * Represents two code fragments in two submissions that are structurally similar. These sections are identical token
 * subsequences.
 * @param startOfFirst is the index of the first token of the match in the first submission.
 * @param startOfSecond is the index of the first token of the match in the second submission.
 * @param length is the length of these similar sections (number of tokens).
 */
public record Match(int startOfFirst, int startOfSecond, int length) {
    // ...
}

This change would replace the single `length` with two lengths per match, one for each submission.
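One possible shape for the changed record is sketched below. This is not the design from #2268: the component names `lengthOfFirst` and `lengthOfSecond`, the three-argument convenience constructor, and the helper methods are all assumptions made for illustration.

```java
/**
 * Hypothetical two-length variant of the match record (names are assumptions).
 * @param startOfFirst index of the first token of the match in the first submission.
 * @param startOfSecond index of the first token of the match in the second submission.
 * @param lengthOfFirst number of matched tokens in the first submission.
 * @param lengthOfSecond number of matched tokens in the second submission.
 */
record Match(int startOfFirst, int startOfSecond, int lengthOfFirst, int lengthOfSecond) {

    /** Compact canonical constructor: rejects negative lengths. */
    Match {
        if (lengthOfFirst < 0 || lengthOfSecond < 0) {
            throw new IllegalArgumentException("Match lengths must not be negative");
        }
    }

    /** Convenience constructor so call sites with one shared length keep compiling. */
    Match(int startOfFirst, int startOfSecond, int length) {
        this(startOfFirst, startOfSecond, length, length);
    }

    /** Exclusive end index of the match in the first submission. */
    int endOfFirst() {
        return startOfFirst + lengthOfFirst;
    }

    /** Exclusive end index of the match in the second submission. */
    int endOfSecond() {
        return startOfSecond + lengthOfSecond;
    }

    /** True if both sides cover the same number of tokens, as assumed so far. */
    boolean isSymmetric() {
        return lengthOfFirst == lengthOfSecond;
    }
}
```

In this sketch, existing code that constructs a match with one length would continue to work, while code that reads `length()` would have to choose a side explicitly. That would be the API change users notice.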

Labels: enhancement, major, report-viewer