Add -M/--find-renames option and blame.renames config to control rename detection #755

Draft
wants to merge 5 commits into
base: vfs-2.49.0
Changes from 2 commits
15 changes: 14 additions & 1 deletion Documentation/blame-options.adoc
@@ -83,6 +83,7 @@ include::line-range-format.adoc[]
or `--incremental`.

-M[<num>]::
+--find-renames[=<num>]::
Detect moved or copied lines within a file. When a commit
Review comment (Member), on lines 85 to 87:

No, you cannot do that. The existing -M option is not about whole-file rename detection, therefore we cannot reuse that, and my instructions regarding -M are moot.

You have to introduce a new --find-renames option instead.
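
A minimal sketch of the separation this comment asks for, assuming a simplified, self-contained option struct instead of Git's parse-options machinery. Every name here (`sketch_blame_opts`, `sketch_parse_arg`, `SKETCH_BLAME_MOVE`) is hypothetical, and the 0 to 100 range for `--find-renames=<n>` follows this PR's documentation, which may differ from what Git's diff machinery accepts. The only point is that `-M` keeps writing the line-move state while `--find-renames` writes to separate fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the existing line-move flag. */
#define SKETCH_BLAME_MOVE 01u

struct sketch_blame_opts {
	unsigned pickaxe_flags; /* line-level move detection, set by -M */
	unsigned move_score;    /* -M<num>: lower bound on alphanumeric chars */
	int find_renames;       /* -1: not given, 0: disabled, 1: enabled */
	int rename_percent;     /* --find-renames=<n>, or -1 if not given */
};

void sketch_opts_init(struct sketch_blame_opts *o)
{
	assert(o != NULL);
	o->pickaxe_flags = 0;
	o->move_score = 0;
	o->find_renames = -1;
	o->rename_percent = -1;
}

/* Returns 0 if the argument was understood, -1 otherwise. */
int sketch_parse_arg(struct sketch_blame_opts *o, const char *arg)
{
	static const char prefix[] = "--find-renames=";
	const size_t prefix_len = sizeof(prefix) - 1;

	assert(o != NULL && arg != NULL);

	if (!strcmp(arg, "--no-find-renames")) {
		o->find_renames = 0;
		return 0;
	}
	if (!strcmp(arg, "--find-renames")) {
		o->find_renames = 1;
		return 0;
	}
	if (!strncmp(arg, prefix, prefix_len)) {
		const char *num = arg + prefix_len;
		char *end;
		long n = strtol(num, &end, 10);

		if (end == num || *end != '\0' || n < 0 || n > 100)
			return -1;
		/* Assumption taken from the PR text: 0 disables detection. */
		o->find_renames = n ? 1 : 0;
		o->rename_percent = (int)n;
		return 0;
	}
	if (!strncmp(arg, "-M", 2)) {
		/* -M keeps its existing meaning: moved lines within a file. */
		o->pickaxe_flags |= SKETCH_BLAME_MOVE;
		if (arg[2] != '\0')
			o->move_score = (unsigned)strtoul(arg + 2, NULL, 10);
		return 0;
	}
	return -1;
}
```

With this shape, `-M40` never touches the rename fields, and `--find-renames=70` never touches the move score.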

moves or copies a block of lines (e.g. the original file
has A and then B, and the commit changes it to B and then
@@ -96,7 +97,19 @@ include::line-range-format.adoc[]
<num> is optional but it is the lower bound on the number of
alphanumeric characters that Git must detect as moving/copying
within a file for it to associate those lines with the parent
-commit. The default value is 20.
+commit. If <num> is specified, it also affects the automatic
+detection of whole-file renames. The value can be from 0 to 100
+and represents a similarity index. A value of 0 disables rename
+detection entirely, 100 requires exact matches, and values in
+between control how similar the file content needs to be to be
+considered a rename. The default value is 50.
++
+The `-M` option can also be used to influence rename detection
+behavior when following the origin of lines across repository
+history. By default, rename detection is enabled at a 50%
+similarity threshold, which can lead to performance issues in
+large repositories. This option (or the `blame.renames` config)
+can be used to disable or adjust the rename detection.

-C[<num>]::
In addition to `-M`, detect lines moved or copied from other
11 changes: 11 additions & 0 deletions Documentation/config/blame.adoc
@@ -35,3 +35,14 @@ blame.markUnblamableLines::
blame.markIgnoredLines::
Mark lines that were changed by an ignored revision that we attributed to
another commit with a '?' in the output of linkgit:git-blame[1].

+blame.renames::
+Controls rename detection when following the history of lines in
+linkgit:git-blame[1]. It can be set to `true` (default), `false`,
+`copy`, or an integer value specifying the minimum similarity index
+(from 0 to 100). When set to `false`, no rename detection is performed.
+When set to `true`, it behaves the same as the default similarity index
+of 50%. When set to `copy`, both rename and copy detection is performed.
+An integer value specifies the minimum similarity index, with 0 meaning
+"no rename detection" and 100 meaning "only exact renames". The `-M`
+option overrides this setting.
11 changes: 6 additions & 5 deletions Documentation/git-blame.adoc
@@ -9,7 +9,7 @@ SYNOPSIS
--------
[verse]
'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental]
-[-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>]
+[-L <range>] [-S <revs-file>] [-M[<n>]] [--find-renames[=<n>]] [-C] [-C] [-C] [--since=<date>]
[--ignore-rev <rev>] [--ignore-revs-file <file>]
[--color-lines] [--color-by-age] [--progress] [--abbrev=<n>]
[ --contents <file> ] [<rev> | --reverse <rev>..<rev>] [--] <file>
@@ -24,10 +24,11 @@ When specified one or more times, `-L` restricts annotation to the requested
lines.

The origin of lines is automatically followed across whole-file
-renames (currently there is no option to turn the rename-following
-off). To follow lines moved from one file to another, or to follow
-lines that were copied and pasted from another file, etc., see the
-`-C` and `-M` options.
+renames. By default, git blame follows both exact renames (100% match)
+and inexact renames (partially matching content). Use the `-M` option
+to control this behavior. To follow lines moved from one file to another,
+or to follow lines that were copied and pasted from another file, etc.,
+see the `-C` and `-M` options.

The report does not tell you anything about lines which have been deleted or
replaced; you need to use a tool such as 'git diff' or the "pickaxe"
9 changes: 8 additions & 1 deletion blame.c
@@ -1423,10 +1423,17 @@ static struct blame_origin *find_rename(struct repository *r,
struct blame_origin *porigin = NULL;
struct diff_options diff_opts;
int i;
+extern int rename_detection_mode;
Review comment (Member):

No, this won't work. You declared it as static in builtin/blame.c, which makes it file-local, and you cannot ever see it from this here file.

Instead, you have to introduce a new attribute in blame.h, probably in struct blame_scoreboard next to the xdl_opts (which are also diff-related).
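
A rough sketch of the direction suggested here, assuming heavily simplified stand-ins for the real structures. `sketch_scoreboard`, `sketch_diff_options`, and `sketch_setup_rename_diff` are hypothetical names; only the idea of placing the new field next to `xdl_opts` comes from the comment. The setting travels inside a struct that the caller passes in, so no `extern` reference to a file-local `static` is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct blame_scoreboard. */
struct sketch_scoreboard {
	int xdl_opts;      /* existing diff-related setting */
	int detect_rename; /* new: 0 turns whole-file rename following off */
	int rename_score;  /* new: minimum similarity score, 0 for the default */
};

/* Hypothetical stand-in for the diff options the rename lookup fills in. */
struct sketch_diff_options {
	int detect_rename;
	int rename_score;
};

/*
 * Library side: copies the rename settings from the scoreboard it was
 * handed instead of reaching for a variable owned by the command file.
 */
void sketch_setup_rename_diff(const struct sketch_scoreboard *sb,
			      struct sketch_diff_options *diff_opts)
{
	assert(sb != NULL && diff_opts != NULL);
	diff_opts->detect_rename = sb->detect_rename;
	diff_opts->rename_score = sb->rename_score;
}
```

The command-line and config parsing would then fill in the two new fields before the blame pass starts. How the scoreboard reaches the actual rename lookup is left open here.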


repo_diff_setup(r, &diff_opts);
diff_opts.flags.recursive = 1;
-diff_opts.detect_rename = DIFF_DETECT_RENAME;
+/*
+* Use rename_detection_mode if specified, otherwise default to DIFF_DETECT_RENAME
+* For mode values > 0 and < 100, use it as similarity threshold
+*/
+diff_opts.detect_rename = (rename_detection_mode == 0) ? 0 :
+(rename_detection_mode > 0) ?
+rename_detection_mode : DIFF_DETECT_RENAME;
Review comment (Member):

This is the only place where you actually use rename_detection_mode, which you described elsewhere as a percentage. However, it is used exclusively as a Boolean. You have to study diff*.c harder to see how the --find-renames value is used there and imitate it here.
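
The distinction drawn in this comment, an on/off detection mode versus a separate similarity score, can be sketched as follows. This is an illustration under stated assumptions, not Git's implementation: `SKETCH_MAX_SCORE` (60000) and `SKETCH_DETECT_RENAME` (1) are hypothetical constants chosen to resemble what Git's diff code is understood to use, and the rule that 0 disables detection is taken from this PR's documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants; check Git's diff headers for the real values. */
#define SKETCH_DETECT_RENAME 1
#define SKETCH_MAX_SCORE 60000

/* Hypothetical stand-in holding the two separate pieces of state. */
struct sketch_rename_opts {
	int detect_rename; /* mode: 0 is off, SKETCH_DETECT_RENAME is on */
	int rename_score;  /* threshold on the 0..SKETCH_MAX_SCORE scale */
};

/* Scale a 0..100 percentage onto the internal score range. */
int sketch_percent_to_score(int percent)
{
	assert(percent >= 0 && percent <= 100);
	return percent * SKETCH_MAX_SCORE / 100;
}

/*
 * percent < 0 means no threshold was given: detection is enabled and the
 * score is left alone so the default applies. percent == 0 disables
 * detection (assumption from the PR text). Anything else enables
 * detection and sets the threshold separately from the mode.
 */
void sketch_apply_rename_percent(struct sketch_rename_opts *o, int percent)
{
	assert(o != NULL && percent <= 100);
	if (percent == 0) {
		o->detect_rename = 0;
		return;
	}
	o->detect_rename = SKETCH_DETECT_RENAME;
	if (percent > 0)
		o->rename_score = sketch_percent_to_score(percent);
}
```

Keeping the mode and the score in separate fields avoids the problem described above, where a percentage ends up stored in a field that is only tested for being non-zero.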

diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
diff_opts.single_follow = origin->path;
diff_setup_done(&diff_opts);
66 changes: 52 additions & 14 deletions builtin/blame.c
@@ -83,6 +83,7 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP;

static unsigned blame_move_score;
static unsigned blame_copy_score;
+static int rename_detection_mode = -1; /* -1: default, 0: disabled, >0: enabled with score */

/* Remember to update object flag allocation in object.h */
#define METAINFO_SHOWN (1u<<12)
@@ -743,6 +744,26 @@ static int git_blame_config(const char *var, const char *value,
mark_ignored_lines = git_config_bool(var, value);
return 0;
}
+if (!strcmp(var, "blame.renames")) {
+if (!value)
+return config_error_nonbool(var);
+if (!strcmp(value, "true") || !strcmp(value, "1")) {
+rename_detection_mode = DIFF_DETECT_RENAME;
+} else if (!strcmp(value, "false") || !strcmp(value, "0")) {
+rename_detection_mode = 0;
+} else if (!strcmp(value, "copy")) {
+rename_detection_mode = DIFF_DETECT_COPY;
+} else {
+int score = git_config_int(var, value, NULL);
+if (score < 0 || score > 100)
+return error(_("invalid value for %s"), var);
+if (score == 100)
+rename_detection_mode = 100; /* exact rename only */
+else
+rename_detection_mode = score;
+}
+return 0;
+}
if (!strcmp(var, "color.blame.repeatedlines")) {
if (color_parse_mem(value, strlen(value), repeated_meta_color))
warning(_("invalid value for '%s': '%s'"),
@@ -779,6 +800,36 @@ static int git_blame_config(const char *var, const char *value,
return git_default_config(var, value, ctx, cb);
}

+static int find_rename_callback(const struct option *option, const char *arg, int unset)
+{
+int *rename_detection = option->value;
+
+BUG_ON_OPT_NEG(unset);
+
+/* --find-renames without a score */
+*rename_detection = DIFF_DETECT_RENAME;
+
+if (arg) {
+int value;
+const char *percent;
+
+/* Handle -M<n> or --find-renames=<n> */
+value = strtol(arg, (char **) &percent, 10);
+if (percent == arg)
+return error(_("invalid similarity threshold '%s'"), arg);
+if (value < 0 || 100 < value)
+return error(_("similarity threshold must be between 0 and 100"));
+/* A threshold of 0 is equivalent to no rename detection */
+if (value == 0)
+*rename_detection = 0;
+else if (value == 100)
+*rename_detection = 100; /* exact rename only */
+else
+*rename_detection = value;
+}
+return 0;
+}

static int blame_copy_callback(const struct option *option, const char *arg, int unset)
{
int *opt = option->value;
@@ -803,19 +854,6 @@ static int blame_copy_callback(const struct option *option, const char *arg, int
return 0;
}

-static int blame_move_callback(const struct option *option, const char *arg, int unset)
-{
-int *opt = option->value;
-
-BUG_ON_OPT_NEG(unset);
-
-*opt |= PICKAXE_BLAME_MOVE;
-
-if (arg)
-blame_move_score = parse_score(arg);
-return 0;
-}

static int is_a_rev(const char *name)
{
struct object_id oid;
@@ -915,7 +953,7 @@ int cmd_blame(int argc,
OPT_STRING('S', NULL, &revs_file, N_("file"), N_("use revisions from <file> instead of calling git-rev-list")),
OPT_STRING(0, "contents", &contents_from, N_("file"), N_("use <file>'s contents as the final image")),
OPT_CALLBACK_F('C', NULL, &opt, N_("score"), N_("find line copies within and across files"), PARSE_OPT_OPTARG, blame_copy_callback),
-OPT_CALLBACK_F('M', NULL, &opt, N_("score"), N_("find line movements within and across files"), PARSE_OPT_OPTARG, blame_move_callback),
+OPT_CALLBACK_F('M', "find-renames", &rename_detection_mode, N_("score"), N_("find renames, optionally set similarity index"), PARSE_OPT_OPTARG, find_rename_callback),
OPT_STRING_LIST('L', NULL, &range_list, N_("range"),
N_("process only line range <start>,<end> or function :<funcname>")),
OPT__ABBREV(&abbrev),
82 changes: 82 additions & 0 deletions t/t8015-blame-rename-detection.sh
@@ -0,0 +1,82 @@
#!/bin/sh

test_description='git blame rename detection control'

. ./test-lib.sh

test_expect_success 'setup test file rename with content changes' '
git init &&
Review comment (Member):

Do study other t/t-[0-9]*-*blame*.sh scripts. git init is not necessary.

echo abc >1.txt &&
echo def >>1.txt &&
echo ghi >>1.txt &&
Review comment (Member):

Use test_write_lines instead.

git add . &&
Review comment (Member):

. is sloppy. Specify 1.txt explicitly.

git commit -m "Initial commit" &&
Review comment (Member):

You need to prefix this with test_tick to make the OIDs reproducible.


git mv 1.txt 2.txt &&
echo abc >2.txt &&
echo 123 >>2.txt &&
echo ghi >>2.txt &&
git add . &&
git commit -m "Rename+edit together"
'

# This test confirms that by default, git blame follows partial-file renames
test_expect_success 'git blame follows inexact renames by default' '
FIXED_1=$(git rev-parse --short HEAD^) &&
FIXED_2=$(git rev-parse --short HEAD) &&

git blame 2.txt >output &&
grep "$FIXED_1" output | grep -q abc &&
grep "$FIXED_2" output | grep -q 123 &&
grep "$FIXED_1" output | grep -q ghi
'
Review comment (Member):

This is too focused, and unnecessary. We really only need to verify that 1.txt is mentioned only when the rename-detection score is high enough.


# This test confirms that --no-find-renames or -M0 turns off rename detection
test_expect_success 'git blame can disable rename detection' '
git blame --no-find-renames 2.txt >output &&
! grep -q 1.txt output
'

# This test confirms that -M100 only follows exact renames
test_expect_success 'git blame can restrict to exact renames' '
git blame -M100 2.txt >output &&
! grep -q 1.txt output
'

# This test checks that blame.renames config works
test_expect_success 'blame.renames=false disables rename detection' '
git -c blame.renames=false blame 2.txt >output &&
! grep -q 1.txt output
'

# This test checks that -M with a score works
test_expect_success 'git blame with similarity score follows renames above threshold' '
# Must follow 1.txt->2.txt rename for abc which are identical
git blame -M70 2.txt >output &&
grep "$FIXED_1" output | grep -q abc &&
# Should not follow for others below threshold
grep "$FIXED_2" output | grep -q 123 &&
grep "$FIXED_2" output | grep -q ghi
'

# This test checks that -M overrides blame.renames
test_expect_success '-M overrides blame.renames config' '
# Using blame.renames=false but -M60
git -c blame.renames=false blame -M60 2.txt >output &&
grep "$FIXED_1" output | grep -q abc &&
# The rest would be below 60% threshold
grep "$FIXED_2" output | grep -q 123 &&
grep "$FIXED_2" output | grep -q ghi
'

# This test checks that blame.renames with a score works
test_expect_success 'blame.renames with score controls rename threshold' '
# Set threshold at 70%, abc is identical so above threshold
git -c blame.renames=70 blame 2.txt >output &&
grep "$FIXED_1" output | grep -q abc &&
# Other lines below threshold
grep "$FIXED_2" output | grep -q 123 &&
grep "$FIXED_2" output | grep -q ghi
'
Review comment (Member):

Those are waaaaay too many test cases for the job. All you need to do is to run git blame with various ways to specify the rename score, and then verify that 1.txt is either mentioned or not, depending on that score.

In fact, I highly suspect that this single test case that you need to add for this entire PR would find a much nicer home in one of the existing t/t*-blame*.sh scripts.


test_done