
Metric for EA score is inaccurately calculated #2

Description

@ShqWW

The EA-score metric is computed incorrectly.

The original code in calculate_metrics.py is:

# From calculate_metrics.py; Line, EA_metric, and numpy (as np) are
# defined/imported elsewhere in the repo.
def caculate_precision(b_points, gt_coords, thresh=0.90):
    N = len(b_points)
    if N == 0:
        return 0, 0
    ea = np.zeros(N, dtype=np.float32)
    for i, coord_p in enumerate(b_points):
        # skip degenerate predictions whose endpoints coincide
        if coord_p[0] == coord_p[2] and coord_p[1] == coord_p[3]:
            continue
        l_pred = Line(list(coord_p))
        for coord_g in gt_coords:
            l_gt = Line(list(coord_g))
            # each prediction keeps its best EA score over all ground truths;
            # no one-to-one matching is enforced
            ea[i] = max(ea[i], EA_metric(l_pred, l_gt))
    return (ea >= thresh).sum(), N

def caculate_recall(b_points, gt_coords, thresh=0.90):
    N = len(gt_coords)
    if N == 0:
        return 1.0, 0
    ea = np.zeros(N, dtype=np.float32)
    for i, coord_g in enumerate(gt_coords):
        l_gt = Line(list(coord_g))
        for coord_p in b_points:
            # skip degenerate predictions whose endpoints coincide
            if coord_p[0] == coord_p[2] and coord_p[1] == coord_p[3]:
                continue
            l_pred = Line(list(coord_p))
            # each ground truth keeps its best EA score over all predictions;
            # again, no one-to-one matching
            ea[i] = max(ea[i], EA_metric(l_pred, l_gt))
    return (ea >= thresh).sum(), N

whereas the new code I use to calculate the result is:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Line and EA_metric are the repo's own class/function from calculate_metrics.py.
def calculate_precision_recall(b_points, gt_coords, thresh=0.90):
    num_pred, num_gt = len(b_points), len(gt_coords)
    ea = np.zeros((num_gt, num_pred), dtype=np.float32)
    for i, coord_g in enumerate(gt_coords):
        l_gt = Line(list(coord_g))
        for j, coord_p in enumerate(b_points):
            # skip degenerate predictions whose endpoints coincide
            if coord_p[0] == coord_p[2] and coord_p[1] == coord_p[3]:
                continue
            l_pred = Line(list(coord_p))
            ea[i, j] = EA_metric(l_pred, l_gt)
    # binary matrix: 1 where a (gt, pred) pair clears the threshold
    G = (ea > thresh).astype(np.int32)
    # Hungarian matching maximizes the number of one-to-one matched pairs
    row_ind, col_ind = linear_sum_assignment(-G)
    pair_nums = G[row_ind, col_ind].sum()

    tp = pair_nums
    fp = num_pred - pair_nums
    fn = num_gt - pair_nums
    return tp, fp, fn

Predictions and ground truths should first be matched one-to-one. The original code omits this crucial matching step, so one ground truth may be credited to multiple predictions, and one prediction may likewise be credited to multiple ground truths, which is unreasonable and double-counts matches.
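For illustration, here is a minimal, self-contained sketch (using a hypothetical EA matrix rather than the repo's Line/EA_metric) of how the two counting schemes diverge when two predictions both hit the same ground truth:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical EA matrix: 1 ground truth vs. 2 predictions, both above thresh.
ea = np.array([[0.95, 0.93]], dtype=np.float32)
thresh = 0.90

# Original counting: every prediction above thresh against *any* ground truth
# is a true positive, and every ground truth above thresh against *any*
# prediction counts as recalled.
print((ea.max(axis=0) >= thresh).sum())  # 2 -> P = 2/2 = 1.0
print((ea.max(axis=1) >= thresh).sum())  # 1 -> R = 1/1 = 1.0

# One-to-one matching: only one (gt, pred) pair can be matched.
G = (ea >= thresh).astype(np.int32)
row_ind, col_ind = linear_sum_assignment(-G)
print(G[row_ind, col_ind].sum())  # 1 -> tp = 1, fp = 1, fn = 0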

Here is the result on CDL with the new code:

CDL 0710 ==> Total HIoU 0.689615 / P 0.700896 / R 0.707499 / F 0.704181

which differs from the result of the original code. If you confirm the difference, could you provide updated results for the other baseline methods reported in your paper?
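For reference, here is a sketch of how the per-image (tp, fp, fn) counts could be aggregated into the P/R/F numbers above, assuming micro-averaging over the dataset (the repo's actual aggregation may differ):

def aggregate_prf(per_image_counts):
    """per_image_counts: iterable of (tp, fp, fn) tuples, one per image."""
    tp = sum(c[0] for c in per_image_counts)
    fp = sum(c[1] for c in per_image_counts)
    fn = sum(c[2] for c in per_image_counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score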
