
Learning to Rank: Weights and Label difference normalization in pairwise full query ranking. #11424

@jaguerrerod

There are two distinct categories of use cases in Learning to Rank (LTR):

1. Ranking Relevant Items Within a Query

This is the standard scenario in information retrieval, such as search engine result ranking or some recommendation systems. Its main characteristics include:

  • Use of relevance-based metrics focused on top-ranked items, such as MAP or NDCG.
  • Position bias correction mechanisms.
  • Truncation of candidate pairs based on the most relevant items (according to the labels or predictions).
  • Other types of normalizations specific to this context.

2. Full Ranking of a Dataset

Another important and often overlooked use case is the complete ranking of all elements in a dataset. This can be framed as LTR with a single query (or several queries representing different periods in time-series datasets), and it applies to problems where the evaluation metric is, for instance, Spearman correlation, and even to binary classification problems with AUC as the metric.
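
For concreteness, here is a minimal sketch of that single-query framing using XGBoost's Python API and the rank:pairwise objective discussed below. The synthetic data and every name in it are my own illustration, not from this issue:

# Full-dataset ranking: the entire training set is treated as one query.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 10))
y = rng.uniform(size=1_000)               # quasi-continuous labels

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group([X.shape[0]])            # a single group spanning all rows

booster = xgb.train({"objective": "rank:pairwise"}, dtrain, num_boost_round=10)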

The nature of this use case makes many LTR implementations unsuitable (for example, LightGBM does not support it well).

XGBoost, however, does support LTR through the rank:pairwise objective. Still, there are some impactful aspects that could be improved:

Weights

In LTR, weights are always considered at the query level.
But what happens in pairwise use cases where there is only one query, or when multiple queries exist but we want to assign instance-level weights?

Since the weight parameter in the DMatrix constructor is shared with the other objectives, its behavior should be generalized. It should be possible to:

  • Provide weights of length equal to the number of queries (to be applied per group), or
  • Provide weights of length equal to the number of observations (to be applied per instance).

A consistent internal approach (aligned with other objectives) would be:

  • Always interpret weight as per-instance, and
  • If a per-query weight array is passed (with length equal to the number of queries), internally expand it into a vector matching the number of instances by repeating each group's weight according to the group size, as sketched after this list.
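
A sketch of that expansion (the variable names are mine, purely illustrative, not existing XGBoost API):

import numpy as np

group_sizes = np.array([3, 2, 4])            # rows per query, in qid order
query_weights = np.array([1.0, 0.5, 2.0])    # one weight per query

# Repeat each query's weight once per instance in that query.
instance_weights = np.repeat(query_weights, group_sizes)
# -> [1.0, 1.0, 1.0, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0]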

Label Difference Normalization

In full-dataset ranking scenarios, labels are often quasi-continuous or have high granularity.
(It is up to the user to discretize or bin the labels if needed.)

Pairs with similar labels are generally less informative than those with very different labels. Therefore, introducing a normalization based on label difference is a natural and useful idea.
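
Since the proposal below assumes percentile-scaled labels, here is one way to produce them; the scipy rankdata approach is my assumption, not something the issue prescribes:

import numpy as np
from scipy.stats import rankdata

y = np.array([10.0, 3.5, 7.2, 3.5, 99.0])
# Average ranks start at 1; rescale so the result lies in [0, 1].
y_pct = (rankdata(y, method="average") - 1) / (len(y) - 1)
# -> [0.75, 0.125, 0.5, 0.125, 1.0]; ties share the same percentile.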

Assuming labels are preprocessed to lie within the [0, 1] percentile scale, the following logic (from the XGBoost source code):

https://github.com/dmlc/xgboost/blob/4e24639d7de3d8e0aae0ae0ab061c14f704c0c35/src/objective/lambdarank_obj.h#L123C3-L125C4

can be generalized as:

// Sketch of the proposed generalization: norm_by_diff and
// label_diff_normalization are new, user-facing knobs; the rest
// mirrors the existing normalization in lambdarank_obj.h.
if (norm_by_diff && best_score != worst_score) {
  if (param_.IsMean()) {
    // Scale the pair's delta_metric by the label gap raised to a
    // user-defined power; a gap of 1.0 leaves it unchanged.
    delta_metric *= std::pow(std::abs(y_high - y_low), label_diff_normalization);
  } else {
    // Existing behavior: normalize by the score difference.
    delta_metric /= (delta_score + 0.01);
  }
}

where label_diff_normalization is a user-defined parameter with a default value of 0.

Since y_high and y_low are percentiles, their absolute difference is bounded in [0, 1].

  • When label_diff_normalization == 0, delta_metric remains unchanged.
  • As label_diff_normalization increases, delta_metric decreases, effectively penalizing pairs with similar labels.
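
A quick numeric illustration of the proposed factor (the difference values are chosen arbitrarily):

# Effect of |y_high - y_low| ** label_diff_normalization on delta_metric.
diffs = [0.05, 0.25, 0.75]                 # percentile label differences
for p in (0.0, 0.5, 1.0, 2.0):             # candidate label_diff_normalization values
    print(p, [round(d ** p, 4) for d in diffs])
# p=0.0 -> [1.0, 1.0, 1.0]          (delta_metric unchanged)
# p=1.0 -> [0.05, 0.25, 0.75]       (near-equal labels strongly down-weighted)
# p=2.0 -> [0.0025, 0.0625, 0.5625]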
