There are two distinct categories of use cases in Learning to Rank (LTR):
1. Ranking Relevant Items Within a Query
This is the standard scenario in information retrieval, such as search engine result ranking or some recommendation systems. Its main characteristics include:
Use of relevance-based metrics focused on top-ranked items, such as MAP or NDCG.
Position bias correction mechanisms.
Truncation of candidate pairs based on the most relevant items (according to the labels or predictions).
Other types of normalizations specific to this context.
2. Full Ranking of a Dataset
Another important and often overlooked use case is the complete ranking of all elements in a dataset. This can be framed as LTR with a single query (or several queries representing different periods in time series datasets), and it is applicable to problems where the evaluation metric is, for instance, Spearman correlation, or even binary classification problems with AUC as the metric.
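For concreteness, here is a minimal sketch of this framing with XGBoost's rank:pairwise objective (discussed below): the whole dataset is treated as a single query and the fit is evaluated with Spearman correlation. The data and parameters are illustrative only.

```python
import numpy as np
import xgboost as xgb
from scipy.stats import rankdata, spearmanr

# Synthetic full-dataset ranking problem: every row belongs to a single query.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y_raw = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=1000)
y = rankdata(y_raw) / len(y_raw)  # percentile labels in (0, 1]

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group([X.shape[0]])  # one group covering the whole dataset

booster = xgb.train({"objective": "rank:pairwise"}, dtrain, num_boost_round=50)
rho, _ = spearmanr(y_raw, booster.predict(dtrain))
print("Spearman correlation on the training data:", rho)
```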
The nature of this use case makes many LTR implementations unsuitable (for example, LightGBM does not support it well).
XGBoost, however, does support LTR through the rank:pairwise objective. Still, there are some impactful aspects that could be improved:
Weights
In LTR, weights are always considered at the query level.
But what happens in pairwise use cases where there is only one query, or when multiple queries exist but we want to assign instance-level weights?
Since the weight parameter in the DMatrix constructor is the same one shared by all objectives, this behavior should be generalized. It should be possible to:
Provide weights of length equal to the number of queries (to be applied per group), or
Provide weights of length equal to the number of observations (to be applied per instance).
A consistent internal approach (aligned with other objectives) would be:
Always interpret weight as per-instance, and
If a per-query weight array is passed (with length equal to the number of queries), internally expand it into a vector matching the number of instances by repeating each group's weight according to the group size (a sketch of this expansion follows below).
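A minimal sketch of that expansion, using a hypothetical helper (expand_query_weights is not an existing XGBoost function) that turns a per-query weight array into the per-instance vector the objective would consume:

```python
import numpy as np

def expand_query_weights(query_weights, group_sizes):
    """Repeat each group's weight group_size times to obtain per-instance weights."""
    query_weights = np.asarray(query_weights, dtype=float)
    group_sizes = np.asarray(group_sizes, dtype=int)
    if query_weights.shape[0] != group_sizes.shape[0]:
        raise ValueError("expected one weight per query/group")
    return np.repeat(query_weights, group_sizes)

# Three queries with 2, 3, and 1 instances respectively.
print(expand_query_weights([0.5, 1.0, 2.0], [2, 3, 1]))
# -> [0.5 0.5 1.  1.  1.  2. ]
```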
Label Difference Normalization
In full-dataset ranking scenarios, labels are often quasi-continuous or have high granularity.
(It is up to the user to discretize or bin the labels if needed.)
Pairs with similar labels are generally less informative than those with very different labels. Therefore, introducing a normalization based on label difference is a natural and useful idea.
Assuming labels are preprocessed to lie within the [0, 1] percentile scale, the following logic from the XGBoost source code:
https://github.com/dmlc/xgboost/blob/4e24639d7de3d8e0aae0ae0ab061c14f704c0c35/src/objective/lambdarank_obj.h#L123C3-L125C4
can be generalized with a user-defined parameter, label_diff_normalization, whose default value is 0. Since y_high and y_low are percentiles, their absolute difference is bounded in [0, 1]. When label_diff_normalization == 0, delta_metric remains unchanged; as label_diff_normalization increases, delta_metric decreases, effectively penalizing pairs with similar labels.
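A minimal sketch of what this generalization could look like, assuming it scales delta_metric by the absolute label difference raised to the power label_diff_normalization (the function and its exact form are an interpretation of the description above, not actual XGBoost code):

```python
def generalized_delta(delta_metric, y_high, y_low, label_diff_normalization=0.0):
    """Scale the pairwise delta by |y_high - y_low| ** label_diff_normalization.

    With percentile labels, |y_high - y_low| lies in [0, 1], so the factor is 1
    when the exponent is 0 (delta unchanged) and shrinks toward 0 for pairs
    with similar labels as the exponent grows.
    """
    return delta_metric * abs(y_high - y_low) ** label_diff_normalization
```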
My Use Case
The data I work with is a time series where it’s important to sort all records within certain temporal segments. The response is continuous with some degree of discretization, and the metric used to evaluate the model's fit is Spearman correlation.
In this case, since the response is continuous, it's useful to weight the pairs, giving more importance when the response difference between the pair is large, and less when the difference is smaller.
To use the proposed formulation, I transform the response into percentiles within each query so that it falls within the [0, 1] range.
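A sketch of that per-query percentile transform (column names and data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "qid":      [0, 0, 0, 1, 1],
    "response": [3.2, 1.5, 7.8, 0.4, 0.9],
})

# Percentile rank of the response within each query, bounded in (0, 1].
df["label"] = df.groupby("qid")["response"].rank(pct=True)
print(df)
```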
In my use case, some elements are more relevant than others, so I apply weights and use the weighted version of Spearman correlation.
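For reference, one common way to compute a weighted Spearman correlation is as the weighted Pearson correlation of the ranks; the sketch below assumes that definition (the exact formula used in my evaluation may differ):

```python
import numpy as np
from scipy.stats import rankdata

def weighted_spearman(y_true, y_pred, weights):
    """Weighted Spearman as the weighted Pearson correlation of the ranks."""
    r1, r2 = rankdata(y_true), rankdata(y_pred)
    w = np.asarray(weights, dtype=float)

    def wcov(a, b):
        ma, mb = np.average(a, weights=w), np.average(b, weights=w)
        return np.average((a - ma) * (b - mb), weights=w)

    return wcov(r1, r2) / np.sqrt(wcov(r1, r1) * wcov(r2, r2))
```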
Currently, since XGBoost does not support assigning weights at the observation level for ranking objectives, only at the query level, I have to simulate them by repeating certain observations 2, 3, or even 10 times so that the model accounts for an approximation of the intended weight. This is inefficient in both training time and GPU memory usage.
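A sketch of that repetition workaround, which approximates per-instance weights by duplicating rows before building the DMatrix (data and weights are illustrative); the duplication is exactly what makes it costly in training time and GPU memory:

```python
import numpy as np
import xgboost as xgb

def repeat_for_weights(X, y, weights):
    """Approximate per-instance weights by repeating each row round(weight) times."""
    reps = np.maximum(np.round(weights).astype(int), 1)
    return np.repeat(X, reps, axis=0), np.repeat(y, reps)

X = np.random.default_rng(0).normal(size=(4, 3))
y = np.array([0.2, 0.9, 0.5, 0.7])
w = np.array([1.0, 3.0, 2.0, 10.0])

X_rep, y_rep = repeat_for_weights(X, y, w)
dtrain = xgb.DMatrix(X_rep, label=y_rep)
dtrain.set_group([len(y_rep)])  # still a single query after duplication
```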