Questions on Key Derivations in "Rethinking Reward Modeling in Preference-Based Large Language Model Alignment" #3

Description

@luodi-7

While studying your paper "Rethinking Reward Modeling in Preference-Based Large Language Model Alignment", I ran into a few points of confusion about the theoretical derivations and hope you can help clarify them.

1. Inequality Direction in the Expectation of Order Consistency

We have the bounds:
$p_{\text{correct}} \geq (1-\epsilon) \cdot \xi(\Delta r)$
$p_{\text{incorrect}} \leq \epsilon \cdot (1-\xi(\Delta r))$

The order consistency of the learned model with the oracle utility is given by:
$\mathbb{E}_{x, y_1, y_2 \sim \ell(x)}\left[\mathbb{1}\left(\hat{H}\left(r(y_1,x) - r(y_2,x)\right) \geq 0\right) \mid \Delta r\right] = p_{\text{correct}} \cdot p_{\text{annotator}} + p_{\text{incorrect}} \cdot (1-p_{\text{annotator}})$

Finally, we arrive at the conclusion:
$\mathbb{E}_{x, y_1, y_2 \sim \ell(x)}\left[\mathbb{1}\left(\hat{H}\left(r(y_1,x) - r(y_2,x)\right) \geq 0\right) \mid \Delta r\right] \geq (1-\epsilon) \cdot \xi^2(\Delta r) + \epsilon \cdot (1-\xi(\Delta r))^2$

I am confused about how this lower bound is obtained. Since $p_{\text{correct}}$ only has a lower bound (≥) while $p_{\text{incorrect}}$ only has an upper bound (≤), could you explain in detail why combining them in the expectation formula still yields a lower bound (≥) on the final order consistency? The substitution I am attempting is written out below.
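
For concreteness, here is how far I can get, under my (possibly wrong) assumption that $p_{\text{annotator}} = \xi(\Delta r)$:

$p_{\text{correct}} \cdot p_{\text{annotator}} + p_{\text{incorrect}} \cdot (1-p_{\text{annotator}}) \geq (1-\epsilon) \cdot \xi^2(\Delta r) + p_{\text{incorrect}} \cdot (1-\xi(\Delta r))$

The first term matches the stated result, but for the second term I only know $p_{\text{incorrect}} \leq \epsilon \cdot (1-\xi(\Delta r))$, which bounds it from above by $\epsilon \cdot (1-\xi(\Delta r))^2$ rather than from below. This is the step where I lose the argument.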

2. Interpretation of $\epsilon$ and the "Incorrect Case" Description

In the "Incorrect Case", the statement "When the annotator is incorrect, the learned model agrees with the annotator with probability at most $\epsilon$" is perplexing. I initially thought $\epsilon$ represents the maximum probability that the learned model $\hat{H}$ disagrees with human annotations, but this description suggests a different meaning. Could you clarify the precise definition of $\epsilon$? Is it the probability of model-oracle disagreement, or does it pertain to model-annotator agreement in the context of incorrect annotations?

3. Redundancy Concern in the Formula for $p_{incorrect}$ and Subsequent Multiplications

For the formula $p_{\text{incorrect}} \leq \epsilon \cdot (1 - \xi(\Delta r))$:

  • First, the case where the annotator is incorrect already carries probability $1 - \xi(\Delta r)$, and, given the ambiguity around $\epsilon$ (whether it refers to model-oracle or model-annotator disagreement), the product $\epsilon \cdot (1 - \xi(\Delta r))$ is said to represent "human annotation error + model-human inconsistency = consistency with the true answer". However, in the final expectation formula $p_{\text{incorrect}}$ is multiplied by $(1 - p_{\text{annotator}})$ again. Since $p_{\text{annotator}}$ encodes annotator correctness ($\xi(\Delta r)$ for correct, $1 - \xi(\Delta r)$ for incorrect), it looks to me as if the incorrect-annotation case is being counted twice.
  • Additionally, the subsequent derivation produces the squared term $(1 - \xi(\Delta r))^2$ in $(1 - \epsilon) \cdot \xi^2(\Delta r) + \epsilon \cdot (1 - \xi(\Delta r))^2$. Given the confusion above, is this squared term actually necessary, or am I misreading how these probabilities interact? (A small symbolic check of my reading follows after this list.)
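
Here is the symbolic check mentioned above, again under my assumptions that $p_{\text{annotator}} = \xi(\Delta r)$ and that both bounds are attained with equality; it simply shows where the squared terms come from in my reading, which is exactly the repetition I am asking about:

```python
# Symbolic sketch of my reading (my assumptions, not necessarily the paper's):
# p_annotator = xi(Delta r), and both bounds are taken with equality.
import sympy as sp

eps, xi = sp.symbols("epsilon xi", positive=True)

p_correct = (1 - eps) * xi      # assumed: lower bound attained with equality
p_incorrect = eps * (1 - xi)    # assumed: upper bound attained with equality
p_annotator = xi                # assumed: annotator correct w.p. xi(Delta r)

expectation = p_correct * p_annotator + p_incorrect * (1 - p_annotator)
claimed = (1 - eps) * xi**2 + eps * (1 - xi)**2

print(sp.simplify(expectation - claimed))  # prints 0
# The (1 - xi)^2 term appears only because (1 - xi) enters once inside
# p_incorrect and once more via (1 - p_annotator) -- the repetition that
# looks like double counting to me.
```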

Thank you for your time and attention to these questions. Your explanations would be a great help in understanding this work more deeply.
