Description
While studying your paper "Rethinking Reward Modeling in Preference-Based Large Language Model Alignment", I ran into a few points of confusion about the theoretical derivations and would appreciate your clarification.
1. Inequality Direction in the Expectation of Order Consistency
We have the bounds from the Correct and Incorrect cases on $p_{\text{correct}}$ and $p_{\text{incorrect}}$ (as I read them, $(1 - \epsilon) \cdot \xi(\Delta r)$ and $\epsilon \cdot (1 - \xi(\Delta r))$, respectively). The order consistency of the learned model with the oracle utility is then given by

$$\mathbb{E}[\text{order consistency}] = p_{\text{correct}} \cdot p_{\text{annotator}} + p_{\text{incorrect}} \cdot (1 - p_{\text{annotator}}),$$

and finally we arrive at the conclusion

$$\mathbb{E}[\text{order consistency}] \geq (1 - \epsilon) \cdot \xi^2(\Delta r) + \epsilon \cdot (1 - \xi(\Delta r))^2.$$
I'm confused about how this "greater than or equal to" conclusion is derived. Since the Correct Case gives an "at least"-type statement while the Incorrect Case gives an "at most"-type statement, the two one-sided bounds seem to point in opposite directions, and I do not see how substituting them yields a lower bound on the overall expectation; I spell out the substitution I have in mind directly below.
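For concreteness, this is the expansion I have in mind; it is my own reconstruction of the step under the reading above, not a quote from the paper:

$$\mathbb{E}[\text{order consistency}] = p_{\text{correct}} \cdot \xi(\Delta r) + p_{\text{incorrect}} \cdot (1 - \xi(\Delta r)), \qquad \text{taking } p_{\text{annotator}} = \xi(\Delta r).$$

For this to turn into the final "$\geq$", it seems that both $p_{\text{correct}} \geq (1 - \epsilon) \cdot \xi(\Delta r)$ and $p_{\text{incorrect}} \geq \epsilon \cdot (1 - \xi(\Delta r))$ would need to hold simultaneously, whereas the Incorrect Case is phrased as an upper bound.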
2. Interpretation of $\epsilon$ and the "Incorrect Case" Description
In the "Incorrect Case", the statement "When the annotator is incorrect, the learned model agrees with the annotator with probability at most
3. Redundancy Concern in the Formula for $p_{\text{incorrect}}$ and Subsequent Multiplications
For the formula $p_{\text{incorrect}} = \epsilon \cdot (1 - \xi(\Delta r))$ (as I read it), I have two related concerns:

- First, when considering the situation where the annotator is incorrect (with probability $1 - \xi(\Delta r)$), and given the ambiguity around $\epsilon$ (whether it relates to model-oracle or model-annotator disagreement), the combination $\epsilon \cdot (1 - \xi(\Delta r))$ is said to represent "human annotation error + model-human inconsistency = consistency with the true answer". However, in the final expectation formula, we multiply $p_{\text{incorrect}}$ by $(1 - p_{\text{annotator}})$ again. Since $p_{\text{annotator}}$ is associated with annotator correctness ($\xi(\Delta r)$ for correct, $1 - \xi(\Delta r)$ for incorrect), it seems that the case of incorrect annotations is being counted twice.
- Additionally, the subsequent derivation leads to a squared term, $(1 - \xi(\Delta r))^2$, in $(1 - \epsilon) \cdot \xi^2(\Delta r) + \epsilon \cdot (1 - \xi(\Delta r))^2$. Given the above confusions, is this squared term necessary, or have I misinterpreted how these probabilities interact? (A small numerical sketch of my current reading follows this list.)
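To check my reading numerically, here is a minimal simulation sketch. It assumes (these assumptions are mine, not the paper's) that $\epsilon$ is the model-annotator disagreement probability and that the model's agreement with the annotator is independent of whether the annotator is correct; `xi` and `eps` are arbitrary example values. It compares the simulated order consistency with both the single-factor expression $\xi(\Delta r) \cdot (1 - \epsilon) + (1 - \xi(\Delta r)) \cdot \epsilon$ and the squared form above:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(xi, eps, n=1_000_000):
    """Monte Carlo estimate of P(the learned model matches the oracle order),
    under my reading: the annotator is correct w.p. xi, and the model agrees
    with the annotator w.p. 1 - eps, independently of the annotator's correctness."""
    annotator_correct = rng.random(n) < xi              # annotation matches the oracle
    model_agrees_annotator = rng.random(n) < (1 - eps)  # model matches the annotation
    # The model matches the oracle if it agrees with a correct annotation,
    # or disagrees with an incorrect one.
    model_matches_oracle = np.where(annotator_correct,
                                    model_agrees_annotator,
                                    ~model_agrees_annotator)
    return model_matches_oracle.mean()

xi, eps = 0.8, 0.1  # arbitrary example values
print("simulated order consistency            :", simulate(xi, eps))
print("single-factor xi*(1-eps)+(1-xi)*eps    :", xi * (1 - eps) + (1 - xi) * eps)
print("squared form (1-eps)*xi^2+eps*(1-xi)^2 :", (1 - eps) * xi**2 + eps * (1 - xi)**2)
```

Under these assumptions the simulation matches the single-factor expression rather than the squared form, which is exactly the discrepancy I would like to understand.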
Thank you for your time and attention to these questions. Your explanations would greatly help me further understand this important work.