Extension: Add ability to set prior probabilities #9

@jukofyork

If you also assume Gumbel-distributed errors with equal scale parameters for the priors, then I think it's as simple as adding the logs of the priors to the logits:

$$ p(y = k) = \frac{\exp(z_k + \log P_{\text{prior}}(y = k))}{\sum_j \exp(z_j + \log P_{\text{prior}}(y = j))} $$

Or alternatively:

$$ p(y = k) = \frac{\exp(z_k) \times P_{\text{prior}}(y = k)}{\sum_j \exp(z_j) \times P_{\text{prior}}(y = j)} $$
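As a quick check that the two forms above agree, here is a minimal Python sketch. The function names and the example logits and priors are made up for illustration and are not taken from the project's code.

```python
import math


def softmax_with_log_priors(logits, priors):
    """First form: add log-priors to the logits, then apply softmax."""
    if any(p <= 0 for p in priors):
        raise ValueError("priors must be strictly positive to take their log")
    shifted = [z + math.log(p) for z, p in zip(logits, priors)]
    m = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_times_priors(logits, priors):
    """Second form: multiply exp(logit) by the prior, then normalize."""
    m = max(logits)  # the common factor exp(-m) cancels in the ratio
    weights = [math.exp(z - m) * p for z, p in zip(logits, priors)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values, purely for illustration.
logits = [2.0, 1.0, 0.1]
priors = [0.2, 0.3, 0.5]

by_logs = softmax_with_log_priors(logits, priors)
by_product = softmax_times_priors(logits, priors)
```

The priors do not have to sum to one here, because any common scale factor cancels in the normalization.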

Adding log-priors this way only works for the softmax function. The property behind it, independence of irrelevant alternatives (IIA), is also why it's valid to take a subset of the categories, as you are doing for the tokens.
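The subset claim can be illustrated with a small Python sketch: a softmax over only the kept categories gives the same probabilities as renormalizing the full softmax over those categories. The logits and the `keep` indices are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for four categories; keep only categories 0 and 2.
full_logits = [2.0, 1.0, 0.5, -1.0]
keep = [0, 2]

# Route 1: softmax over everything, then renormalize over the kept subset.
full_probs = softmax(full_logits)
kept_mass = sum(full_probs[i] for i in keep)
renormalized = [full_probs[i] / kept_mass for i in keep]

# Route 2: softmax over the kept logits only.
subset_probs = softmax([full_logits[i] for i in keep])
```

Both routes agree because the ratio of any two softmax probabilities depends only on the difference of their own logits.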


You can go even further and allow variable scale parameters for the priors, but it requires numerical integration and is probably too much hassle to be worthwhile.


Another alternative is to convert it into a multinomial probit model:

You can easily set up a system of equations to convert the logits (location parameters of the Gumbel distribution) to the location parameters (i.e. means) of a Gaussian distribution with SD=1. There is only one solution to this, and it's easy to find in a few steps of Newton's method.
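For two classes the conversion has a closed form, so no Newton iteration is needed. The sketch below covers only that binary case and assumes independent Gaussian errors with SD=1 on each class, so the difference of the two errors has variance 2. The function names are hypothetical, and the general multi-class case is not attempted here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def logit_gap_to_probit_gap(z1, z2):
    """Return mu1 - mu2 such that the probit model matches the logit model.

    Binary case only. Assumes independent N(0, 1) errors per class.
    Note: inv_cdf raises statistics.StatisticsError if p rounds to 0 or 1,
    which happens for very large logit gaps.
    """
    p = 1.0 / (1.0 + math.exp(-(z1 - z2)))  # logit-model P(y = 1)
    return math.sqrt(2.0) * _STD_NORMAL.inv_cdf(p)


def probit_prob(gap):
    """P(y = 1) under the probit model for a given mean gap mu1 - mu2."""
    return _STD_NORMAL.cdf(gap / math.sqrt(2.0))


# Hypothetical logits, purely for illustration.
z1, z2 = 1.5, 0.5
gap = logit_gap_to_probit_gap(z1, z2)
p_logit = 1.0 / (1.0 + math.exp(-(z1 - z2)))
p_probit = probit_prob(gap)  # should reproduce p_logit
```

Only the difference of the two means is identified, so one mean can be fixed at zero and the other set to `gap`.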

This would then let you use Gaussian-distributed priors (which are likely much more intuitive to the average user), but again, if the number of classes is more than 2, it will require numerical integration and is probably too much hassle to implement.
