This PR improves the training args used for pretraining MPNet in
three ways:
1. improved default values for the training args[^1], with some updated
to more closely follow the hyperparams in the MPNet paper
2. clearer, more succinct descriptions of what the core args are/do and
how to use them
3. A) new options for some existing training args[^2] and B) new CLI
args that expose previously hardcoded parameters[^3] so the user can
adjust them
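
For reference, the two activation options this PR adds ("silu" and "relu2") can be sketched in plain Python as below. This is a minimal illustration, not the code from this PR: it assumes "relu2" means squared ReLU, and the `ACTIVATIONS` registry and `get_activation` helper are hypothetical names.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow in exp()."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return x * e / (1.0 + e)


def relu2(x: float) -> float:
    """Squared ReLU (assumed meaning of "relu2"): max(0, x) ** 2."""
    return max(0.0, x) ** 2


# Hypothetical name -> function registry, as a CLI arg might select from.
ACTIVATIONS = {"silu": silu, "relu2": relu2}


def get_activation(name: str):
    """Look up an activation by name, failing loudly on unknown names."""
    try:
        return ACTIVATIONS[name]
    except KeyError:
        raise ValueError(
            f"unknown activation {name!r}; choose from {sorted(ACTIVATIONS)}"
        ) from None
```

A real training script would use the tensor versions from its deep learning framework; the scalar forms here only show what each option computes.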
[^1]: e.g. gradient clipping, which has become standard in pretraining
since the original repo came out
[^2]: added support for new activation fns "silu" and "relu2"
[^3]: the relative attention hyperparams
`relative_attention_num_buckets` and `max_distance` are hardcoded to
values for a 512-token context; they should be settable by the user
with reasonable defaults
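
A minimal sketch of how the two relative attention hyperparams could be exposed as CLI args with `argparse`. The flag spellings, the `positive_int` helper, and the default values (32 and 128) are assumptions for illustration only; the actual names and defaults are whatever this PR defines.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type: accept only integers greater than zero."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="relative attention args (sketch)")
    # Defaults are placeholders, not values confirmed by this PR.
    parser.add_argument(
        "--relative-attention-num-buckets",
        type=positive_int,
        default=32,
        help="number of buckets for relative position bias",
    )
    parser.add_argument(
        "--max-distance",
        type=positive_int,
        default=128,
        help="largest relative distance given its own bucket range",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.relative_attention_num_buckets, args.max_distance)
```

With this shape, a user training at a longer context can override both values on the command line, while runs that pass nothing keep the defaults.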
---------
Signed-off-by: peter szemraj <peterszemraj@gmail.com>
Signed-off-by: Peter Szemraj <peterszemraj+dev@gmail.com>
Co-authored-by: Peter Szemraj <peterszemraj+dev@gmail.com>