Ordered approximately by importance. Don't forget to include a baseline, something like a 1-layer unidirectional model without attention.
Important:
- Base architecture (see the sketch after this list):
  - enc+dec
  - enc+dec+attn
  - enc+ctc
  - enc+ctc+dec
  - enc+ctc+dec+attn
- Frame encoding scheme: flattening vs. CNN (jz)
- Video segmentation (i.e. sent-based vs. line-based (non-sent))
- Various dimensions/the basic stuff: grad norm (which seems really important; see the clipping sketch after this list), batch size, LSTM dim, char dim, i.e. varying each keyword argument of the encoder/decoder/training functions (jz)
- Regularization
- Stopping criterion
- josephz: Let's just do 50 epochs?
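
A minimal sketch of how the five variants could share one model, assuming a PyTorch pipeline (an assumption; the repo's actual modules may differ). The class name `Seq2Seq`, the flags `use_attn`/`use_ctc`, and the plain dot-product attention are placeholders, not the project's API:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """enc+dec core; attention and a CTC head are toggled by flags.
    Variants: enc+dec (False, False), enc+dec+attn (True, False),
    enc+ctc (use only ctc_logits), enc+ctc+dec (False, True),
    enc+ctc+dec+attn (True, True)."""

    def __init__(self, feat_dim, hid_dim, vocab, use_attn=False, use_ctc=False):
        super().__init__()
        self.use_attn, self.use_ctc = use_attn, use_ctc
        self.enc = nn.LSTM(feat_dim, hid_dim, batch_first=True)
        self.embed = nn.Embedding(vocab, hid_dim)
        self.dec = nn.LSTMCell(hid_dim * 2 if use_attn else hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab)
        if use_ctc:
            self.ctc_head = nn.Linear(hid_dim, vocab + 1)  # +1 for the CTC blank

    def forward(self, frames, chars):
        enc_out, (h, c) = self.enc(frames)   # frames: (B, T, feat) -> (B, T, H)
        h, c = h[0], c[0]                    # init decoder state from encoder
        logits = []
        for t in range(chars.size(1)):       # teacher-forced decode
            x = self.embed(chars[:, t])
            if self.use_attn:                # plain dot-product attention
                scores = torch.bmm(enc_out, h.unsqueeze(2)).squeeze(2)  # (B, T)
                ctx = torch.bmm(scores.softmax(1).unsqueeze(1), enc_out).squeeze(1)
                x = torch.cat([x, ctx], dim=1)
            h, c = self.dec(x, (h, c))
            logits.append(self.out(h))
        dec_logits = torch.stack(logits, dim=1)                        # (B, L, V)
        ctc_logits = self.ctc_head(enc_out) if self.use_ctc else None  # (B, T, V+1)
        return dec_logits, ctc_logits
```

For the enc+ctc+dec(+attn) variants, the joint loss would be something like `lambda * ctc_loss + (1 - lambda) * ce_loss`, with lambda as yet another knob to sweep.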
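And since grad norm seems so important, a sketch of where clipping sits in the training step. `max_norm=5.0` is just a placeholder value to sweep, not a recommendation:

```python
import torch

def train_step(model, batch, optimizer, criterion, max_norm=5.0):
    frames, chars, targets = batch
    optimizer.zero_grad()
    dec_logits, _ = model(frames, chars)
    # CrossEntropyLoss wants (B, V, L) logits against (B, L) targets.
    loss = criterion(dec_logits.transpose(1, 2), targets)
    loss.backward()
    # Rescale the full gradient vector if its L2 norm exceeds max_norm;
    # this is the knob the grad-norm sweep would vary.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```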
Useful:
- Various attention functions (see Attention Implementation #11) (yli)
- Teacher forcing ratio, and ways to decay it over training (scheduled sampling; see Bengio et al., 2015, and the sketch after this list) (yli)
- Gradient normalization
- Use all 68 facial landmark points vs. a subset (yli)
- Have a global Adam vs. a new Adam every epoch. We currently re-create the optimizer each epoch because it seems to work better, but that shouldn't help: re-instantiating Adam wipes its moment estimates and restarts bias correction. This could be related to the learning rate, which is a separate item below (see the sketch after this list).
- Temperature (in train, eval, and/or inference) (yli)
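
A sketch combining the teacher-forcing and temperature knobs, since they act at the same decode step. The linear decay is one of the schedules from Bengio et al., 2015; the function names are placeholders:

```python
import random
import torch

def next_decoder_input(gold_char, prev_logits, tf_ratio, temperature=1.0):
    """With prob tf_ratio feed the gold char (teacher forcing); otherwise
    sample from the model's own temperature-scaled distribution."""
    if random.random() < tf_ratio:
        return gold_char
    probs = torch.softmax(prev_logits / temperature, dim=-1)  # (B, V)
    return torch.multinomial(probs, 1).squeeze(-1)            # (B,)

def tf_ratio_at(epoch, total_epochs, start=1.0, end=0.0):
    """Linear decay, one of the schedules in Bengio et al., 2015."""
    return start + (end - start) * min(epoch / total_epochs, 1.0)
```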
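On the global-vs-per-epoch Adam point, a sketch of the difference, with the per-epoch variant shown commented out; `nn.Linear` and the dummy batch stand in for the real model and data:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the real model

# Global optimizer: Adam's per-parameter moment estimates (m, v) and step
# count accumulate over the whole run.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):
    # The current per-epoch variant would instead do:
    #   opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # wiping the moments and restarting bias correction each epoch, i.e. an
    # implicit warm restart; any benefit may really be an LR-schedule effect.
    for x, y in [(torch.randn(8, 4), torch.randn(8, 2))]:  # dummy batch
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
```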
Try if time permits:
- Global/local/input-feeding attention variants (see Luong et al., 2015, and the sketch after this list)
- Optimizers?
- Learning rates & learning rate decay methods
- Char-based vs. word-based
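
For the Luong et al., 2015 item (and the attention-functions item above), a sketch of the three global scoring functions. Local attention and input feeding would additionally predict an attention window and feed the attentional vector into the next decoder input; both are omitted here:

```python
import torch
import torch.nn as nn

class LuongScore(nn.Module):
    """The three global scoring functions from Luong et al., 2015."""

    def __init__(self, hid, kind="dot"):
        super().__init__()
        self.kind = kind
        if kind == "general":                  # h_t^T W h_s
            self.W = nn.Linear(hid, hid, bias=False)
        elif kind == "concat":                 # v^T tanh(W [h_t; h_s])
            self.W = nn.Linear(2 * hid, hid, bias=False)
            self.v = nn.Linear(hid, 1, bias=False)

    def forward(self, dec_h, enc_out):
        # dec_h: (B, H); enc_out: (B, T, H) -> unnormalized scores: (B, T)
        if self.kind == "dot":
            return torch.bmm(enc_out, dec_h.unsqueeze(2)).squeeze(2)
        if self.kind == "general":
            return torch.bmm(self.W(enc_out), dec_h.unsqueeze(2)).squeeze(2)
        expanded = dec_h.unsqueeze(1).expand_as(enc_out)       # (B, T, H)
        return self.v(torch.tanh(self.W(torch.cat([expanded, enc_out], 2)))).squeeze(2)
```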