Video-and-Language Future Event Prediction in PyTorch

Environment

Edit `config.json` to set the configuration options below.

Key	Value
`vid_feature`	Video features extracted at 1 fps, one .npy file per video
`subtitles`	A preprocessed, dictionary-like subtitles JSON file with videos as keys and their subtitles as values, provided in `data/vlep_subtitles.json`
`train_anno`	Annotations in the training split
`val_anno`	Annotations in the validation split
`test_anno`	Annotations in the test split
`train_dialogue`	Whether dialogues are used. This only affects data loading and does not change the model architecture.
`nhead`	The number of heads for all transformer-like modules
`base_epochs` and `base_lr`	The number of training epochs and learning rate before fine-tuning RoBERTa
`fine_epochs` and `fine_lr`	The number of training epochs and learning rate to fine-tune RoBERTa
`grad_accum`	The number of iterations for gradient accumulation
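The options above suggest a two-stage schedule (a base stage, then RoBERTa fine-tuning) with gradient accumulation. The following is a minimal, hypothetical sketch of how such a config could be loaded and interpreted; it is not taken from `main.py`. The helper names (`load_config`, `training_stages`, `optimizer_steps`, `Stage`), the assumption that RoBERTa stays frozen during the base stage, and the assumption that a final partial accumulation group still triggers an optimizer step are all illustrative choices.

```python
import json
from dataclasses import dataclass
from typing import List

# Keys documented in the table above. The real config.json may contain
# additional or differently structured entries (assumption).
REQUIRED_KEYS = [
    "vid_feature", "subtitles", "train_anno", "val_anno", "test_anno",
    "train_dialogue", "nhead", "base_epochs", "base_lr",
    "fine_epochs", "fine_lr", "grad_accum",
]


@dataclass
class Stage:
    """One training stage: name, epoch count, learning rate, and whether
    RoBERTa weights are updated (hypothetical structure)."""
    name: str
    epochs: int
    lr: float
    finetune_roberta: bool


def load_config(text: str) -> dict:
    """Parse config JSON text and check that every documented key is present."""
    cfg = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    return cfg


def training_stages(cfg: dict) -> List[Stage]:
    """Base stage (RoBERTa assumed frozen) followed by the fine-tuning stage."""
    return [
        Stage("base", cfg["base_epochs"], cfg["base_lr"], finetune_roberta=False),
        Stage("fine", cfg["fine_epochs"], cfg["fine_lr"], finetune_roberta=True),
    ]


def optimizer_steps(num_batches: int, grad_accum: int) -> int:
    """Optimizer updates per epoch when gradients are accumulated over
    `grad_accum` iterations; a final partial group is assumed to still step."""
    if grad_accum < 1:
        raise ValueError("grad_accum must be >= 1")
    return -(-num_batches // grad_accum)  # ceiling division
```

For example, with 10 batches per epoch and `grad_accum` set to 4, this sketch performs 3 optimizer updates per epoch, so the effective batch size is roughly four times the per-iteration batch size.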

Then run the following, where the second argument is the filename for the saved model:

python main.py config.json saved_model_name.pt
