This project implements a transformer-based video classifier in PyTorch.
High-level features from the first 24 frames of each video were extracted with a pre-trained image classification model. A transformer-based classification model was then trained on the sequences of extracted frame features to recognize the action performed in each video.
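The sketch below illustrates what this feature-extraction step could look like. It assumes torchvision's ResNet-50 as the pre-trained backbone and videos already decoded into `(num_frames, 3, H, W)` tensors; the backbone and preprocessing actually used in this project may differ.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

NUM_FRAMES = 24  # only the first 24 frames of each video are used

# Assumed backbone: ResNet-50 with its classification head removed,
# so each frame is mapped to a 2048-dimensional feature vector.
weights = ResNet50_Weights.DEFAULT
backbone = resnet50(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = weights.transforms()  # resize, crop, and normalize as expected by the backbone


@torch.no_grad()
def extract_frame_features(frames: torch.Tensor) -> torch.Tensor:
    """frames: (num_frames, 3, H, W) tensor -> (NUM_FRAMES, 2048) feature sequence."""
    frames = frames[:NUM_FRAMES]
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch)
```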
The model consists of a TransformerEncoder module with Rotary Positional Embedding (RoPE) and 4 attention heads, followed by a classifier module which performs global average pooling over the time dimension before generating predictions.
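The following is a minimal sketch of that architecture: encoder blocks whose attention applies Rotary Positional Embedding (RoPE) to queries and keys with 4 heads, followed by global average pooling over time and a linear head. The widths, depth, and class count shown here are illustrative assumptions, not the project's exact hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs by position-dependent angles. x: (batch, heads, seq, head_dim)."""
    _, _, t, d = x.shape
    half = d // 2
    freqs = 1.0 / (10000 ** (torch.arange(half, device=x.device).float() / half))
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class RoPEAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        # Positions enter through rotation of q and k, not through added embeddings.
        q, k = apply_rope(q), apply_rope(k)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.proj(out.transpose(1, 2).reshape(b, t, -1))


class EncoderBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = RoPEAttention(dim, num_heads)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))


class VideoTransformerClassifier(nn.Module):
    # feature_dim, model_dim, num_layers are assumed values for illustration.
    def __init__(self, feature_dim: int = 2048, model_dim: int = 256, num_layers: int = 2, num_classes: int = 5):
        super().__init__()
        self.input_proj = nn.Linear(feature_dim, model_dim)
        self.encoder = nn.Sequential(*[EncoderBlock(model_dim) for _ in range(num_layers)])
        self.head = nn.Linear(model_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, num_frames, feature_dim) -> (batch, num_classes)."""
        x = self.encoder(self.input_proj(x))
        return self.head(x.mean(dim=1))  # global average pooling over the time dimension
```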
The model was trained on a subsampled version of the UCF101 dataset containing only the 5 most frequent classes. This subsampled dataset, already split into training and test sets, was created by Paul (2021) and can be downloaded from here. Data augmentation was performed by producing a grayscale copy of each training sample, doubling the size of the training dataset.
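A minimal sketch of this augmentation, assuming it is applied to the raw RGB frames before feature extraction, could look as follows; each training video contributes both its original frames and a 3-channel grayscale copy.

```python
import torch
from torchvision.transforms.functional import rgb_to_grayscale


def augment_with_grayscale(videos):
    """videos: list of (num_frames, 3, H, W) tensors -> originals plus grayscale copies."""
    gray_copies = [rgb_to_grayscale(v, num_output_channels=3) for v in videos]
    return videos + gray_copies
```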
Paul, S. (2021). Video Classification with Transformers. https://keras.io/examples/vision/video_transformers/
Sarkar, A. (2025). Transformer Model Tutorial in PyTorch: From Theory to Code. https://www.datacamp.com/tutorial/building-a-transformer-with-pytorch