
Russian Jokes Generator

This repository contains a set of Transformer-based language models fine-tuned on a dataset of Russian jokes (anecdotes). The models are designed to generate humorous and coherent Russian text. The repository includes three versions of the model: nano, mini, and small, each with different architectures and training configurations. Additionally, a custom Byte-level BPE tokenizer, trained on the Russian jokes dataset, is provided.

Table of Contents

  • Model Details
    • Architecture
    • Training Details
    • Performance
  • Usage
  • Examples
  • Repository Structure
  • Jupyter Notebook
  • License

Model Details

Architecture

The models are based on the Transformer architecture, enhanced with several advanced techniques:

  1. Positional Embeddings: ALiBi (Attention with Linear Biases) and RoPE (Rotary Positional Embeddings) are used for positional encoding.
  2. Attention Mechanism: Grouped-Query Attention (GQA) and Multi-Head Latent Attention (MHLA) are employed to improve efficiency and performance.
  3. Activation Function: SwiGLU activation is used in the feed-forward layers.
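
For illustration, here is a minimal PyTorch sketch of a SwiGLU feed-forward block as used in such architectures. The class and parameter names are hypothetical; the actual implementation is in the repository's notebook.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with SwiGLU activation: W_down(SiLU(W_gate(x)) * W_up(x))."""

    def __init__(self, hidden_dim: int, ffn_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.w_up = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.w_down = nn.Linear(ffn_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU (swish) gates the "up" projection element-wise, then project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))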

Three versions of the model are available:

  • Nano: 3 layers, 4 attention heads, hidden dimension 96.
  • Mini: 6 layers, 6 attention heads, hidden dimension 384; trained with RoPE and MHLA.
  • Small: 12 layers, 12 attention heads, hidden dimension 768; trained with RoPE and MHLA.
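
Expressed as a configuration sketch (the field names are hypothetical; the real configuration object is defined in the notebook):

from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int
    n_heads: int
    hidden_dim: int

# The three released checkpoints, per the list above.
CONFIGS = {
    "nano":  ModelConfig(n_layers=3,  n_heads=4,  hidden_dim=96),
    "mini":  ModelConfig(n_layers=6,  n_heads=6,  hidden_dim=384),
    "small": ModelConfig(n_layers=12, n_heads=12, hidden_dim=768),
}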

Training Details

The models were trained on the IgorVolochay/russian_jokes dataset.

Key training parameters include:

  1. Epochs: The number of passes over the dataset is controlled by the n_step parameter passed to the Trainer. The nano and mini models were trained for 1 epoch each; the small model was trained for 6 epochs.
  2. Batch Size: 32 for nano and mini models, 64 for the small model.
  3. Learning Rate: 5e-4 with cosine decay for the small model, 3e-4 for the nano and mini models.
  4. Loss Function: Cross-entropy loss was used for training.
  5. Hardware: Training was conducted on an NVIDIA A100 GPU via Google Colab.
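
For orientation, below is a minimal sketch of such a training setup in plain PyTorch. It is not the repository's Trainer class: the data loader, the model interface (assumed to return raw logits), and the function name are assumptions.

import torch
import torch.nn.functional as F

def train(model, loader, n_steps, lr=5e-4, device="cuda"):
    # Minimal next-token training loop: AdamW, cosine learning-rate decay, cross-entropy loss.
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_steps)
    step = 0
    while step < n_steps:
        for input_ids in loader:                      # (batch, seq_len) token ids
            input_ids = input_ids.to(device)
            logits = model(input_ids[:, :-1])         # assumed to return raw logits
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
                input_ids[:, 1:].reshape(-1),         # targets shifted by one position
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
            step += 1
            if step >= n_steps:
                break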

Performance

The performance of each model is summarized below:

| Model | Training Loss (min) | Validation Loss (min) |
|-------|---------------------|-----------------------|
| Nano  | 3.784               | 3.932                 |
| Mini  | 3.127               | 3.144                 |
| Small | 2.933               | 3.025                 |

Training and validation loss curves for each model (nano, mini, and small) are provided as images in the repository.

Usage

Loading the Model

You can load the models and tokenizer from the Hugging Face Hub using the following code:

# TransformerForCausalLM and ByteLevelBPETokenizer are the custom classes
# implemented in the repository's notebook (russian_jokes_generator.ipynb).

# Small model (stored in the "small" branch)
model_small = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator", revision="small")
tokenizer = ByteLevelBPETokenizer.from_pretrained("estnafinema0/russian-jokes-generator")
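
The nano and mini checkpoints are stored in the main and mini branches (see Repository Structure below), so they should load the same way with a different revision:

# Mini model; the nano model lives on the default "main" branch.
model_mini = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator", revision="mini")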

Generating Text

To generate text using the model, you can use the following code:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model_small = model_small.to(device)  # assumes the model is a standard torch.nn.Module

text = "Штирлиц пришел домой"
input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = model_small.generate(
    input_ids[None, :], max_new_tokens=200, eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10
)
print(tokenizer.decode(model_output[0].tolist()))

Examples

Here are some examples of jokes generated by the small model:

  1. Input: "Пришел Петя в баню и говорит" Output: "Пришел Петя в баню и говорит - Василий Иванович, вы знаете, кто я - Петя, или Петя? - Ахааха, и я - Ахаилая, я - Ахаил! - А какая Петя? - Я - Ахаилая! - Ну и я, когда я банкрот, банкротство, конечно..."

  2. Input: "Вышел как-то на крыльцо" Output: "Вышел как-то на крыльцо, а там плачет. Стукнулся: упал, выпал. Плачет – упал."

  3. Input: "Священник задает ребёнку вопрос" Output: "Священник задает ребёнку вопрос ему на ухо:- Что, братан, опять несёл?- Братан, ты что, братан, охуел?"

Repository Structure

The repository is organized as follows:

  • Models: Three versions of the model (nano, mini, small) are available in different branches:
    • main: Nano model.
    • mini: Mini model.
    • small: Small model.
  • Tokenizer: A custom Byte-level BPE tokenizer trained on the Russian jokes dataset.
  • Jupyter Notebook: A detailed notebook containing the implementation, training, and evaluation of the models.

Jupyter Notebook

The repository includes a Jupyter Notebook (russian_jokes_generator.ipynb) that provides a step-by-step guide to:

  • Training the tokenizer.
  • Implementing and training the Transformer models.
  • Evaluating the models and generating text.

You can find the notebook in the repository and run it locally or in Google Colab.
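
For reference, training a byte-level BPE tokenizer with the Hugging Face tokenizers library looks roughly like the sketch below. The repository ships its own custom implementation, so the notebook's code differs; the corpus file name, vocabulary size, and special tokens here are placeholders.

from tokenizers import ByteLevelBPETokenizer

# Placeholder corpus file; the notebook trains on the IgorVolochay/russian_jokes dataset.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["russian_jokes.txt"],
    vocab_size=8192,
    min_frequency=2,
    special_tokens=["<s>", "</s>", "<pad>", "<unk>"],
)
tokenizer.save_model("tokenizer")  # writes vocab.json and merges.txt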

P.S. The notebook is currently available in Russian.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for more details.


Thank you for your time!
