This repository contains a set of Transformer-based language models fine-tuned on a dataset of Russian jokes (anecdotes). The models are designed to generate humorous and coherent Russian text. The repository includes three versions of the model: `nano`, `mini`, and `small`, each with a different architecture and training configuration. Additionally, a custom Byte-level BPE tokenizer, trained on the Russian jokes dataset, is provided.
The models are based on the Transformer architecture, enhanced with several advanced techniques:
- Positional Embeddings: ALiBi (Attention with Linear Biases) and RoPE (Rotary Positional Embeddings) are used for positional encoding.
- Attention Mechanism: Grouped-Query Attention (GQA) and Multi-Head Latent Attention (MHLA) are employed to improve efficiency and performance.
- Activation Function: SwiGLU activation is used in the feed-forward layers (a minimal sketch is shown after this list).
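
As an illustration of the feed-forward design, here is a minimal PyTorch sketch of a SwiGLU block. The class and parameter names (`SwiGLU`, `hidden_dim`, `ff_dim`) are hypothetical and may differ from the implementation in the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU activation (illustrative sketch)."""
    def __init__(self, hidden_dim: int, ff_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_dim, ff_dim, bias=False)  # gating projection
        self.w_up = nn.Linear(hidden_dim, ff_dim, bias=False)    # value projection
        self.w_down = nn.Linear(ff_dim, hidden_dim, bias=False)  # projection back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = (SiLU(x W_gate) * (x W_up)) W_down
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```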
Three versions of the model are available:
- Nano: 3 layers, 4 heads, 96 hidden dimensions.
- Mini: 6 layers, 6 heads, 384 hidden dimensions. Trained with RoPE and MHLA.
- Small: 12 layers, 12 heads, 768 hidden dimensions. Trained with RoPE and MHLA.
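
For quick reference, the three configurations can be summarized as follows. The dictionary and field names below are purely illustrative and are not the exact arguments used in the notebook.

```python
# Illustrative summary of the three model sizes (field names are hypothetical).
MODEL_CONFIGS = {
    "nano":  dict(n_layers=3,  n_heads=4,  hidden_dim=96),
    "mini":  dict(n_layers=6,  n_heads=6,  hidden_dim=384),   # trained with RoPE and MHLA
    "small": dict(n_layers=12, n_heads=12, hidden_dim=768),   # trained with RoPE and MHLA
}
```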
The models were trained on the `IgorVolochay/russian_jokes` dataset.
Key training parameters include:
- Epochs: The number of full passes over the dataset was controlled by the `n_step` parameter in the Trainer initialization. The nano and mini models were trained for 1 epoch each; the small model was trained for 6 epochs.
- Batch Size: 32 for the nano and mini models, 64 for the small model.
- Learning Rate: 5e-4 with cosine decay for the small model, 3e-4 for the nano and mini models.
- Loss Function: Cross-entropy loss was used for training (a training-step sketch is shown after this list).
- Hardware: Training was conducted on an NVIDIA A100 GPU via Google Colab.
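
To make these settings concrete, the snippet below sketches one possible training step with a cosine learning-rate schedule and cross-entropy loss. The optimizer (AdamW), the scheduler class, and the names `model`, `train_loader`, `n_steps`, and `device` are assumptions for the example, not the exact code from the notebook.

```python
import torch
import torch.nn.functional as F

# Assumptions: `model`, `train_loader`, `n_steps`, and `device` come from your setup;
# AdamW is an illustrative optimizer choice, not confirmed by the repository.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # 5e-4 for small, 3e-4 for nano/mini
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_steps)

for batch in train_loader:
    input_ids = batch["input_ids"].to(device)        # (batch, seq_len) token ids
    logits = model(input_ids[:, :-1])                # assumed to return (batch, seq_len - 1, vocab) logits
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),         # flatten predictions
        input_ids[:, 1:].reshape(-1),                # next-token targets
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```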
The performance of each model is summarized below:
| Model | Training Loss (min) | Validation Loss (min) |
|---|---|---|
| Nano | 3.784 | 3.932 |
| Mini | 3.127 | 3.144 |
| Small | 2.933 | 3.025 |
Training and validation loss curves for each model are provided below:
You can load the models and tokenizer from the Hugging Face Hub using the following code:
```python
# TransformerForCausalLM and ByteLevelBPETokenizer are the custom classes from this
# repository's notebook; define or import them before running this snippet.

# Small model (use revision="main" for the nano model or revision="mini" for the mini model)
model_small = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator", revision="small")
tokenizer = ByteLevelBPETokenizer.from_pretrained("estnafinema0/russian-jokes-generator")
```
To generate text using the model, you can use the following code:
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # assumption: use a GPU when available
model_small = model_small.to(device)  # move the model to the same device as the inputs

text = "Штирлиц пришел домой"
input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = model_small.generate(
    input_ids[None, :], max_new_tokens=200, eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10
)
print(tokenizer.decode(model_output[0].tolist()))
```
Here are some examples of jokes generated by the `small` model:

- Input: "Пришел Петя в баню и говорит"
  Output: "Пришел Петя в баню и говорит - Василий Иванович, вы знаете, кто я - Петя, или Петя? - Ахааха, и я - Ахаилая, я - Ахаил! - А какая Петя? - Я - Ахаилая! - Ну и я, когда я банкрот, банкротство, конечно..."
- Input: "Вышел как-то на крыльцо"
  Output: "Вышел как-то на крыльцо, а там плачет. Стукнулся: упал, выпал. Плачет – упал."
- Input: "Священник задает ребёнку вопрос"
  Output: "Священник задает ребёнку вопрос ему на ухо:- Что, братан, опять несёл?- Братан, ты что, братан, охуел?"
The repository is organized as follows:
- Models: Three versions of the model (`nano`, `mini`, `small`) are available in different branches:
  - `main`: Nano model.
  - `mini`: Mini model.
  - `small`: Small model.
- Tokenizer: A custom Byte-level BPE tokenizer trained on the Russian jokes dataset.
- Jupyter Notebook: A detailed notebook containing the implementation, training, and evaluation of the models.
The repository includes a Jupyter Notebook (`russian_jokes_generator.ipynb`) that provides a step-by-step guide to:
- Training the tokenizer (an illustrative sketch is shown below).
- Implementing and training the Transformer models.
- Evaluating the models and generating text.
You can find the notebook in the repository and run it locally or in Google Colab.
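
The repository ships its own `ByteLevelBPETokenizer` implementation, but for illustration the sketch below shows how a comparable byte-level BPE tokenizer could be trained on the same dataset with the Hugging Face `tokenizers` and `datasets` libraries. The `text` column name, the `train` split, the vocabulary size, and the special tokens are assumptions, not the values used in the notebook.

```python
import os

from datasets import load_dataset
from tokenizers import ByteLevelBPETokenizer  # Hugging Face tokenizers library, not the repo's custom class

# Load the same jokes dataset used for training (split and column name are assumptions).
dataset = load_dataset("IgorVolochay/russian_jokes", split="train")

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    (row["text"] for row in dataset),
    vocab_size=1024,                          # illustrative size, not the repo's setting
    min_frequency=2,
    special_tokens=["<s>", "</s>", "<pad>"],  # illustrative special tokens
)

os.makedirs("russian_jokes_tokenizer", exist_ok=True)
tokenizer.save_model("russian_jokes_tokenizer")  # writes vocab.json and merges.txt
```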
P.S. The notebook is currently available in Russian.
This project is licensed under the Apache 2.0 License. See the LICENSE file for more details.
Thank you for your time!