LLAMA3.2-Nepali-318M (from scratch) #570
Aananda-giri started this conversation in Show and tell
Replies: 1 comment
-
That's great! Thanks for sharing this! I think smaller LLMs like these are super nice because they make it easier to experiment with them. There are also several readers asking me about non-English LLMs, and this is another nice resource to refer them to.
-
Hi everyone! 👋
I’m thrilled to share my recent project: LLAMA3.2-Nepali-318M, a LLAMA 3.2 model pretrained from scratch for the Nepali language. The project builds on the LLAMA 3.2 training code detailed in Build a Large Language Model (From Scratch), adapting it specifically for Nepali.
🔗 Project Links
💬 Chat Interface: Hugging Face Space
📦 Pre-Trained Model: Hugging Face Model
💻 Training Code: GitHub Repository
📊 Dataset: IRIISNEPAL/Nepali-Text-Corpus | NepBERTa
🔍 Major Changes from Original Code
1️⃣ New Model Configurations (illustrative config sketch below)
2️⃣ Tokenizer (tokenizer training sketch below)
3️⃣ Dataset (dataset loading sketch below)
4️⃣ Dataloader (sliding-window dataloader sketch below)
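The post doesn't list the 318M model's hyperparameters here (they live in the GitHub repository linked above). Purely as an illustration, a Llama-3.2-style config in the book's dict format might look like the sketch below; every value is a placeholder assumption, not the published configuration.

```python
import torch

# Illustrative placeholder config in the style of the book's Llama 3.2 code.
# None of these values are the actual LLAMA3.2-Nepali-318M settings; see the
# GitHub repository / Hugging Face model card for the real ones.
LLAMA32_NEPALI_CONFIG = {
    "vocab_size": 50_000,       # assumed Nepali tokenizer vocabulary size
    "context_length": 1024,     # assumed shorter context for affordable pretraining
    "emb_dim": 1024,            # hidden size (placeholder)
    "n_heads": 16,              # attention heads (placeholder)
    "n_layers": 16,             # transformer blocks (placeholder)
    "hidden_dim": 4096,         # feed-forward inner dimension (placeholder)
    "n_kv_groups": 4,           # grouped-query attention groups, as in Llama 3.x
    "rope_base": 500_000.0,     # RoPE theta used by Llama 3.x
    "dtype": torch.bfloat16,    # lower-precision weights to save memory
}
```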
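The tokenizer details aren't spelled out in this summary either. One common route, shown here only as an assumed sketch, is to train a byte-level BPE tokenizer on the Nepali corpus with the Hugging Face `tokenizers` library; the vocabulary size, special tokens, and file paths below are placeholders.

```python
# Assumed sketch: train a byte-level BPE tokenizer on Nepali text with the
# Hugging Face `tokenizers` library. Vocab size, special tokens, and paths
# are placeholders, not the project's actual choices.
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=50_000,                 # placeholder vocabulary size
    special_tokens=["<|endoftext|>"],  # assumed end-of-text token
)

# "nepali_corpus.txt" is a placeholder path to the raw Nepali training text.
tokenizer.train(files=["nepali_corpus.txt"], trainer=trainer)
tokenizer.save("nepali_bpe_tokenizer.json")
```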
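The dataset is the corpus linked above (IRIISNEPAL/Nepali-Text-Corpus). A minimal way to pull and inspect it with the `datasets` library is sketched below; the split name and streaming mode are assumptions, so check the dataset card for the actual schema.

```python
# Minimal sketch for loading the corpus linked above with Hugging Face
# `datasets`. The "train" split and streaming mode are assumptions; the
# field names depend on the dataset card.
from itertools import islice
from datasets import load_dataset

dataset = load_dataset("IRIISNEPAL/Nepali-Text-Corpus", split="train", streaming=True)

# Peek at a few records to see which field holds the raw Nepali text.
for example in islice(dataset, 3):
    print(example)
```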
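The dataloader changes aren't reproduced here, so the sketch below is only a minimal sliding-window, next-token-prediction dataset in the spirit of the book's dataloader chapter. The window length, stride, and the assumption that `token_ids` is one long list of tokenizer output are all placeholders.

```python
# Minimal sliding-window dataset for next-token prediction, in the spirit of
# the dataloader from the book. Window length and stride are placeholders, and
# `token_ids` is assumed to be one long list of token IDs from the tokenizer.
import torch
from torch.utils.data import Dataset, DataLoader

class NepaliTokenDataset(Dataset):
    def __init__(self, token_ids, max_length=1024, stride=1024):
        self.inputs, self.targets = [], []
        # Slide a fixed-size window over the token stream; each target is the
        # input shifted right by one position.
        for i in range(0, len(token_ids) - max_length, stride):
            chunk = token_ids[i : i + max_length + 1]
            self.inputs.append(torch.tensor(chunk[:-1]))
            self.targets.append(torch.tensor(chunk[1:]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Example with dummy token IDs:
loader = DataLoader(
    NepaliTokenDataset(list(range(5_000)), max_length=256, stride=256),
    batch_size=8, shuffle=True, drop_last=True,
)
```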
✨ Final Thoughts
A huge thank you to @rasbt for the inspiration and for writing an incredible book, the best book on LLMs I’ve read!
🤗 Happy coding! 🎉