Genesis: A Lightweight Deep Learning Framework

Genesis Logo


A modern deep learning framework built from scratch with educational clarity and production performance

📚 Documentation | 🚀 Quick Start | 📊 Benchmarks | 🤝 Contributing


🌟 Highlights

Genesis is a lightweight yet powerful deep learning framework that combines educational clarity with production-level performance. Built from scratch in Python, it features a unique dual-backend architecture: PyTorch for CPU operations and a completely independent CUDA/Triton implementation for GPU acceleration.
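
In practice the split means every tensor operation is routed to whichever backend owns the data. Here is a toy sketch of that routing idea (illustrative only, not Genesis's actual dispatcher):

# Toy model of dual-backend dispatch (illustrative only, not Genesis's code)
import operator

CPU_OPS  = {"add": operator.add}   # in Genesis: thin wrappers over PyTorch
CUDA_OPS = {"add": operator.add}   # in Genesis: independent CUDA/Triton kernels

def dispatch(op, device, *args):
    # Route the op to the backend that owns the tensor's memory.
    table = CPU_OPS if device == "cpu" else CUDA_OPS
    return table[op](*args)

print(dispatch("add", "cpu", 1.0, 2.0))  # 3.0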

🔥 Latest Features:

  • ✅ Qwen Model Support: Full implementation with training and inference
  • ✅ Mixed Precision Training: FP16/BF16 support with Automatic Mixed Precision (AMP)
  • ✅ Advanced Training Features: Gradient clipping, learning rate schedulers
  • ✅ LLM Applications: Complete training pipeline for 0.5B+ models
  • ✅ Enhanced Performance: Optimized CUDA memory management and Triton kernels

Why Genesis?

  • 🎯 Educational Excellence: Clear, well-documented code that shows how deep learning frameworks work internally
  • ⚡ High Performance: Triton-optimized kernels achieving 60-85% efficiency compared to PyTorch on large tensors
  • 🔧 Modern Architecture: Clean separation between automatic differentiation, tensor operations, and neural network modules
  • 🚀 Production Ready: Complete training pipeline support including mixed precision, distributed training, and model serialization
  • 📖 Learning Resource: Perfect for understanding deep learning framework internals while building real models

🎯 Key Features

Core Capabilities

  • ✅ Automatic Differentiation: Dynamic computational graph with full backpropagation support
  • ✅ Comprehensive Tensor Operations: Complete tensor arithmetic with GPU acceleration
  • ✅ Neural Network Modules: All essential layers including Multi-Head Attention, LayerNorm, etc.
  • ✅ Modern Optimizers: Adam, AdamW, SGD with learning rate scheduling and gradient clipping
  • ✅ Mixed Precision Training: Automatic Mixed Precision (AMP) with FP16/BF16 support
  • ✅ Model Management: Checkpoint saving/loading, state dict management (see the sketch below)
  • ✅ LLM Support: Built-in Qwen model implementation with SFT training and chat inference
  • ✅ Training Pipeline: Complete LLM training with datasets, schedulers, and checkpointing
  • ✅ Chat Applications: Ready-to-use chat interfaces for trained models
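
For example, checkpointing follows the familiar state-dict pattern. This sketch assumes a PyTorch-style genesis.save / genesis.load API; check the docs for the exact names:

import genesis

# Save model and optimizer state (API names assumed PyTorch-like)
genesis.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}, "checkpoint.pth")

# Restore later
state = genesis.load("checkpoint.pth")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])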

Technical Innovations

  • ๐Ÿ—๏ธ Dual Backend Architecture: CPU (PyTorch) + GPU (Pure CUDA/Triton)
  • ๐Ÿ”ฅ Triton Kernels: Hand-optimized GPU kernels for maximum performance
  • ๐Ÿงฎ Smart Memory Management: Efficient CUDA memory allocation and tensor views
  • ๐Ÿ“Š Profiling Tools: Built-in performance profiling and optimization utilities
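
To give a feel for what lives in the GPU backend, here is a minimal Triton elementwise-add kernel in the style of Triton's own tutorials. It is a sketch of the approach, not the exact kernel shipped in genesis, and it uses torch tensors for brevity:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out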

📊 Performance

Genesis achieves impressive performance through Triton-optimized kernels:

Operation   Size           Genesis    PyTorch    Efficiency
Add         4096×4096      0.025ms    0.04ms     66.7%
MatMul      4096×4096      2.1ms      2.0ms      95%
Softmax     8192×8192      0.8ms      0.9ms      112%
LayerNorm   4096×4096      0.5ms      0.6ms      120%
Attention   32×1024×1024   3.2ms      3.1ms      97%

Benchmarked on NVIDIA A100 GPU with CUDA 11.8
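
To reproduce this kind of comparison on your own hardware, Triton's built-in timer is enough. A minimal harness (a sketch; it times the PyTorch side only, since the Genesis call depends on its tensor API):

import torch
import triton.testing

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# do_bench repeatedly launches the callable and reports a time in milliseconds.
torch_ms = triton.testing.do_bench(lambda: a + b)
print(f"torch add: {torch_ms:.4f} ms")
# Time the equivalent Genesis op the same way, then compare the two numbers
# to get the efficiency column above.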

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/phonism/genesis.git
cd genesis

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Genesis in development mode
pip install -e .

# For GPU acceleration (recommended)
export CUDA_VISIBLE_DEVICES=0  # Use first GPU

Basic Usage

import genesis
import genesis.nn as nn
import genesis.optim as optim

# Create tensors with automatic differentiation
x = genesis.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = genesis.tensor([[2.0, 0.0], [0.0, 2.0]], requires_grad=True)

# Perform operations
z = genesis.matmul(x, y)
loss = z.sum()

# Automatic differentiation
loss.backward()
print(f"Gradient of x: {x.grad}")

Neural Network Example

import genesis
import genesis.nn as nn
import genesis.optim as optim

class SimpleNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Initialize model and optimizer
model = SimpleNet(784, 256, 10)
optimizer = optim.AdamW(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop
for epoch in range(10):
    for batch_data, batch_labels in dataloader:  # any iterable of (inputs, labels) batches
        # Forward pass
        outputs = model(batch_data)
        loss = criterion(outputs, batch_labels)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        
        # Gradient clipping (optional)
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        
        # Update weights
        optimizer.step()
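
Learning rate schedulers plug into the same loop. This sketch assumes the PyTorch-style API suggested by optim/lr_scheduler.py; the StepLR name and arguments are assumptions, not confirmed:

import genesis.optim as optim

# Hypothetical scheduler, mirroring PyTorch naming conventions
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(10):
    train_one_epoch()   # the inner loop shown above, wrapped for brevity
    scheduler.step()    # decay the learning rate once per epoch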

Mixed Precision Training

import genesis

# Enable automatic mixed precision
genesis.enable_autocast = True

# Use autocast context
with genesis.autocast():
    outputs = model(inputs)
    loss = criterion(outputs, targets)

# Backward pass handles mixed precision automatically
loss.backward()
optimizer.step()

๐Ÿ—๏ธ Architecture

genesis/
├── core/
│   ├── autograd.py          # Automatic differentiation engine
│   ├── tensor.py            # Tensor class with grad support
│   └── functional.py        # Functional operations
├── nn/
│   ├── modules.py           # Neural network modules
│   ├── functional.py        # NN functional operations
│   ├── attention.py         # Multi-head attention
│   └── layer_norm.py        # Normalization layers
├── optim/
│   ├── optimizer.py         # Base optimizer class
│   ├── adam.py              # Adam and AdamW
│   ├── sgd.py               # SGD with momentum
│   └── lr_scheduler.py      # Learning rate schedulers
├── backends/
│   ├── cpu/                 # CPU backend (PyTorch)
│   └── cuda/                # GPU backend (CUDA/Triton)
│       ├── cuda_tensor.py   # Pure CUDA tensor
│       └── triton_ops/      # Triton kernels
└── utils/
    ├── data.py              # Data loading utilities
    └── profile.py           # Performance profiling
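
The heart of core/autograd.py is a dynamic graph: each operation records its inputs and a backward rule as it executes, and backward() replays the recorded graph in reverse. Here is a stripped-down illustration of the idea (an educational sketch, not the actual Genesis implementation):

# Minimal reverse-mode autograd over scalars (sketch, not Genesis's code)
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents       # inputs that produced this node
        self._backward_fn = None      # pushes grad back to the parents

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(x*y)/dx = y
            other.grad += self.data * out.grad   # d(x*y)/dy = x
        out._backward_fn = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the recorded graph, then apply the chain rule.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn:
                v._backward_fn()

x, y = Value(3.0), Value(4.0)
z = x * y + x          # the graph is built dynamically as ops execute
z.backward()
print(x.grad, y.grad)  # 5.0 3.0 (dz/dx = y + 1, dz/dy = x)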

📚 Documentation

Comprehensive documentation is available in the docs/ directory.

🧪 Testing

Genesis maintains high code quality with comprehensive testing:

# Run all tests
python -m pytest tests/

# Run specific test module
python -m pytest tests/test_autograd.py

# Run with coverage
python -m pytest tests/ --cov=genesis --cov-report=html

# Run performance benchmarks
python benchmark/bench_ops.py

๐Ÿค Contributing

We welcome contributions! Genesis is designed to be hackable and extensible.

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run code formatting
black genesis/
isort genesis/

# Run type checking
mypy genesis/

See CONTRIBUTING.md for detailed contribution guidelines.

🚦 Roadmap

Completed:

  • ✅ Core tensor operations and autograd
  • ✅ Essential neural network modules
  • ✅ Optimizers and schedulers
  • ✅ Mixed precision training
  • ✅ Qwen LLM implementation

Planned:

  • More model architectures (GPT, BERT, ViT)
  • Distributed training improvements
  • JIT compilation support
  • Model quantization
  • Mobile deployment

See ROADMAP.md for detailed plans.

📊 Benchmarks

Detailed performance comparisons are available in benchmark/:

  • bench_ops.py - Elementwise operations
  • bench_matmul.py - Matrix multiplication
  • bench_attention.py - Attention mechanisms
  • bench_end_to_end.py - Full model training

🌟 Examples

The apps/ and samples/ directories contain various examples:

LLM Applications (apps/llm/):

  • train_sft_qwen.py - Qwen supervised fine-tuning
  • chat_qwen.py - Interactive chat with trained models
  • torch_qwen.py - PyTorch comparison benchmarks

General Examples (samples/):

  • sample.py - Basic neural network training
  • mnist_cnn.py - CNN for MNIST classification
  • transformer.py - Transformer model implementation

Quick Start Commands:

# Train a Qwen model
cd apps/llm && python train_sft_qwen.py

# Chat with trained model
cd apps/llm && python chat_qwen.py

# Run benchmarks
python benchmark/simple_qwen_bench.py

📜 License

Genesis is released under the MIT License. See LICENSE for details.

๐Ÿ™ Acknowledgments

Genesis is inspired by and learns from many excellent projects:

  • PyTorch - API design and tensor operations
  • Triton - GPU kernel optimization
  • TinyGrad - Minimalist design philosophy
  • JAX - Functional programming concepts

📮 Contact

  • GitHub Issues: Bug reports and feature requests
  • Discussions: Questions and community support
  • Email: genesis-dev@example.com

Built with ❤️ for the deep learning community

โญ Star us on GitHub to support the project!
