VishwamAI

Efficient pre-training and fine-tuning framework with curriculum learning support for resource-constrained environments.

Features

  • Curriculum learning for efficient training progression
  • Mixed precision support for both GPU and TPU
  • Memory-efficient training with gradient checkpointing
  • Flexible architecture supporting both TPU and GPU deployments
  • Comprehensive monitoring and metrics tracking
  • Hardware-optimized kernels for TPU and GPU
  • Dynamic shape handling and optimization
  • Efficient parallel operations library
  • Tree-based and hybrid matrix multiplication strategies

Kernel Optimizations

TPU-Specific Features

  • BFloat16 precision with FP8 quantization support
  • Block-wise processing with 128x128 optimal block sizes
  • Memory-efficient flash attention implementation
  • Dynamic shape optimization for TPU MXU
  • Efficient parallel operations with XLA optimization
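The block-wise processing idea can be sketched in plain Python. This is illustrative only: the actual kernels are XLA-compiled and use 128x128 blocks sized for the TPU MXU; a block size of 2 keeps the example small.

```python
# Illustrative sketch of block-wise matrix multiplication (not the
# library's actual TPU kernel). Real kernels use 128x128 blocks to
# match the TPU MXU systolic array; BLOCK = 2 keeps this example small.
BLOCK = 2

def blockwise_matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m) one block pair at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, BLOCK):
        for j0 in range(0, m, BLOCK):
            for k0 in range(0, k, BLOCK):
                # Accumulate the partial product of one block pair; min()
                # handles edge blocks when dimensions are not multiples
                # of the block size.
                for i in range(i0, min(i0 + BLOCK, n)):
                    for j in range(j0, min(j0 + BLOCK, m)):
                        for kk in range(k0, min(k0 + BLOCK, k)):
                            c[i][j] += a[i][kk] * b[kk][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(blockwise_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

Processing one block pair at a time keeps the working set in fast on-chip memory, which is the same locality argument that motivates the 128x128 tiling on real hardware.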

GPU-Specific Features

  • Mixed precision training (FP16/FP32)
  • Block-sparse operations optimization
  • Tensor core utilization
  • CUDA-optimized attention mechanisms
  • Warp-level parallelism

Performance Highlights

  • Optimized matrix multiplication kernels (tree-based and hybrid strategies)
  • ~20x speedup on optimized activation functions
  • Memory-efficient attention mechanisms
  • Dynamic quantization for a reduced memory footprint
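The dynamic-quantization idea can be sketched with symmetric int8 quantization. This is a sketch, not VishwamAI's exact scheme: values are scaled into [-127, 127] and stored as integers, roughly quartering the memory of float32 weights at the cost of a bounded rounding error.

```python
# Illustrative symmetric int8 quantization (a sketch, not VishwamAI's
# exact scheme). The scale is derived dynamically from the data's
# maximum magnitude, so each tensor gets its own quantization range.
def quantize(values):
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard all-zero input
    q = [round(v / scale) for v in values]              # ints in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.1, -0.5, 0.25, 1.27]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                         # [10, -50, 25, 127]
print(max_err <= scale / 2 + 1e-9)  # rounding error is bounded by scale/2
```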

Import Test Status

Core Dependencies: 8/8 successful
Data Processing: 4/4 successful
Training Utilities: 4/4 successful
Memory Optimization: 5/5 successful
Additional Libraries: 3/3 successful
VishwamAI Modules: 7/7 successful
SONAR Dependencies: 5/5 successful
Multimodal Dependencies: 11/11 successful
TPU Kernels: 7/7 successful
TPU Optimized Layers: 6/6 successful

Overall: 60/60 imports successful (100%)

Training Optimizations

Curriculum Learning

  • Dynamic sequence length progression
  • Automated difficulty adjustment
  • Memory-efficient training strategy
  • Configurable update intervals
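A minimal sketch of the sequence-length progression idea (parameter names here are hypothetical; the framework's actual schedule is driven by its YAML configuration):

```python
# Sketch of a dynamic sequence-length curriculum (hypothetical
# parameters, not VishwamAI's exact scheduler). Sequence length grows
# from a short warm-up value to the full context length, doubling at
# each configurable update interval.
def curriculum_seq_len(step, start_len=128, max_len=2048,
                       update_interval=1000, growth_factor=2):
    """Return the training sequence length to use at a given step."""
    n_updates = step // update_interval
    length = start_len * (growth_factor ** n_updates)
    return min(length, max_len)  # cap at the model's full context

for step in (0, 1000, 2000, 3000, 4000):
    print(step, curriculum_seq_len(step))  # 128, 256, 512, 1024, 2048
```

Starting with short sequences keeps early steps cheap in both compute and memory, which is why this pairs naturally with the memory-efficient training strategy above.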

Hardware-Specific Optimizations

  • GPU (GTX 1650):
    • Optimized batch sizes for 4GB VRAM
    • FP16 precision training
    • Gradient accumulation
    • Memory-efficient model configuration
  • TPU:
    • BFloat16 precision support
    • XLA optimization
    • Efficient data pipeline
    • Dynamic batch sizing
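The gradient-accumulation trick used for the 4GB-VRAM configuration can be illustrated framework-agnostically: gradients from several micro-batches are summed before a single optimizer step, matching the statistics of a larger batch without its memory cost. The toy model and function names below are purely illustrative.

```python
# Framework-agnostic sketch of gradient accumulation (illustrative, not
# VishwamAI's training loop). For a squared-error loss on y = w*x, the
# per-example gradient dL/dw is 2*(w*x - y)*x; we accumulate it over
# micro-batches and apply one optimizer step, emulating a larger batch.
def grad(w, x, y):
    return 2.0 * (w * x - y) * x

def accumulated_step(w, batch, micro_size, lr=0.01):
    total, count = 0.0, 0
    for i in range(0, len(batch), micro_size):
        micro = batch[i:i + micro_size]          # only this slice is "in memory"
        total += sum(grad(w, x, y) for x, y in micro)
        count += len(micro)
    return w - lr * (total / count)              # one optimizer step

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_small = accumulated_step(0.0, batch, micro_size=1)
w_full = accumulated_step(0.0, batch, micro_size=4)
print(w_small == w_full)  # True: same numbers summed in the same order
```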

Installation

  1. Clone the repository:

git clone https://github.com/VishwamAI/VishwamAI.git
cd VishwamAI

  2. Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies:

pip install -r requirements.txt

  4. Install hardware-specific dependencies:

For NVIDIA GPU:

pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install nvidia-ml-py3

For TPU:

pip install --upgrade "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

  5. (Optional) If you manage dependencies with Poetry, update them:

poetry update

Hardware-Specific Setup

NVIDIA GPU Setup (GTX 1650)

Use the optimized GTX 1650 configuration:
python -m vishwamai.pretrain_efficient --config vishwamai/configs/training/gtx1650.yaml

For detailed GPU setup instructions, see README_GPU.md

TPU Setup

Use the TPU-optimized configuration:
python -m vishwamai.pretrain_efficient --config vishwamai/configs/training/efficient_pretrain.yaml

Interactive Development

Launch the Jupyter notebook:
jupyter notebook notebooks/efficient_pretraining.ipynb

Project Structure

vishwamai/
├── configs/              # Configuration files
│   ├── training/        # Training configurations
│   └── model/          # Model architectures
├── vishwamai/           # Core implementation
│   ├── model.py        # Model architecture
│   ├── training.py     # Training pipeline
│   └── tokenizer.py    # Tokenization utilities
├── notebooks/           # Interactive examples
└── docs/               # Documentation

Configuration

The system supports different hardware configurations through YAML files:

  • configs/training/gtx1650.yaml: Optimized for NVIDIA GTX 1650 (4GB VRAM)
  • configs/training/efficient_pretrain.yaml: General TPU configuration

Key configuration sections:

training:
  curriculum:      # Curriculum learning settings
  mixed_precision: # Precision optimization
  batch_size:      # Hardware-specific batch sizes
  
model:
  hidden_size:     # Model architecture parameters
  num_layers:      # Adjusted for hardware constraints
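A filled-in example makes the shape of these sections concrete. The values below are hypothetical, for illustration only; see the shipped YAML files for the real settings.

```yaml
# Hypothetical example values -- not the shipped configuration.
training:
  curriculum:
    start_seq_len: 128
    max_seq_len: 2048
    update_interval: 1000
  mixed_precision: fp16          # bf16 on TPU
  batch_size: 4                  # small enough for 4GB VRAM
  gradient_accumulation_steps: 8 # emulates an effective batch of 32

model:
  hidden_size: 768
  num_layers: 12                 # reduced for hardware constraints
```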

Running Tests in Parallel

To run tests in parallel using pytest-xdist, use the following command:

pytest -n auto

Contributing

See CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

License

This project is licensed under the MIT License - see LICENSE file.

Citation

If you use VishwamAI in your research, please cite:

@software{vishwamai2025,
  title = {VishwamAI: Efficient Pre-training Framework},
  author = {Kasinadh Sarma},
  year = {2025},
  url = {https://github.com/VishwamAI/VishwamAI}
}

Support

For support and questions, please open an issue on the GitHub repository.

Research Papers

The implementation is based on several research papers which can be found in the Research/ directory:

  • Tree of Thoughts reasoning
  • Mixture of Experts architectures
  • Attention mechanism optimizations
  • Efficient large language model training