Efficient pre-training and fine-tuning framework with curriculum learning support for resource-constrained environments.
- Curriculum learning for efficient training progression
- Mixed precision support for both GPU and TPU
- Memory-efficient training with gradient checkpointing
- Flexible architecture supporting both TPU and GPU deployments
- Comprehensive monitoring and metrics tracking
- Hardware-optimized kernels for TPU and GPU
- Dynamic shape handling and optimization
- Efficient parallel operations library
- Tree-based and hybrid matrix multiplication strategies
- BFloat16 precision with FP8 quantization support
- Block-wise processing with 128x128 optimal block sizes (see the sketch after this list)
- Memory-efficient flash attention implementation
- Dynamic shape optimization for TPU MXU
- Efficient parallel operations with XLA optimization
- Mixed precision training (FP16/FP32)
- Block-sparse operations optimization
- Tensor core utilization
- CUDA-optimized attention mechanisms
- Warp-level parallelism
- Matrix multiplication speedup with optimized kernels
- Activation function optimizations showing ~20x speedup
- Memory-efficient attention mechanisms
- Dynamic quantization for reduced memory footprint
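As a rough illustration of the BFloat16 and 128x128 block-size points above, here is a minimal JAX sketch (not VishwamAI's actual kernel) that pads matmul operands to multiples of 128 so XLA can tile the product cleanly onto the TPU MXU:

```python
# Minimal sketch only -- not the project's kernel. Pads operands to multiples
# of 128 (the TPU MXU tile size), runs the matmul in bfloat16, then slices
# the result back to the original shape. Zero padding along the contraction
# dimension leaves the product unchanged.
import jax
import jax.numpy as jnp

def pad_to_multiple(x, multiple=128):
    """Zero-pad every dim of x up to the next multiple of `multiple`."""
    pads = [(0, (-d) % multiple) for d in x.shape]
    return jnp.pad(x, pads)

@jax.jit
def blockwise_matmul(a, b):
    m, n = a.shape[0], b.shape[1]
    a_p = pad_to_multiple(a.astype(jnp.bfloat16))
    b_p = pad_to_multiple(b.astype(jnp.bfloat16))
    out = a_p @ b_p  # XLA tiles this onto 128x128 MXU blocks
    return out[:m, :n].astype(jnp.float32)

a = jnp.ones((300, 200))
b = jnp.ones((200, 500))
print(blockwise_matmul(a, b).shape)  # (300, 500)
```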
Import verification results:
Core Dependencies: 8/8 successful
Data Processing: 4/4 successful
Training Utilities: 4/4 successful
Memory Optimization: 5/5 successful
Additional Libraries: 3/3 successful
VishwamAI Modules: 7/7 successful
SONAR Dependencies: 5/5 successful
Multimodal Dependencies: 11/11 successful
TPU Kernels: 7/7 successful
TPU Optimized Layers: 6/6 successful
Overall: 60/60 imports successful (100%)
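The tallies above come from an import verification pass. A minimal sketch of how such a check can be scripted (the module list here is an illustrative stand-in, not the actual 60 dependencies):

```python
# Illustrative import check -- the module list is a stand-in, not the
# actual 60 dependencies tallied above.
import importlib

modules = ["jax", "flax", "optax", "numpy"]
ok = 0
for name in modules:
    try:
        importlib.import_module(name)
        ok += 1
    except ImportError as err:
        print(f"FAILED {name}: {err}")
print(f"Overall: {ok}/{len(modules)} imports successful")
```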
Curriculum learning features:
- Dynamic sequence length progression
- Automated difficulty adjustment
- Memory-efficient training strategy
- Configurable update intervals
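A hypothetical sketch of such a schedule (the growth rule, interval, and lengths below are illustrative; the real values come from the training config):

```python
# Hypothetical curriculum schedule: sequence length doubles at a fixed
# update interval until it reaches the maximum. All values illustrative.
def curriculum_seq_len(step, start_len=128, max_len=2048,
                       update_interval=1000, growth=2):
    n_updates = step // update_interval
    return min(start_len * growth ** n_updates, max_len)

for step in range(0, 6000, 1000):
    print(step, curriculum_seq_len(step))
# 0:128, 1000:256, 2000:512, 3000:1024, 4000:2048, 5000:2048 (capped)
```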
Hardware-specific optimizations:
- GPU (GTX 1650):
  - Optimized batch sizes for 4GB VRAM
  - FP16 precision training
  - Gradient accumulation (see the sketch after this list)
  - Memory-efficient model configuration
- TPU:
  - BFloat16 precision support
  - XLA optimization
  - Efficient data pipeline
  - Dynamic batch sizing
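One standard way to express the gradient-accumulation item from the GPU list in the JAX ecosystem is optax.MultiSteps; this is a sketch, and the accumulation factor of 8 is illustrative rather than the gtx1650.yaml value:

```python
# Sketch of gradient accumulation with optax.MultiSteps: gradients from 8
# micro-batches are combined before one real optimizer step, trading compute
# time for a larger effective batch on a 4GB card. Factor 8 is illustrative.
import jax.numpy as jnp
import optax

params = {"w": jnp.zeros((4, 4))}
opt = optax.MultiSteps(optax.adamw(1e-4), every_k_schedule=8)
opt_state = opt.init(params)

grads = {"w": jnp.ones((4, 4))}   # stand-in for per-micro-batch gradients
for _ in range(8):                # 8 micro-batches -> 1 parameter update
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
```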
Installation steps:
- Clone the repository:
git clone https://github.com/VishwamAI/VishwamAI.git
cd VishwamAI
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Install hardware-specific dependencies (a quick verification snippet follows these steps):
For NVIDIA GPU:
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install nvidia-ml-py3
For TPU:
pip install --upgrade "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
- Update dependencies:
poetry update
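After installation, a quick way to confirm that JAX sees your accelerator:

```python
# Sanity check that the GPU/TPU backend is visible to JAX.
import jax

print(jax.devices())          # e.g. a CUDA device, or a list of TPU cores
print(jax.default_backend())  # "gpu", "tpu", or "cpu"
```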
To start training:
- Use the optimized GTX 1650 configuration:
python -m vishwamai.pretrain_efficient --config vishwamai/configs/training/gtx1650.yaml
For detailed GPU setup instructions, see README_GPU.md
- Use the TPU-optimized configuration:
python -m vishwamai.pretrain_efficient --config vishwamai/configs/training/efficient_pretrain.yaml
- Launch Jupyter notebook:
jupyter notebook notebooks/efficient_pretraining.ipynb
vishwamai/
├── configs/ # Configuration files
│ ├── training/ # Training configurations
│ └── model/ # Model architectures
├── vishwamai/ # Core implementation
│ ├── model.py # Model architecture
│ ├── training.py # Training pipeline
│ └── tokenizer.py # Tokenization utilities
├── notebooks/ # Interactive examples
└── docs/ # Documentation
The system supports different hardware configurations through YAML files:
- configs/training/gtx1650.yaml: Optimized for NVIDIA GTX 1650 (4GB VRAM)
- configs/training/efficient_pretrain.yaml: General TPU configuration
Key configuration sections:
training:
  curriculum:      # Curriculum learning settings
  mixed_precision: # Precision optimization
  batch_size:      # Hardware-specific batch sizes
model:
  hidden_size:     # Model architecture parameters
  num_layers:      # Adjusted for hardware constraints
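To load one of these configs programmatically, a PyYAML sketch (the key names mirror the skeleton above and may differ in the actual files):

```python
# Sketch: load a training config with PyYAML. Keys follow the skeleton
# above; the real files may use different names.
import yaml

with open("vishwamai/configs/training/gtx1650.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["training"]["batch_size"], cfg["model"]["hidden_size"])
```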
To run tests in parallel using pytest-xdist, use the following command:
pytest -n auto
See CONTRIBUTING.md for guidelines on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use VishwamAI in your research, please cite:
@software{vishwamai2025,
  title  = {VishwamAI: Efficient Pre-training Framework},
  author = {Kasinadh Sarma},
  year   = {2025},
  url    = {https://github.com/VishwamAI/VishwamAI}
}
For support and questions:
- Open an issue on GitHub
- Check existing documentation in /docs
- Refer to hardware-specific guides:
  - README_GPU.md for GPU setup
  - HUGGINGFACE_SETUP.md for HuggingFace integration
Additional documentation:
- Quick Start Guide
- Technical Documentation
- Advanced Training Guide
- Error Correction System
- Tree of Thoughts
- Architecture Overview
The implementation is based on several research papers, which can be found in the Research/ directory:
- Tree of Thoughts reasoning
- Mixture of Experts architectures
- Attention mechanism optimizations
- Efficient large language model training