
BitNet-rs

Pure Rust engine for BitNet LLMs — conversion, inference, training, and research, with streaming and GPU/CPU support.

⚠️ Disclaimer: This project is a work in progress!

BitNet-rs is under active development and not yet production-ready. The test suite is extensive and progress is rapid, but the codebase is evolving quickly and breaking changes are likely. We are actively looking for contributors and help! If you're interested in Rust, LLMs, or kernel development, please join us (see Contributing below).



Features

  • Pure Rust — No Python or C++ runtime dependencies
  • 🧩 Modular — Core, Converter, Tools, App, and WASM crates
  • 🖥️ CPU & GPU — SIMD and WGSL (via wgpu) support
  • 📦 Streaming/blockwise model loading and inference
  • 🛠️ Model conversion — HuggingFace to BitNet format
  • 🔄 Quantization — B1.58 ternary weights
  • 🎯 Optimized — SIMD, LUT, and GPU kernels (DX12 Naga issue fixed)
  • 🎨 GUI & CLI — Interactive and scriptable interfaces
  • 🔍 Visualization — Attention maps and kernel profiling
  • 🌐 WASM-ready
  • 🎯 Vibe Coding Ready — AI-assisted development with comprehensive planning (Project Plan, Checklist) and Cursor integration

Overview

BitNet-rs is a Rust-based toolkit for BitNet model conversion, inference, and experimentation. It is designed for:

  • Model conversion: Convert Hugging Face safetensors to BitNet's custom, quantized, streaming-friendly format (the quantization idea is sketched after this list).
  • Inference: Run BitNet models efficiently on CPU and GPU, with per-block streaming and minimal memory usage.
  • Extensibility: Modular crates for core logic, conversion, tools, and user-facing apps (CLI/GUI).
  • Validation: Rigorous test coverage, golden tests, and kernel validation.
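
To make the B1.58 idea concrete, here is a minimal, hypothetical sketch of absmean ternary quantization in the style of the BitNet b1.58 paper (the quantization referenced in the list above). The function name is illustrative; the actual converter packs weights into a custom streaming-friendly layout.

    // Hypothetical sketch: scale weights by their mean absolute value (gamma),
    // then round each into the ternary set {-1, 0, +1}.
    fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
        let gamma = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
        let scale = gamma.max(f32::EPSILON);
        let quantized = weights
            .iter()
            .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
            .collect();
        (quantized, scale)
    }

    fn main() {
        let (q, scale) = quantize_ternary(&[0.9, -0.05, -1.2, 0.4]);
        println!("{q:?} (scale = {scale})"); // prints [1, 0, -1, 1]
    }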

Quick Start

Note: You need a recent Rust toolchain (nightly recommended) and a supported platform (Linux, macOS, Windows; CPU or GPU).

# Clone the repo
git clone https://github.com/ocentra/bitnet-ocentra.git
cd bitnet-ocentra

# Build everything
cargo build --workspace

# Download a model and convert it
cargo run -p bitnet-converter -- --input-dir models/Original/microsoft/bitnet-b1.58-2B-4T-bf16 --output-dir models/Converted/microsoft/bitnet-b1.58-2B-4T-bf16

# Run the app (CLI/GUI)
cargo run -p bitnet-app -- --help

Build Instructions

  1. Install Rust (nightly recommended): https://rustup.rs/

  2. Clone the repository:

    git clone https://github.com/ocentra/bitnet-ocentra.git
    cd bitnet-ocentra
    
  3. Build all crates:

    cargo build --workspace
  4. Run the converter or app:

    cargo run -p bitnet-converter -- --help
    cargo run -p bitnet-app -- --help

  5. (Optional) Run the file combiner tools:

    # Run the GUI version
    cargo run --release -p file_combiner_gui

    # Run the CLI version
    cargo run -p bitnet-tools --bin combine_files -- --help

Benchmarking and Testing

We use a comprehensive test suite to ensure correctness, performance, and robustness. The tests are located in the tests directory of each crate.

Running All Tests

To run all tests for all crates in the workspace, use:

cargo test --workspace --all-features

Running Kernel Tests

The core of our performance comes from the WGSL compute kernels. We have a dedicated test suite for them in the bitnet-core crate. To run only these tests and see detailed output, use:

cargo test -p bitnet-core --test kernel_tests -- --nocapture

To run the ignored tests, which include stress tests and long-running benchmarks:

cargo test -p bitnet-core --test kernel_tests -- --ignored --nocapture

Interpreting Benchmark Results

The performance_benchmark_gpu_vs_scalar test provides detailed performance metrics. When you run it, you will see output like this:

Performance Benchmark (100 iterations):
  GPU (Wall Time):    Avg: 290.000µs | Total: 29.000ms
  GPU (Kernel Time):  Avg: 15.000µs  | Total: 1.500ms
  Scalar (CPU Time):  Avg: 480.000µs | Total: 48.000ms
Speedup (Wall vs Scalar):   1.65x
Speedup (Kernel vs Scalar): 32.00x

  • GPU (Wall Time): Total time from the CPU's perspective, including overhead for buffer management and data transfer.
  • GPU (Kernel Time): The pure GPU execution time, measured with high-precision timestamp queries. This is the best measure of the kernel's raw performance.
  • Scalar (CPU Time): The performance of the equivalent non-parallelized CPU implementation.
  • Speedup (Kernel vs Scalar): The true speedup of the GPU kernel over the CPU implementation, and the most important metric for performance analysis.
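
As a rough sketch of how such figures are computed (wall time is measured CPU-side, e.g. with std::time::Instant, while kernel time comes from GPU timestamp queries): averages are totals divided by iterations, and each speedup is the scalar total divided by the corresponding GPU total. The helper below is hypothetical, with illustrative numbers:

    use std::time::Duration;

    // Hypothetical helper mirroring the benchmark report's arithmetic.
    fn print_speedups(iters: u32, gpu_wall: Duration, gpu_kernel: Duration, scalar: Duration) {
        println!("GPU (Wall Time):    Avg: {:?} | Total: {:?}", gpu_wall / iters, gpu_wall);
        println!("GPU (Kernel Time):  Avg: {:?} | Total: {:?}", gpu_kernel / iters, gpu_kernel);
        println!("Scalar (CPU Time):  Avg: {:?} | Total: {:?}", scalar / iters, scalar);
        println!("Speedup (Wall vs Scalar):   {:.2}x", scalar.as_secs_f64() / gpu_wall.as_secs_f64());
        println!("Speedup (Kernel vs Scalar): {:.2}x", scalar.as_secs_f64() / gpu_kernel.as_secs_f64());
    }

    fn main() {
        print_speedups(
            100,
            Duration::from_millis(29),   // GPU wall total (illustrative)
            Duration::from_micros(1500), // GPU kernel total (illustrative)
            Duration::from_millis(48),   // scalar total (illustrative)
        );
    }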

Note: The numbers above are illustrative. Please run the benchmarks on your own hardware to get accurate results.


Known Issues

DirectX 12 Backend

Issue: The WGSL kernel can trigger a suspected loop-unrolling bug in the DirectX (DX12) shader compiler (FXC/DXC). This may cause tests to fail on Windows machines using the Dx12 backend.

Status: A robust workaround has been found and is now fully documented and tested.

  • The workaround uses a flattened i32 accumulator and per-element decode for tile_b, avoiding the problematic WGSL patterns (a Rust analogue of the pattern is sketched at the end of this section).
  • The full BitNet kernel logic now passes on DX12 with this fix.
  • See the DX12 test report for the special finding and full details.

Reference: See the related Naga issue for more details.

Legacy note: The cross_device_consistency_test previously skipped the Dx12 backend, but with the workaround, DX12 is now supported for the full kernel logic.
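
For illustration, here is a hypothetical Rust analogue of that pattern. The real fix lives in the WGSL kernel; this sketch only shows the indexing shape (a flattened 1-D accumulator addressed as row * TILE + col, with per-element operand access) that replaces the nested-array form suspected of triggering the bug.

    const TILE: usize = 4;

    // Hypothetical analogue of the DX12 workaround: accumulate into a
    // flattened i32 buffer instead of a nested 2-D array, reading one
    // element of the second operand at a time.
    fn tile_matmul_flat(tile_a: &[i32], tile_b: &[i32]) -> [i32; TILE * TILE] {
        let mut acc = [0i32; TILE * TILE]; // flattened accumulator
        for row in 0..TILE {
            for col in 0..TILE {
                for k in 0..TILE {
                    acc[row * TILE + col] += tile_a[row * TILE + k] * tile_b[k * TILE + col];
                }
            }
        }
        acc
    }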

DX12 Regression Testing

To ensure the DX12 workaround remains effective and to prevent regressions:

  • The file crates/bitnet-core/tests/DX12_test.rs is a comprehensive diagnostic and regression test suite for the DX12 WGSL bug.

  • To run the regression test:

    cargo test --package bitnet-core --test DX12_test -- --nocapture
  • This will generate a detailed report at logs/dx12_test.md.

  • Check the report:

    • Look for the ⭐ Special Finding section, which highlights the test demonstrating the robust workaround.
    • Ensure that the "Full Kernel With Fix" test passes. If it fails, DX12 compatibility may be broken.
  • Best practice:

    • Run this regression test after any changes to the kernel or related code to ensure the workaround is not broken and the bug is not reintroduced.

Usage Examples

  • Convert a model:

    cargo run -p bitnet-converter -- --input-dir models/Original/microsoft/bitnet-b1.58-2B-4T-bf16 --output-dir models/Converted/microsoft/bitnet-b1.58-2B-4T-bf16
  • Run the app (CLI/GUI):

    cargo run -p bitnet-app -- --help
  • File Combiner Tools:

    # Run the GUI version
    cargo run --release -p file_combiner_gui
    
    # Run the CLI version
    cargo run -p bitnet-tools --bin combine_files -- --help

Contributing

Before contributing, please:

  1. Read our PROJECT_PLAN.md for architecture details
  2. Review CHECKLIST.md for implementation status
  3. Check crate-level READMEs for module-specific guidelines
  4. Set up Cursor for optimal vibe coding experience

We use vibe coding practices to maintain high code quality and efficient development. See our Vibe Coding section for details.


Vibe Coding

This project was developed using a modern AI-assisted development approach we call "vibe coding". Here's our exact workflow and how you can use it in your projects.

Our Real-World Development Process

1. Project Planning & Structure

# Initial project structure
PROJECT_PLAN.md        # Detailed architecture & implementation plan
CHECKLIST.md          # Task tracking & validation requirements
crates/               # Modular crate structure
  ├── bitnet-core/    # Core engine with its own README
  ├── bitnet-app/     # Application with its own README
  └── ...

2. AI-Assisted Development Workflow

Here's exactly how we built BitNet-rs using AI tools:

  1. Initial Planning with AI

    # 1. Created PROJECT_PLAN.md with high-level architecture
    # 2. Used Cursor + Claude to refine the plan:
    #    - Validated technical approaches
    #    - Identified potential issues
    #    - Added detailed implementation notes
  2. Code Organization & Review

    # Used our file combiner tool to prepare code for AI review:
    cargo combine-files bitnet-core   # Combines core crate files
    cargo combine-files bitnet-app    # Combines app crate files
    
    # Generated files:
    bitnet-core_combined.txt    # Single file for AI review
    bitnet-app_combined.txt     # Single file for AI review
  3. Iterative Development with AI

    # 1. Write code with Cursor + Claude
    # 2. Regular check-ins with AI:
    #    - Review code structure
    #    - Validate implementations
    #    - Debug issues
    #    - Optimize performance

Practical Tips from Our Experience

  1. Project Structure First

    • Start with a detailed PROJECT_PLAN.md
    • Create a CHECKLIST.md for tracking
    • Add READMEs in each major directory
    • This structure helps AI understand context
  2. Code Review Workflow

    # 1. Combine related files:
    cargo combine-files crate-name
    
    # 2. Ask AI to review in Cursor:
    "Please review this code for..."
    
    # 3. Apply suggestions using Cursor's AI
  3. Using Multiple AI Models

    • Cursor + Claude: Main development
    • Gemini Pro: Architecture review
    • GitHub Copilot: Quick suggestions
    • Cross-validate between models
  4. Documentation Strategy

    # Each crate has:
    - README.md           # Usage & examples
    - src/lib.rs         # API documentation
    - tests/             # Test documentation

Our Tools & Setup

  1. File Combiner

    # .cargo/config.toml
    [alias]
    combine-files = "run --package bitnet-tools --bin combine_files --"
  2. Development Environment

    # Primary: Cursor IDE
    # - AI-assisted coding
    # - Code navigation
    # - Integrated terminal
  3. AI Integration

    # 1. Cursor for main development
    # 2. Browser tabs for:
    #    - Gemini Pro (architecture)
    #    - GitHub Copilot (quick fixes)

Real Examples from This Project

  1. Planning Phase

    # PROJECT_PLAN.md excerpt:
    ## Deep Dive: Critical Components
    1. BitLinear CustomOp
    2. CPU SIMD Implementation
    3. GPU Kernel Architecture
  2. Implementation Phase

    // Example of AI-assisted implementation:
    // 1. Wrote high-level structure
    // 2. AI helped optimize SIMD code
    // 3. Validated with test cases
  3. Review & Optimization

    # 1. Combined files for review
    # 2. AI analyzed performance
    # 3. Implemented suggestions

Getting Started with Vibe Coding

  1. Setup Your Project

    # 1. Create planning documents
    touch PROJECT_PLAN.md CHECKLIST.md
    
    # 2. Set up file combiner
    cargo add bitnet-tools --path crates/bitnet-tools
  2. Development Workflow

    # 1. Plan with AI
    # 2. Implement with Cursor
    # 3. Review with combined files
    # 4. Iterate and optimize
  3. Best Practices

    • Keep planning documents updated
    • Use consistent file structure
    • Document AI interactions
    • Cross-validate between models

Advanced AI Development Patterns

Test-Driven Development with AI

We discovered that asking AI to write code directly often produces suboptimal results. Instead, we developed this effective pattern:

  1. Test-First Approach

    // 1. Ask AI to write tests first
    #[test]
    fn test_bitnet_kernel_correctness() {
        // AI writes detailed test cases
        // with expected inputs/outputs
    }
    
    // 2. Use tests to define exact API
    pub trait BitnetKernel {
        fn execute(&self, input: &[f32]) -> Result<Vec<f32>>;
    }
    
    // 3. Implement based on test requirements
    // 4. Add validation tests
    //    (a minimal implementation sketch follows this list)
  2. Development Flow

    # 1. Write test with AI
    cargo test --test kernel_tests -- --nocapture
    
    # 2. Iterative Implementation
    # Loop until the tests pass:
    #   - ask AI to analyze the failure
    #   - implement fixes
    #   - run the tests again
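
As promised above, here is a minimal, hypothetical sketch of how that test-first contract might be satisfied. The trait mirrors the illustrative one from step 1; anyhow::Result is an assumption, not the project's actual error type.

    use anyhow::Result;

    pub trait BitnetKernel {
        fn execute(&self, input: &[f32]) -> Result<Vec<f32>>;
    }

    // Scalar reference implementation, used as ground truth for GPU kernels.
    struct ScalarKernel;

    impl BitnetKernel for ScalarKernel {
        fn execute(&self, input: &[f32]) -> Result<Vec<f32>> {
            // Placeholder computation; a real kernel would apply the BitLinear op.
            Ok(input.iter().map(|x| x * 2.0).collect())
        }
    }

    #[test]
    fn test_bitnet_kernel_correctness() -> Result<()> {
        let kernel = ScalarKernel;
        let out = kernel.execute(&[1.0, 2.0])?;
        assert_eq!(out, vec![2.0, 4.0]);
        Ok(())
    }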

Using Multiple AI Models Effectively

  1. Project-Wide Context with Gemini

    # 1. Combine all project files
    cargo combine-files --all-crates
    
    # 2. Feed to Gemini 2.5 Pro (1M context)
    # - Full project structure
    # - All planning documents
    # - Implementation details
  2. Specialized Problem Solving

    # Example: Working on bitnet-converter
    
    # 1. Get solution from Gemini
    cargo combine-files bitnet-converter
    # Feed to Gemini with specific task
    
    # 2. Validate with Cursor
    "Analyze this code from Gemini, focusing on:
     - Dependency correctness
     - Error handling
     - Performance implications"

AI-Driven Test Development

Here's our proven pattern for complex implementations:

  1. Goal-Based Testing

    // Tell Cursor: "Goal: Test WGSL kernel correctness
    // Don't stop until tests pass"
    
    #[test]
    fn test_wgsl_kernel() {
        // 1. AI writes comprehensive test
        // 2. AI runs and debugs
        // 3. AI improves test coverage
    }
  2. YOLO Auto Mode

    # Tell Cursor:
    "Enter YOLO auto mode:
     1. Search for similar implementations
     2. Write test cases
     3. Implement solution
     4. Test and fix
     5. Repeat until all tests pass"
  3. Real Example: Shader Testing

    // Complex WGSL kernel testing
    #[test]
    fn test_bitnet_shader_computation() {
        let input = generate_test_data();
        let kernel = BitnetShader::new();
        
        // 1. Test scalar path
        let scalar_result = compute_scalar_reference(&input);
        
        // 2. Test GPU path
        let gpu_result = kernel.execute(&input).expect("GPU execution failed");
        
        // 3. Compare results
        assert_results_match(scalar_result, gpu_result);
    }

Tips for Complex Implementations

  1. Shader/Kernel Development

    # 1. Write scalar version first
    # 2. Ask AI to write exhaustive tests
    # 3. Implement optimized version
    # 4. Validate against scalar
  2. Using AI for Research

    # 1. Combine relevant code
    cargo combine-files crates/bitnet-core/src/kernels
    
    # 2. Ask AI to:
    # - Analyze similar implementations
    # - Suggest optimization strategies
    # - Write validation tests
  3. Iterative Refinement

    // 1. Start with basic test
    #[test]
    fn test_basic_functionality() {
        // Simple case
    }
    
    // 2. Add edge cases
    #[test]
    fn test_edge_cases() {
        // AI helps identify cases
    }
    
    // 3. Performance testing
    #[test]
    fn test_performance_requirements() {
        // AI helps set benchmarks
    }

This approach helped us tackle even the most complex parts of the project, like WGSL shaders and SIMD kernels, with confidence and reliability.

Leveraging Large Context Windows & Multi-AI Workflow

Gemini 2.5 Pro: The Project Architect

We discovered a game-changing workflow using Gemini 2.5 Pro's massive 1M token context window in Google AI Studio:

  1. Full Project Context

    # Combine EVERYTHING into one file:
    cargo combine-files --all-crates --include-docs
    
    # This includes:
    PROJECT_PLAN.md          # Architecture & deep dives
    CHECKLIST.md            # Implementation status
    All README.md files     # From each crate
    All source files        # Entire codebase
    All test files          # Complete test suite
  2. Real Example: Project-Wide Analysis

    # 1. Feed the combined file to Gemini 2.5 Pro
    # Example prompt:
    "Analyze this entire BitNet-rs project. Focus on:
     - WGSL shader implementation in bitnet-core
     - Integration with CPU kernels
     - Test coverage gaps
     - Performance bottlenecks"
    
    # 2. Gemini sees EVERYTHING at once:
    # - Can reference code from any crate
    # - Understands full architecture
    # - Spots cross-crate dependencies
    # - Identifies global patterns
  3. Complex Problem Solving

    # Real workflow we used for SIMD kernels:
    
    # 1. Combine all relevant files
    cargo combine-files \
      crates/bitnet-core/src/kernels \
      crates/bitnet-core/tests/kernel_tests.rs \
      PROJECT_PLAN.md
    
    # 2. Ask Gemini to architect the solution:
    "Design a SIMD kernel implementation that:
     - Matches the scalar implementation
     - Uses AVX2 intrinsics efficiently
     - Includes comprehensive test cases
     - Follows our project architecture"
    
    # 3. Get complete solution including:
    # - Full implementation
    # - Test suite
    # - Performance considerations
    # - Integration guidelines

Multi-AI Synergy Workflow

We developed a powerful workflow combining multiple AI models' strengths:

  1. Gemini → Claude → Cursor Pipeline

    # Step 1: Get Initial Design (Gemini 2.5 Pro)
    # Feed full project context
    # Get comprehensive solution
    
    # Step 2: Refinement (Claude)
    "Analyze this code from Gemini. Focus on:
     1. Rust idioms and safety
     2. Error handling patterns
     3. Performance implications
     4. Integration with existing code"
    
    # Step 3: Implementation (Cursor)
    "You're the main coder. Review this design:
     1. Validate dependencies
     2. Check safety assumptions
     3. Implement with proper error handling
     4. Add comprehensive tests"
  2. Real Example: Shader Development

    // 1. Gemini: Architecture & Initial Code
    // Given full project context, designed:
    struct BitnetShader {
        // Complete shader architecture
        // Memory layout design
        // Workgroup optimization
    }
    
    // 2. Claude: Code Review & Refinement
    // Analyzed and improved:
    // - Safety considerations
    // - Error handling
    // - Performance optimizations
    
    // 3. Cursor: Implementation & Testing
    // Final implementation with:
    // - Proper Rust idioms
    // - Comprehensive test suite
    // - Performance benchmarks
  3. Workflow Benefits

    # 1. Gemini 2.5 Pro
    - Sees entire project context
    - Understands global architecture
    - Provides complete solutions
    
    # 2. Claude
    - Excellent code review
    - Strong Rust knowledge
    - Safety focus
    
    # 3. Cursor
    - Immediate feedback
    - Code navigation
    - Test execution

Real-World Example: Complex Feature Development

Here's how we actually developed the BitNet quantization pipeline:

  1. Initial Architecture (Gemini)

    # 1. Combined all files:
    cargo combine-files \
      PROJECT_PLAN.md \
      crates/bitnet-converter/src/*.rs \
      crates/bitnet-core/src/kernels/*.rs
    
    # 2. Asked Gemini:
    "Design a quantization pipeline that:
     - Converts FP32 weights to ternary
     - Implements efficient packing
     - Includes validation tests
     - Matches reference implementation"
    # (a hypothetical packing sketch follows at the end of this section)
  2. Code Review & Refinement (Claude)

    // Received from Claude:
    // - Improved error handling
    // - Better type safety
    // - Optimized algorithms
    // - Additional test cases
    
    pub struct QuantizationPipeline {
        // Refined implementation
        // With Claude's improvements
    }
  3. Final Implementation (Cursor)

    // Cursor helped:
    // 1. Integrate with existing code
    // 2. Add proper error types
    // 3. Implement all tests
    // 4. Validate performance
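
As referenced above, here is a hypothetical sketch of the "efficient packing" step: ternary weights need only 2 bits each, so four pack into one byte. The converter's actual on-disk format may differ.

    // Hypothetical 2-bit packing: map {-1, 0, +1} to {0, 1, 2} and store
    // four ternary weights per byte (16x smaller than f32 storage).
    fn pack_ternary(weights: &[i8]) -> Vec<u8> {
        weights
            .chunks(4)
            .map(|chunk| {
                chunk.iter().enumerate().fold(0u8, |byte, (i, &w)| {
                    byte | (((w + 1) as u8) << (i * 2))
                })
            })
            .collect()
    }

    fn unpack_ternary(packed: &[u8], len: usize) -> Vec<i8> {
        (0..len)
            .map(|i| (((packed[i / 4] >> ((i % 4) * 2)) & 0b11) as i8) - 1)
            .collect()
    }

    fn main() {
        let weights = [1i8, 0, -1, 1, -1];
        let packed = pack_ternary(&weights);
        assert_eq!(unpack_ternary(&packed, weights.len()), weights);
        println!("{} weights packed into {} bytes", weights.len(), packed.len());
    }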

This multi-AI workflow was crucial for handling complex features like:

  • WGSL shader implementation
  • SIMD kernel optimization
  • Quantization pipeline
  • Test suite development

The key was leveraging each AI's strengths:

  • Gemini's massive context window for architecture
  • Claude's code review and safety focus
  • Cursor's immediate feedback and testing

Checklist & Status

  • The current implementation status of each module and file is tracked in CHECKLIST.md.
  • Use this to find stubs, partials, and missing features.
  • The checklist is updated regularly to reflect the actual state of the codebase.


Crate Overview


For more details, see the project plan and individual crate READMEs.
