Pure Go LLM for CPUs #170

@thejhh

Description

The goal of this feature is a highly efficient, pure-Go inference implementation of Microsoft's BitNet b1.58 2B4T language model, optimized for CPU environments, with potential future support for GPU acceleration. The engine will handle language-model inference with a context length of up to 4096 tokens, enabling practical text-generation and completion tasks. Leveraging BitNet's 1.58-bit ternary quantization (weights restricted to {-1, 0, +1}, typically packed two bits per weight), it aims for exceptionally low memory usage and high throughput by making extensive use of Go's native bitwise operations and goroutine-based concurrency that scales across multiple CPU cores. The resulting inference engine should be lightweight, scalable, and suitable for both edge and cloud environments.
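To make the quantization point concrete, here is a minimal sketch of how ternary weights could be packed two bits per weight and used in a multiply-free dot product with Go's bitwise operators. The 2-bit encoding and the function names are illustrative assumptions for this sketch, not decisions made in this issue.

```go
package main

import "fmt"

// packTernary packs ternary weights {-1, 0, +1} two bits per weight,
// four weights per byte. Assumed encoding: 0b00 = 0, 0b01 = +1, 0b10 = -1.
func packTernary(w []int8) []byte {
	packed := make([]byte, (len(w)+3)/4)
	for i, v := range w {
		var code byte
		switch v {
		case 1:
			code = 0b01
		case -1:
			code = 0b10
		}
		packed[i/4] |= code << uint((i%4)*2)
	}
	return packed
}

// dotTernary computes the dot product of packed ternary weights with
// activations using only shifts, masks, additions, and subtractions —
// no multiplications, which is the core of BitNet-style inference.
func dotTernary(packed []byte, x []int32) int32 {
	var sum int32
	for i := range x {
		code := (packed[i/4] >> uint((i%4)*2)) & 0b11
		switch code {
		case 0b01:
			sum += x[i]
		case 0b10:
			sum -= x[i]
		}
	}
	return sum
}

func main() {
	w := []int8{1, -1, 0, 1, -1}
	x := []int32{2, 3, 4, 5, 6}
	// 2 - 3 + 0 + 5 - 6 = -2
	fmt.Println(dotTernary(packTernary(w), x))
}
```

With this layout a 2B-parameter ternary model needs roughly 0.5 GB for weights, which is what makes CPU-only inference plausible.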

This roadmap outlines a sequence of small, sequential tasks to implement Microsoft’s BitNet b1.58‑2B 4T model in pure Go (inference-only). The implementation aims to support a 4096-token context and leverage goroutine-based concurrency to utilize multiple CPU cores.
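As one sketch of the goroutine-based concurrency the roadmap calls for, the dominant cost in inference — matrix–vector products — can be split row-wise across one goroutine per CPU core. The function name and chunking scheme below are assumptions for illustration, not part of the roadmap.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelMatVec computes y = W·x by dividing the output rows into
// contiguous chunks, one per available CPU core. Each goroutine writes
// a disjoint slice of y, so no locking is needed beyond the WaitGroup.
func parallelMatVec(w [][]float32, x []float32) []float32 {
	y := make([]float32, len(w))
	workers := runtime.NumCPU()
	chunk := (len(w) + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < len(w); start += chunk {
		end := start + chunk
		if end > len(w) {
			end = len(w)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for r := lo; r < hi; r++ {
				var sum float32
				for c, v := range x {
					sum += w[r][c] * v
				}
				y[r] = sum
			}
		}(start, end)
	}
	wg.Wait()
	return y
}

func main() {
	w := [][]float32{{1, 2}, {3, 4}, {5, 6}}
	x := []float32{1, 1}
	fmt.Println(parallelMatVec(w, x)) // [3 7 11]
}
```

Row-wise partitioning keeps each goroutine's writes disjoint and cache-friendly; in the real engine the inner loop would operate on packed ternary weights rather than float32.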
