Go language bindings for the RKLLM (Rockchip Large Language Model) runtime. This library provides a clean, idiomatic Go interface for running large language models on Rockchip devices.
- Complete coverage of the RKLLM C API
- Idiomatic Go interface and error handling
- Automatic memory management
- Support for all RKLLM input types:
  - Text prompts
  - Token IDs
  - Embedding vectors
  - Multimodal inputs (text + image)
- Synchronous and asynchronous inference
- Callback-based streaming text generation
- LoRA adapter support
- Prompt caching for improved performance
```bash
go get github.com/maxwelljun/go-rkllm
```
- RKLLM runtime library (`librkllmrt.so`)
- Rockchip NPU drivers
Here's a simple example showing how to initialize the model and generate text:
```go
package main

import (
    "fmt"
    "os"

    "github.com/maxwelljun/go-rkllm"
)

// Callback function to receive generated text
func resultCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    switch state {
    case rkllm.RunFinish:
        fmt.Println("\nGeneration complete")
    case rkllm.RunNormal:
        fmt.Print(result.Text) // Print tokens as they're generated
    case rkllm.RunError:
        fmt.Println("\nError occurred during generation")
    }
}

func main() {
    // Configure model parameters
    param := rkllm.DefaultParam()
    param.ModelPath = "./model.rkllm"
    param.MaxNewTokens = 512
    param.MaxContextLen = 1024
    param.Temperature = 0.7
    param.TopP = 0.9

    // Initialize the model
    llm, err := rkllm.Init(param, resultCallback, nil)
    if err != nil {
        fmt.Println("Failed to initialize model:", err)
        os.Exit(1)
    }
    defer llm.Destroy() // Ensure resources are cleaned up

    // Prepare input
    input := &rkllm.Input{
        Type:        rkllm.InputPrompt,
        PromptInput: "Explain quantum computing in simple terms.",
    }

    // Set inference parameters
    inferParam := &rkllm.InferParam{
        Mode: rkllm.InferGenerate,
    }

    // Run the model
    fmt.Println("Generating response...")
    if err := llm.Run(input, inferParam, llm.GetCallbackContext()); err != nil {
        fmt.Println("Inference failed:", err)
        os.Exit(1)
    }
}
```
The `RKLLM` struct represents an initialized model instance. Each instance has its own resources and can be used independently.
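For example, a minimal sketch (the model paths are illustrative) running two independent instances side by side:

```go
// Each instance loads its own model and manages its own resources.
chatParam := rkllm.DefaultParam()
chatParam.ModelPath = "./chat_model.rkllm" // illustrative path

summaryParam := rkllm.DefaultParam()
summaryParam.ModelPath = "./summary_model.rkllm" // illustrative path

chatLLM, err := rkllm.Init(chatParam, resultCallback, nil)
if err != nil {
    // handle initialization error
}
defer chatLLM.Destroy()

summaryLLM, err := rkllm.Init(summaryParam, resultCallback, nil)
if err != nil {
    // handle initialization error
}
defer summaryLLM.Destroy()
```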
The library supports all input types provided by the RKLLM runtime:
- Text Prompts: The simplest input type - just provide a string
- Token IDs: Pre-tokenized integer token ID inputs
- Embedding Vectors: Floating-point token embedding vectors
- Multimodal: A combination of text and image embeddings
Results are passed through the callback function you provide during initialization (see the sketch after this list). The callback receives:
- Generated text (token by token)
- Current state of the generation process
- Any user data you provided
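As a sketch, the user data slot can be used to collect the streamed text into a single string; the `strings.Builder` used here is just one possible choice, not something the library requires (the snippet assumes `fmt` and `strings` are imported):

```go
// Pass a *strings.Builder as user data to Init, append each streamed chunk
// in the callback, and read the full text once generation finishes.
func collectCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    sb, ok := userData.(*strings.Builder)
    if !ok {
        return
    }
    switch state {
    case rkllm.RunNormal:
        sb.WriteString(result.Text) // accumulate streamed tokens
    case rkllm.RunFinish:
        fmt.Println("\nFull response:", sb.String())
    case rkllm.RunError:
        fmt.Println("\nError occurred during generation")
    }
}

// Initialization with the builder as user data:
// var sb strings.Builder
// llm, err := rkllm.Init(param, collectCallback, &sb)
```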
Customize the model's behavior through parameters, as sketched after this list:
- Sampling settings (temperature, top-k, top-p)
- Token limits (max context length, max new tokens)
- Repetition penalties
- Caching options
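A minimal sketch using the fields listed in the `Param` struct under the API reference below (the values are illustrative, not recommended defaults):

```go
param := rkllm.DefaultParam()
param.ModelPath = "./model.rkllm"

// Sampling settings
param.Temperature = 0.8 // higher values give more varied output
param.TopK = 40         // sample from the 40 most likely tokens
param.TopP = 0.9        // nucleus sampling threshold

// Token limits
param.MaxContextLen = 2048 // maximum total context length
param.MaxNewTokens = 256   // maximum tokens generated per call

// Repetition penalties
param.RepeatPenalty = 1.1
param.FrequencyPenalty = 0.0
param.PresencePenalty = 0.0
```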
```go
// Create a multimodal input with text and image
input := &rkllm.Input{
    Type: rkllm.InputMultimodal,
    MultimodalInput: rkllm.MultiModelInput{
        Prompt:       "Describe this image:",
        ImageEmbed:   imageEmbeddings, // float32 slice containing image embeddings
        NImageTokens: len(imageEmbeddings),
    },
}

// Run inference
llm.Run(input, inferParam, llm.GetCallbackContext())
```
```go
// Load a LoRA adapter
adapter := &rkllm.LoraAdapter{
    Path:  "./adapter.bin",
    Name:  "my_adapter",
    Scale: 1.0,
}
if err := llm.LoadLora(adapter); err != nil {
    fmt.Println("Failed to load LoRA adapter:", err)
    return
}

// Use the adapter during inference
inferParam := &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
    LoraParams: &rkllm.LoraParam{
        LoraAdapterName: "my_adapter",
    },
}

// Generate with the LoRA adapter
llm.Run(input, inferParam, llm.GetCallbackContext())
```
Speed up repeated inferences by caching prompt processing:
```go
// Save processed prompt to cache
inferParam := &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
    PromptCacheParams: &rkllm.PromptCacheParam{
        SavePromptCache: true,
        PromptCachePath: "./prompt_cache.bin",
    },
}

// First run - slower but will save cache
llm.Run(input, inferParam, llm.GetCallbackContext())

// Load cache for subsequent runs
llm.LoadPromptCache("./prompt_cache.bin")

// Create new inference params (no longer saving cache)
inferParam = &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
}

// Subsequent runs - faster
llm.Run(input, inferParam, llm.GetCallbackContext())

// Release the cache when no longer needed
llm.ReleasePromptCache()
```
Run the model asynchronously to avoid blocking:
```go
// Start async inference
if err := llm.RunAsync(input, inferParam, llm.GetCallbackContext()); err != nil {
    fmt.Println("Failed to start async inference:", err)
    return
}

// Check if still running
for {
    isRunning, _ := llm.IsRunning()
    if !isRunning {
        break
    }
    // Do other work while model generates
    time.Sleep(100 * time.Millisecond)
}

// Abort generation if needed
llm.Abort()
```
```go
// RKLLM - The main model instance
type RKLLM struct {
    // (fields unexported)
}

// Input - Model input
type Input struct {
    Type            InputType
    PromptInput     string
    EmbedInput      EmbedInput
    TokenInput      TokenInput
    MultimodalInput MultiModelInput
}

// Result - Generation results
type Result struct {
    Text            string
    TokenID         int32
    LastHiddenLayer ResultLastHiddenLayer
}

// Param - Model configuration parameters
type Param struct {
    ModelPath        string
    MaxContextLen    int32
    MaxNewTokens     int32
    TopK             int32
    TopP             float32
    Temperature      float32
    RepeatPenalty    float32
    FrequencyPenalty float32
    PresencePenalty  float32
    // (other fields available)
}

// InferParam - Inference-specific parameters
type InferParam struct {
    Mode              InferMode
    LoraParams        *LoraParam
    PromptCacheParams *PromptCacheParam
}

// ResultCallback - Function type for receiving results
type ResultCallback func(result *Result, userData interface{}, state LLMCallState)

// Create default parameters
func DefaultParam() *Param

// Initialize model
func Init(param *Param, callback ResultCallback, userData interface{}) (*RKLLM, error)

// Run inference synchronously
func (r *RKLLM) Run(input *Input, inferParam *InferParam, userData unsafe.Pointer) error

// Run inference asynchronously
func (r *RKLLM) RunAsync(input *Input, inferParam *InferParam, userData unsafe.Pointer) error

// Load a LoRA adapter
func (r *RKLLM) LoadLora(adapter *LoraAdapter) error

// Load prompt cache
func (r *RKLLM) LoadPromptCache(path string) error

// Release loaded prompt cache
func (r *RKLLM) ReleasePromptCache() error

// Check if model is currently running
func (r *RKLLM) IsRunning() (bool, error)

// Abort ongoing generation
func (r *RKLLM) Abort() error

// Get callback context for this instance
func (r *RKLLM) GetCallbackContext() unsafe.Pointer

// Clean up resources
func (r *RKLLM) Destroy() error
```
The library handles memory management automatically:
- All C memory allocations are properly freed when no longer needed
- Go finalizers ensure proper cleanup even if `Destroy()` is not explicitly called
- Resources are released in a timely manner when using `defer llm.Destroy()`
This library is not thread-safe by default. If you need to use the model from multiple goroutines:
- Use separate model instances for each goroutine, or
- Implement your own synchronization around shared instances (see the sketch below)
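For the second option, a minimal sketch of serializing access with a mutex (the `safeLLM` wrapper type is illustrative, not part of the library; assumes `sync` is imported):

```go
// safeLLM serializes all calls to a single shared model instance.
type safeLLM struct {
    mu  sync.Mutex
    llm *rkllm.RKLLM
}

// Run forwards to the underlying instance while holding the lock, so only
// one goroutine talks to the model at a time.
func (s *safeLLM) Run(input *rkllm.Input, inferParam *rkllm.InferParam) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.llm.Run(input, inferParam, s.llm.GetCallbackContext())
}
```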
All functions that can fail return Go errors that should be checked:
```go
llm, err := rkllm.Init(param, resultCallback, nil)
if err != nil {
    // Handle initialization error
}

if err := llm.Run(input, inferParam, llm.GetCallbackContext()); err != nil {
    // Handle inference error
}
```
See the `examples` directory for complete working examples.
go-rkllm provides memory optimization features to help avoid memory leaks during long text generation:
```go
// Add memory optimization to main.go
import (
    "fmt"
    "runtime"
    "time"

    "github.com/maxwelljun/go-rkllm"
)

// Set up periodic memory cleanup in long-running applications
func main() {
    // ... initialization code ...

    // Set up a ticker for periodic memory cleanup
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    go func() {
        for range ticker.C {
            runtime.GC() // Force garbage collection
        }
    }()

    // ... application logic ...
}

// Handle long text in the callback function
func resultCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    switch state {
    case rkllm.RunFinish:
        fmt.Println("\nGeneration complete")
        runtime.GC() // Force garbage collection after text generation completes
    case rkllm.RunNormal:
        if result != nil && result.Text != "" {
            fmt.Print(result.Text)
        }
    // ... handle other states ...
    }
}
```
- Thanks to Rockchip for providing the RKLLM runtime
Contributions are welcome! Feel free to submit issues or pull requests.