Go language bindings for the RKLLM (Rockchip Large Language Model) runtime. This library provides a clean, idiomatic Go interface for running large language models on Rockchip devices.
- Complete coverage of the RKLLM C API
- Idiomatic Go interface and error handling
- Automatic memory management
- Support for all RKLLM input types:
  - Text prompts
  - Token IDs
  - Embedding vectors
  - Multimodal inputs (text + image)
- Synchronous and asynchronous inference
- Callback-based streaming text generation
- LoRA adapter support
- Prompt caching for improved performance
```bash
go get github.com/maxwelljun/go-rkllm
```
- RKLLM runtime library (`librkllmrt.so`)
- Rockchip NPU drivers
Here's a simple example showing how to initialize the model and generate text:
```go
package main

import (
    "fmt"
    "os"

    "github.com/maxwelljun/go-rkllm"
)

// Callback function to receive generated text
func resultCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    switch state {
    case rkllm.RunFinish:
        fmt.Println("\nGeneration complete")
    case rkllm.RunNormal:
        fmt.Print(result.Text) // Print tokens as they're generated
    case rkllm.RunError:
        fmt.Println("\nError occurred during generation")
    }
}

func main() {
    // Configure model parameters
    param := rkllm.DefaultParam()
    param.ModelPath = "./model.rkllm"
    param.MaxNewTokens = 512
    param.MaxContextLen = 1024
    param.Temperature = 0.7
    param.TopP = 0.9

    // Initialize the model
    llm, err := rkllm.Init(param, resultCallback, nil)
    if err != nil {
        fmt.Println("Failed to initialize model:", err)
        os.Exit(1)
    }
    defer llm.Destroy() // Ensure resources are cleaned up

    // Prepare input
    input := &rkllm.Input{
        Type:        rkllm.InputPrompt,
        PromptInput: "Explain quantum computing in simple terms.",
    }

    // Set inference parameters
    inferParam := &rkllm.InferParam{
        Mode: rkllm.InferGenerate,
    }

    // Run the model
    fmt.Println("Generating response...")
    if err := llm.Run(input, inferParam, llm.GetCallbackContext()); err != nil {
        fmt.Println("Inference failed:", err)
        os.Exit(1)
    }
}
```
The `RKLLM` struct represents an initialized model instance. Each instance has its own resources and can be used independently.
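For example, a minimal sketch (the model paths are illustrative) running two independent instances side by side:

```go
// Each instance loads its own model and manages its own resources.
chatParam := rkllm.DefaultParam()
chatParam.ModelPath = "./chat_model.rkllm" // illustrative path

summaryParam := rkllm.DefaultParam()
summaryParam.ModelPath = "./summary_model.rkllm" // illustrative path

chatLLM, err := rkllm.Init(chatParam, resultCallback, nil)
if err != nil {
    // handle initialization error
}
defer chatLLM.Destroy()

summaryLLM, err := rkllm.Init(summaryParam, resultCallback, nil)
if err != nil {
    // handle initialization error
}
defer summaryLLM.Destroy()
```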
The library supports all input types provided by the RKLLM runtime:
- Text Prompts: The simplest input type - just provide a string
- Token IDs: Pre-tokenized integer token ID inputs
- Embedding Vectors: Floating-point token embedding vectors
- Multimodal: A combination of text and image embeddings
Results are passed through the callback function you provide during initialization (see the sketch after this list). The callback receives:
- Generated text (token by token)
- Current state of the generation process
- Any user data you provided
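As a sketch, the user data slot can be used to collect the streamed text into a single string; the `strings.Builder` used here is just one possible choice, not something the library requires (the snippet assumes `fmt` and `strings` are imported):

```go
// Pass a *strings.Builder as user data to Init, append each streamed chunk
// in the callback, and read the full text once generation finishes.
func collectCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    sb, ok := userData.(*strings.Builder)
    if !ok {
        return
    }
    switch state {
    case rkllm.RunNormal:
        sb.WriteString(result.Text) // accumulate streamed tokens
    case rkllm.RunFinish:
        fmt.Println("\nFull response:", sb.String())
    case rkllm.RunError:
        fmt.Println("\nError occurred during generation")
    }
}

// Initialization with the builder as user data:
// var sb strings.Builder
// llm, err := rkllm.Init(param, collectCallback, &sb)
```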
Customize the model's behavior through parameters, as sketched after this list:
- Sampling settings (temperature, top-k, top-p)
- Token limits (max context length, max new tokens)
- Repetition penalties
- Caching options
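A minimal sketch using the fields listed in the `Param` struct under the API reference below (the values are illustrative, not recommended defaults):

```go
param := rkllm.DefaultParam()
param.ModelPath = "./model.rkllm"

// Sampling settings
param.Temperature = 0.8 // higher values give more varied output
param.TopK = 40         // sample from the 40 most likely tokens
param.TopP = 0.9        // nucleus sampling threshold

// Token limits
param.MaxContextLen = 2048 // maximum total context length
param.MaxNewTokens = 256   // maximum tokens generated per call

// Repetition penalties
param.RepeatPenalty = 1.1
param.FrequencyPenalty = 0.0
param.PresencePenalty = 0.0
```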
```go
// Create a multimodal input with text and image
input := &rkllm.Input{
    Type: rkllm.InputMultimodal,
    MultimodalInput: rkllm.MultiModelInput{
        Prompt:       "Describe this image:",
        ImageEmbed:   imageEmbeddings, // float32 slice containing image embeddings
        NImageTokens: len(imageEmbeddings),
    },
}

// Run inference
llm.Run(input, inferParam, llm.GetCallbackContext())
```
```go
// Load a LoRA adapter
adapter := &rkllm.LoraAdapter{
    Path:  "./adapter.bin",
    Name:  "my_adapter",
    Scale: 1.0,
}
if err := llm.LoadLora(adapter); err != nil {
    fmt.Println("Failed to load LoRA adapter:", err)
    return
}

// Use the adapter during inference
inferParam := &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
    LoraParams: &rkllm.LoraParam{
        LoraAdapterName: "my_adapter",
    },
}

// Generate with the LoRA adapter
llm.Run(input, inferParam, llm.GetCallbackContext())
```
Speed up repeated inferences by caching prompt processing:
```go
// Save processed prompt to cache
inferParam := &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
    PromptCacheParams: &rkllm.PromptCacheParam{
        SavePromptCache: true,
        PromptCachePath: "./prompt_cache.bin",
    },
}

// First run - slower but will save cache
llm.Run(input, inferParam, llm.GetCallbackContext())

// Load cache for subsequent runs
llm.LoadPromptCache("./prompt_cache.bin")

// Create new inference params (no longer saving cache)
inferParam = &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
}

// Subsequent runs - faster
llm.Run(input, inferParam, llm.GetCallbackContext())

// Release the cache when no longer needed
llm.ReleasePromptCache()
```
Run the model asynchronously to avoid blocking:
```go
// Start async inference
if err := llm.RunAsync(input, inferParam, llm.GetCallbackContext()); err != nil {
    fmt.Println("Failed to start async inference:", err)
    return
}

// Check if still running
for {
    isRunning, _ := llm.IsRunning()
    if !isRunning {
        break
    }
    // Do other work while model generates
    time.Sleep(100 * time.Millisecond)
}

// Abort generation if needed
llm.Abort()
```
```go
// RKLLM - The main model instance
type RKLLM struct {
    // (fields unexported)
}

// Input - Model input
type Input struct {
    Type            InputType
    PromptInput     string
    EmbedInput      EmbedInput
    TokenInput      TokenInput
    MultimodalInput MultiModelInput
}

// Result - Generation results
type Result struct {
    Text            string
    TokenID         int32
    LastHiddenLayer ResultLastHiddenLayer
}

// Param - Model configuration parameters
type Param struct {
    ModelPath        string
    MaxContextLen    int32
    MaxNewTokens     int32
    TopK             int32
    TopP             float32
    Temperature      float32
    RepeatPenalty    float32
    FrequencyPenalty float32
    PresencePenalty  float32
    // (other fields available)
}

// InferParam - Inference-specific parameters
type InferParam struct {
    Mode              InferMode
    LoraParams        *LoraParam
    PromptCacheParams *PromptCacheParam
}

// ResultCallback - Function type for receiving results
type ResultCallback func(result *Result, userData interface{}, state LLMCallState)

// Create default parameters
func DefaultParam() *Param

// Initialize model
func Init(param *Param, callback ResultCallback, userData interface{}) (*RKLLM, error)

// Run inference synchronously
func (r *RKLLM) Run(input *Input, inferParam *InferParam, userData unsafe.Pointer) error

// Run inference asynchronously
func (r *RKLLM) RunAsync(input *Input, inferParam *InferParam, userData unsafe.Pointer) error

// Load a LoRA adapter
func (r *RKLLM) LoadLora(adapter *LoraAdapter) error

// Load prompt cache
func (r *RKLLM) LoadPromptCache(path string) error

// Release loaded prompt cache
func (r *RKLLM) ReleasePromptCache() error

// Check if model is currently running
func (r *RKLLM) IsRunning() (bool, error)

// Abort ongoing generation
func (r *RKLLM) Abort() error

// Get callback context for this instance
func (r *RKLLM) GetCallbackContext() unsafe.Pointer

// Clean up resources
func (r *RKLLM) Destroy() error
```
The library handles memory management automatically:
- All C memory allocations are properly freed when no longer needed
- Go finalizers ensure proper cleanup even if `Destroy()` is not explicitly called
- Resources are released in a timely manner when using `defer llm.Destroy()`
This library is not thread-safe by default. If you need to use the model from multiple goroutines:
- Use separate model instances for each goroutine, or
- Implement your own synchronization around shared instances (see the sketch below)
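For the second option, a minimal sketch of serializing access with a mutex (the `safeLLM` wrapper type is illustrative, not part of the library; assumes `sync` is imported):

```go
// safeLLM serializes all calls to a single shared model instance.
type safeLLM struct {
    mu  sync.Mutex
    llm *rkllm.RKLLM
}

// Run forwards to the underlying instance while holding the lock, so only
// one goroutine talks to the model at a time.
func (s *safeLLM) Run(input *rkllm.Input, inferParam *rkllm.InferParam) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.llm.Run(input, inferParam, s.llm.GetCallbackContext())
}
```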
All functions that can fail return Go errors that should be checked:
```go
llm, err := rkllm.Init(param, resultCallback, nil)
if err != nil {
    // Handle initialization error
}

if err := llm.Run(input, inferParam, llm.GetCallbackContext()); err != nil {
    // Handle inference error
}
```
See the `examples` directory for complete working examples.
go-rkllm provides memory optimization features to help avoid memory leaks during long text generation:
```go
// Add memory optimization to main.go
import (
    "fmt"
    "runtime"
    "time"

    "github.com/maxwelljun/go-rkllm"
)

// Set up periodic memory cleanup in long-running applications
func main() {
    // ... initialization code ...

    // Set up a ticker for periodic memory cleanup
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    go func() {
        for range ticker.C {
            runtime.GC() // Force garbage collection
        }
    }()

    // ... application logic ...
}

// Handle long text in the callback function
func resultCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    switch state {
    case rkllm.RunFinish:
        fmt.Println("\nGeneration complete")
        runtime.GC() // Force garbage collection after text generation completes
    case rkllm.RunNormal:
        if result != nil && result.Text != "" {
            fmt.Print(result.Text)
        }
    // ... handle other states ...
    }
}
```
- Thanks to Rockchip for providing the RKLLM runtime
Contributions are welcome! Feel free to submit issues or pull requests.