go-rkllm

Go language bindings for the RKLLM (Rockchip Large Language Model) runtime. This library provides a clean, idiomatic Go interface for running large language models on Rockchip devices.

Features

  • Complete coverage of the RKLLM C API
  • Idiomatic Go interface and error handling
  • Automatic memory management
  • Support for all RKLLM input types:
    • Text prompts
    • Token IDs
    • Embedding vectors
    • Multimodal inputs (text + image)
  • Synchronous and asynchronous inference
  • Callback-based streaming text generation
  • LoRA adapter support
  • Prompt caching for improved performance

Installation

go get github.com/maxwelljun/go-rkllm

Dependencies

  • RKLLM runtime library (librkllmrt.so)
  • Rockchip NPU drivers

Quick Start

Here's a simple example showing how to initialize the model and generate text:

package main

import (
    "fmt"
    "os"
    
    "github.com/maxwelljun/go-rkllm"
)

// Callback function to receive generated text
func resultCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    switch state {
    case rkllm.RunFinish:
        fmt.Println("\nGeneration complete")
    case rkllm.RunNormal:
        fmt.Print(result.Text) // Print tokens as they're generated
    case rkllm.RunError:
        fmt.Println("\nError occurred during generation")
    }
}

func main() {
    // Configure model parameters
    param := rkllm.DefaultParam()
    param.ModelPath = "./model.rkllm"
    param.MaxNewTokens = 512
    param.MaxContextLen = 1024
    param.Temperature = 0.7
    param.TopP = 0.9
    
    // Initialize the model
    llm, err := rkllm.Init(param, resultCallback, nil)
    if err != nil {
        fmt.Println("Failed to initialize model:", err)
        os.Exit(1)
    }
    defer llm.Destroy() // Ensure resources are cleaned up
    
    // Prepare input
    input := &rkllm.Input{
        Type: rkllm.InputPrompt,
        PromptInput: "Explain quantum computing in simple terms.",
    }
    
    // Set inference parameters
    inferParam := &rkllm.InferParam{
        Mode: rkllm.InferGenerate,
    }
    
    // Run the model
    fmt.Println("Generating response...")
    if err := llm.Run(input, inferParam, llm.GetCallbackContext()); err != nil {
        fmt.Println("Inference failed:", err)
        os.Exit(1)
    }
}

Core Concepts

RKLLM Instance

The RKLLM struct represents an initialized model instance. Each instance has its own resources and can be used independently.
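For example, two models can be loaded side by side, each with its own parameters and callback. A minimal sketch, assuming the device has memory for both models (paths and callbacks here are illustrative):

// Two independent instances; each Init returns a separate *RKLLM
chatParam := rkllm.DefaultParam()
chatParam.ModelPath = "./chat.rkllm"
chatLLM, err := rkllm.Init(chatParam, chatCallback, nil)
if err != nil { /* handle error */ }
defer chatLLM.Destroy()

codeParam := rkllm.DefaultParam()
codeParam.ModelPath = "./code.rkllm"
codeLLM, err := rkllm.Init(codeParam, codeCallback, nil)
if err != nil { /* handle error */ }
defer codeLLM.Destroy()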

Input Types

The library supports all input types provided by the RKLLM runtime:

  • Text Prompts: The simplest input type - just provide a string
  • Token IDs: Pre-tokenized integer token ID inputs
  • Embedding Vectors: Floating-point token embedding vectors
  • Multimodal: A combination of text and image embeddings
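The prompt and multimodal forms are shown in the examples elsewhere in this README. Below is a hedged sketch of the token-ID and embedding forms; the InputToken/InputEmbed constants and the inner field names are assumptions inferred from the naming pattern above, so check the package source for the exact definitions.

// Token IDs: pre-tokenized input. InputToken and the TokenInput
// field names are assumptions, not confirmed API.
tokenInput := &rkllm.Input{
    Type: rkllm.InputToken,
    TokenInput: rkllm.TokenInput{
        InputIDs: []int32{9906, 11, 1917}, // hypothetical pre-tokenized IDs
    },
}

// Embedding vectors: feed token embeddings directly, skipping the
// embedding lookup. InputEmbed and the EmbedInput fields are
// likewise assumptions.
embedInput := &rkllm.Input{
    Type: rkllm.InputEmbed,
    EmbedInput: rkllm.EmbedInput{
        Embed:   embeddings, // float32 slice, nTokens * embeddingSize values
        NTokens: nTokens,
    },
}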

Callback System

Results are passed through the callback function you provide during initialization. The callback receives:

  • Generated text (token by token)
  • Current state of the generation process
  • Any user data you provided
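The userData value you pass to Init is handed back on every invocation, which makes it easy to carry per-generation state. A minimal sketch:

// Carry per-generation state through the callback via userData
type genStats struct {
    tokens int
}

func countingCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    stats := userData.(*genStats)
    if state == rkllm.RunNormal {
        stats.tokens++ // one invocation per generated token
        fmt.Print(result.Text)
    }
}

// At initialization:
// stats := &genStats{}
// llm, err := rkllm.Init(param, countingCallback, stats)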

Parameter Tuning

Customize the model's behavior through parameters:

  • Sampling settings (temperature, top-k, top-p)
  • Token limits (max context length, max new tokens)
  • Repetition penalties
  • Caching options
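All of these map to fields on Param (see the API Reference below). A short sketch with illustrative values, not tuned recommendations:

param := rkllm.DefaultParam()
param.MaxContextLen = 2048 // total tokens the model can attend to
param.MaxNewTokens = 256   // cap on tokens generated per run
param.Temperature = 0.8    // higher values sample more randomly
param.TopK = 40            // restrict sampling to the 40 most likely tokens
param.TopP = 0.95          // nucleus sampling probability threshold
param.RepeatPenalty = 1.1  // penalize verbatim repetition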

Advanced Usage

Multimodal Inference

// Create a multimodal input with text and image.
// Note: NImageTokens counts image tokens, not raw floats; ImageEmbed
// is expected to hold NImageTokens * embeddingSize float32 values.
input := &rkllm.Input{
    Type: rkllm.InputMultimodal,
    MultimodalInput: rkllm.MultiModelInput{
        Prompt:       "Describe this image:",
        ImageEmbed:   imageEmbeddings,                  // float32 slice of image embeddings
        NImageTokens: len(imageEmbeddings) / embedSize, // embedSize: the model's embedding width
    },
}

// Run inference
llm.Run(input, inferParam, llm.GetCallbackContext())

LoRA Adapters

// Load a LoRA adapter
adapter := &rkllm.LoraAdapter{
    Path: "./adapter.bin",
    Name: "my_adapter",
    Scale: 1.0,
}
if err := llm.LoadLora(adapter); err != nil {
    fmt.Println("Failed to load LoRA adapter:", err)
    return
}

// Use the adapter during inference
inferParam := &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
    LoraParams: &rkllm.LoraParam{
        LoraAdapterName: "my_adapter",
    },
}

// Generate with the LoRA adapter
llm.Run(input, inferParam, llm.GetCallbackContext())

Prompt Caching

Speed up repeated inferences by caching prompt processing:

// Save processed prompt to cache
inferParam := &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
    PromptCacheParams: &rkllm.PromptCacheParam{
        SavePromptCache: true,
        PromptCachePath: "./prompt_cache.bin",
    },
}

// First run - slower but will save cache
llm.Run(input, inferParam, llm.GetCallbackContext())

// Load cache for subsequent runs
llm.LoadPromptCache("./prompt_cache.bin")

// Create new inference params (no longer saving cache)
inferParam = &rkllm.InferParam{
    Mode: rkllm.InferGenerate,
}

// Subsequent runs - faster
llm.Run(input, inferParam, llm.GetCallbackContext())

// Release the cache when no longer needed
llm.ReleasePromptCache()

Asynchronous Inference

Run the model asynchronously to avoid blocking:

// Start async inference
if err := llm.RunAsync(input, inferParam, llm.GetCallbackContext()); err != nil {
    fmt.Println("Failed to start async inference:", err)
    return
}

// Poll until generation completes, doing other work in between
for {
    isRunning, _ := llm.IsRunning()
    if !isRunning {
        break
    }
    // Do other work while the model generates
    time.Sleep(100 * time.Millisecond)
}

// To stop generation early instead of waiting, call Abort from any
// goroutine while the model is still running:
// llm.Abort()

API Reference

Main Types

// RKLLM - The main model instance
type RKLLM struct {
    // (fields unexported)
}

// Input - Model input
type Input struct {
    Type            InputType
    PromptInput     string
    EmbedInput      EmbedInput
    TokenInput      TokenInput
    MultimodalInput MultiModelInput
}

// Result - Generation results
type Result struct {
    Text            string
    TokenID         int32
    LastHiddenLayer ResultLastHiddenLayer
}

// Param - Model configuration parameters
type Param struct {
    ModelPath        string
    MaxContextLen    int32
    MaxNewTokens     int32
    TopK             int32
    TopP             float32
    Temperature      float32
    RepeatPenalty    float32
    FrequencyPenalty float32
    PresencePenalty  float32
    // (other fields available)
}

// InferParam - Inference-specific parameters
type InferParam struct {
    Mode              InferMode
    LoraParams        *LoraParam
    PromptCacheParams *PromptCacheParam
}

// ResultCallback - Function type for receiving results
type ResultCallback func(result *Result, userData interface{}, state LLMCallState)

Main Methods

// Create default parameters
func DefaultParam() *Param

// Initialize model
func Init(param *Param, callback ResultCallback, userData interface{}) (*RKLLM, error)

// Run inference synchronously
func (r *RKLLM) Run(input *Input, inferParam *InferParam, userData unsafe.Pointer) error

// Run inference asynchronously
func (r *RKLLM) RunAsync(input *Input, inferParam *InferParam, userData unsafe.Pointer) error

// Load a LoRA adapter
func (r *RKLLM) LoadLora(adapter *LoraAdapter) error

// Load prompt cache
func (r *RKLLM) LoadPromptCache(path string) error

// Release loaded prompt cache
func (r *RKLLM) ReleasePromptCache() error

// Check if model is currently running
func (r *RKLLM) IsRunning() (bool, error)

// Abort ongoing generation
func (r *RKLLM) Abort() error

// Get callback context for this instance
func (r *RKLLM) GetCallbackContext() unsafe.Pointer

// Clean up resources
func (r *RKLLM) Destroy() error

Memory Management

The library handles memory management automatically:

  • All C memory allocations are properly freed when no longer needed
  • Go finalizers ensure proper cleanup even if Destroy() is not explicitly called
  • Resources are released in a timely manner when using defer llm.Destroy()

Thread Safety

This library is not thread-safe. If you need to use the model from multiple goroutines:

  • Use separate model instances for each goroutine, or
  • Implement your own synchronization around shared instances (see the sketch below)
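A minimal sketch of the second option, serializing calls through a mutex (this wrapper is not part of the library):

// safeLLM serializes all access to one shared instance
type safeLLM struct {
    mu  sync.Mutex
    llm *rkllm.RKLLM
}

func (s *safeLLM) Run(input *rkllm.Input, p *rkllm.InferParam) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    return s.llm.Run(input, p, s.llm.GetCallbackContext())
}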

Error Handling

All functions that can fail return Go errors that should be checked:

llm, err := rkllm.Init(param, resultCallback, nil)
if err != nil {
    // Handle initialization error
}

if err := llm.Run(input, inferParam, llm.GetCallbackContext()); err != nil {
    // Handle inference error
}

Examples

See the examples directory for complete working examples.

Memory Optimization

In long-running text generation scenarios, allocations from streaming callbacks can keep memory usage high. One way to keep it in check is to periodically trigger Go's garbage collector:

// Memory management additions for a long-running main.go
import (
    "fmt"
    "runtime"
    "time"

    "github.com/maxwelljun/go-rkllm"
)

// Set up periodic memory cleanup in long-running applications
func main() {
    // ... initialization code ...
    
    // Set up a ticker for periodic memory cleanup
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    
    go func() {
        for range ticker.C {
            runtime.GC() // Force garbage collection
        }
    }()
    
    // ... application logic ...
}

// Handle long text in the callback function
func resultCallback(result *rkllm.Result, userData interface{}, state rkllm.LLMCallState) {
    switch state {
    case rkllm.RunFinish:
        fmt.Println("\nGeneration complete")
        runtime.GC() // Force garbage collection after text generation completes
    case rkllm.RunNormal:
        if result != nil && result.Text != "" {
            fmt.Print(result.Text)
        }
    // ... handle other states ...
    }
}

License

MIT License

Acknowledgements

  • Thanks to Rockchip for providing the RKLLM runtime

Contributing

Contributions are welcome! Feel free to submit issues or pull requests.
