ezrknn-llm

This repo tries to make RKNN LLM usage easier for people who don't want to read through Rockchip's docs.

The main repo is https://github.com/Pelochus/ezrknpu, where you can find more instructions and documentation for general use. This repo covers RKLLM-specific details and how to convert models.

Requirements

Keep in mind this repo is focused on:

High-end Rockchip SoCs, mainly the RK3588
Linux, not Android
Linux kernels from Rockchip (as of writing, Rockchip's 5.10 and 6.1 kernels should work; if your board runs one of these versions, it is very likely Rockchip's kernel)
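
As a rough check of the kernel requirement above, the sketch below compares the running kernel release against the 5.10 and 6.1 series. The function name is_supported_kernel is made up for illustration and is not part of this repo. A matching version number does not prove the kernel is Rockchip's build; it only rules out obvious mismatches.

```shell
# Hypothetical helper (not part of this repo): succeeds if the kernel
# release string passed as $1 belongs to the 5.10 or 6.1 series.
is_supported_kernel() {
    case "$1" in
        5.10|5.10[.-]*|6.1|6.1[.-]*) return 0 ;;
        *) return 1 ;;
    esac
}

release="$(uname -r)"
if is_supported_kernel "$release"; then
    echo "Kernel $release is in a series this README mentions (5.10 or 6.1)"
else
    echo "Kernel $release is not 5.10 or 6.1; RKLLM may not work"
fi
```

Note that a release such as 6.10.x is deliberately rejected: the pattern requires a dot or dash right after "6.1".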

Quick Install

First clone the repo:

git clone https://github.com/Pelochus/ezrknn-llm

Then run:

cd ezrknn-llm && bash install.sh

Test

Run (assuming you are in the folder where your .rkllm file is located):

rkllm name-of-the-model.rkllm # Or any other model you like

Converting LLMs for Rockchip's NPUs

Docker

To convert models, you need an x86 Linux PC (Intel or AMD). Rockchip does not currently provide ARM support for model conversion, so it can't be done on an Orange Pi or similar board. You can use my Docker container, which makes things easier. However, I do not provide a guide for this (it used to be easier); you are on your own here:

docker run -it pelochus/ezrkllm-toolkit:latest bash

I recommend checking HuggingFace for models already converted by the community. Search for something like "name-of-the-model RK3588" or similar. Thanks to everyone converting models!

Fixing hallucinating LLMs

Check this Reddit post if your LLM seems to be responding with garbage:

https://www.reddit.com/r/RockchipNPU/comments/1cpngku/rknnllm_v101_lets_talk_about_converting_and/

Original README starts below

Description

RKLLM software stack can help users to quickly deploy AI models to Rockchip chips. The overall framework is as follows:

To use the RKNPU, users first run the RKLLM-Toolkit on a computer to convert the trained model into an RKLLM-format model, and then run inference on the development board using the RKLLM C API.

RKLLM-Toolkit is a software development kit for users to perform model conversion and quantization on a PC.
RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform to help users deploy RKLLM models and accelerate the implementation of LLM applications.
RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.

Support Platform

RK3588 Series
RK3576 Series
RK3562 Series
RV1126B Series

Support Models

Model Performance

Benchmark results of common LLMs.

Performance Testing Methods

Run the frequency-setting script from the scripts directory on the target platform.
Execute export RKLLM_LOG_LEVEL=1 on the device to log model inference performance and memory usage.
Use the eval_perf_watch_cpu.sh script to measure CPU utilization.
Use the eval_perf_watch_npu.sh script to measure NPU utilization.
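
Before a benchmark run it can help to confirm the monitoring scripts are actually present. This is a minimal sketch, not part of the SDK: missing_perf_scripts and the SCRIPTS_DIR variable are hypothetical names, and the ./scripts default location is an assumption. Only the two script names and RKLLM_LOG_LEVEL=1 come from the steps above.

```shell
# Hypothetical helper: print the names of the monitoring scripts that are
# missing from the directory given as $1 (prints nothing if both exist).
missing_perf_scripts() {
    for name in eval_perf_watch_cpu.sh eval_perf_watch_npu.sh; do
        [ -f "$1/$name" ] || echo "$name"
    done
}

# Assumption: the scripts live in ./scripts relative to the current directory.
SCRIPTS_DIR="${SCRIPTS_DIR:-./scripts}"

missing="$(missing_perf_scripts "$SCRIPTS_DIR")"
if [ -n "$missing" ]; then
    echo "Missing from $SCRIPTS_DIR:"
    echo "$missing"
else
    # Ask the runtime to log inference performance and memory usage.
    export RKLLM_LOG_LEVEL=1
    echo "Ready: RKLLM_LOG_LEVEL=$RKLLM_LOG_LEVEL"
fi
```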

Download

You can download the latest package from RKLLM_SDK, fetch code: rkllm
You can download the converted rkllm model from rkllm_model_zoo, fetch code: rkllm

Examples

Multimodal deployment demo: Qwen2-VL_Demo
API usage demo: DeepSeek-R1-Distill-Qwen-1.5B_Demo
API server demo: rkllm_server_demo
Multimodal interactive dialogue demo: Multimodal_Interactive_Dialogue_Demo

Note

The supported Python versions are:
- Python 3.8
- Python 3.9
- Python 3.10
- Python 3.11
- Python 3.12

Note: Before installing the package in a Python 3.12 environment, please run the command:

export BUILD_CUDA_EXT=0
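
The version list and the Python 3.12 step above can be combined into one pre-install check. The following is a sketch under assumptions: is_supported_python is a hypothetical helper name, and the interpreter is assumed to be python3 on PATH (adjust for a virtualenv or conda environment).

```shell
# Hypothetical helper: succeeds if the "major.minor" string passed as $1
# is one of the Python versions listed above.
is_supported_python() {
    case "$1" in
        3.8|3.9|3.10|3.11|3.12) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the interpreter used for the toolkit is "python3" on PATH.
py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo unknown)"

if is_supported_python "$py_version"; then
    if [ "$py_version" = "3.12" ]; then
        # Required by the note above before installing on Python 3.12.
        export BUILD_CUDA_EXT=0
    fi
    echo "Python $py_version is in the supported list"
else
    echo "Python $py_version is not in the supported list"
fi
```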

On some platforms, you may encounter an error indicating that libomp.so cannot be found. To resolve this, locate the library in the corresponding cross-compilation toolchain and place it in the board's lib directory, at the same level as librkllmrt.so.
RWKV model conversion only supports Python 3.12. Please use requirements_rwkv7.txt to set up the pip environment.
Latest version: v1.2.1
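
For the libomp.so note above, one way to locate the library is to search the cross-compilation toolchain directory on the PC. This is only a sketch: find_libomp is a hypothetical helper, and the toolchain path, board host name, and target directory in the usage comment are placeholders, not values taken from the SDK.

```shell
# Hypothetical helper: print the path of the first libomp.so* found under
# the directory given as $1 (prints nothing if none is found).
find_libomp() {
    find "$1" \( -type f -o -type l \) -name 'libomp.so*' 2>/dev/null | head -n 1
}

# Example usage (all paths and the host are placeholders):
#   lib="$(find_libomp /path/to/cross-toolchain)"
#   scp "$lib" user@board:/path/to/board/lib/   # same directory as librkllmrt.so
```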

RKNN Toolkit2

If you want to deploy other AI models, we provide an SDK called RKNN-Toolkit2. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

CHANGELOG

v1.2.1

Added support for RWKV7, Qwen3, and MiniCPM4 models
Added support for the RV1126B platform
Enabled function calling capability
Enabled cross-attention inference
Optimized the callback function to support pausing inference
Supported multi-batch inference
Optimized KV cache clearing interface
Improved chat template parsing with support for thinking mode selection
Server demo updated to support OpenAI-compatible format
Added return of model inference performance statistics
Supported mrope multimodal position encoding
Added a new quantization optimization algorithm to improve quantization accuracy

For older versions, please refer to the CHANGELOG.
