demo.mp4
A React app I made to experiment with quantized models in the browser using WebGPU. The models are compiled to WebAssembly using MLC, which is like llama.cpp for the web.
I'm not going to update this, but the official app at chat.webllm.ai is actively maintained. Use that or one of xenova's WebGPU spaces instead.
bun install
bun start
This app uses q4f32 quantized models, as q4f16 requires a flag. See webgpureport.org for what your browser supports.
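You can also query the adapter directly for the `shader-f16` feature (standard WebGPU API, independent of web-llm), as in this sketch:
// q4f16 models need the adapter to support the 'shader-f16' feature
const adapter = await navigator.gpu?.requestAdapter()
const supportsF16 = adapter?.features.has('shader-f16') ?? false
console.log(supportsF16 ? 'q4f16 models should work' : 'stick with q4f32')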
If you see this message, it is a cache issue. You can delete an individual cache with:
await caches.delete('webllm/wasm')
Or all caches:
await caches.keys().then(keys => Promise.all(keys.map(key => caches.delete(key))))
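You can also check whether a particular cache exists before deleting it:
await caches.has('webllm/wasm') // resolves to true or false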
There is only one class you need to know to get started: ChatModule
const chat = new ChatModule()
// callback that fires on progress updates during initialization (e.g., fetching chunks)
type ProgressReport = { progress: number; text: string; timeElapsed: number }
type Callback = (report: ProgressReport) => void
const onProgress: Callback = ({ text }) => console.log(text)
chat.setInitProgressCallback(onProgress)
// load/reload with new model
// customize `temperature`, `repetition_penalty`, `top_p`, etc. in `options`
// set system message in `options.conv_config.system`
// defaults are in conversation.ts and the model's mlc-chat-config.json
import type { ChatOptions } from '@mlc-ai/web-llm'
import config from './src/config'
const id = 'TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k'
const options: ChatOptions = { temperature: 0.9, conv_config: { system: 'You are a helpful assistant.' } }
await chat.reload(id, options, config)
// generate response from prompt
// callback fired on each generation step
// returns the complete response string when resolved
type GenerateCallback = (step: number, message: string) => void
const onGenerate: GenerateCallback = (_, message) => console.log(message)
const response = await chat.generate('What would you like to talk about?', onGenerate)
// get last response (sync)
const message: string = chat.getMessage()
// interrupt generation if in progress (sync)
// resolves the Promise returned by `generate`
chat.interruptGenerate()
// check if generation has stopped (sync)
// shorthand for `chat.getPipeline().stopped()`
const isStopped: boolean = chat.stopped()
// reset chat, optionally keep stats (defaults to false)
const keepStats = true
await chat.resetChat(keepStats)
// get stats
// shorthand for `await chat.getPipeline().getRuntimeStatsText()`
const statsText: string = await chat.runtimeStatsText()
// unload model from memory
await chat.unload()
// get GPU vendor
const vendor: string = await chat.getGPUVendor()
// get max storage buffer binding size
// used to determine the `low_resource_required` flag
const bufferBindingSize: number = await chat.getMaxStorageBufferBindingSize()
// getPipeline is private (useful for debugging in dev tools)
const pipeline = chat.getPipeline()
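Putting it all together, a minimal end-to-end sketch (the model ID, options, and config path are just the examples used above):
import { ChatModule } from '@mlc-ai/web-llm'
import config from './src/config'

const chat = new ChatModule()
chat.setInitProgressCallback(({ text }) => console.log(text))
// first run downloads the model, later runs load it from the cache
await chat.reload('TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k', { temperature: 0.9 }, config)
const reply = await chat.generate('Hello!', (_step, message) => console.log(message))
console.log(reply)
console.log(await chat.runtimeStatsText())
await chat.unload()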
The library uses the browser's CacheStorage API to store models and their configs.
There is an exported helper function to check if a model is in the cache.
import { hasModelInCache } from '@mlc-ai/web-llm'
import config from './config'
const inCache = await hasModelInCache('Phi2-q4f32_1', config) // throws if the model ID is not in the config
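For example, you could use it to warn about a first-time download and handle the throw (illustrative usage, not part of the library):
try {
  const cached = await hasModelInCache('Phi2-q4f32_1', config)
  if (!cached) console.log('Model will be downloaded on first load')
} catch {
  console.error('Model ID not found in config')
}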
See utils/vram_requirements in the Web LLM repo.