Open
Labels
enhancement (New feature or request)
Description
With ggml-org/llama.cpp#12344 merged, it should be possible to support vision capabilities such as gemma3's. This is a wishlist item until we get more dev time, but it is potentially compelling for e.g. running inference at runtime on something shown in a game (whether a PoV or sketches/etc).
Resources:
- ggml - #12348 - instructions to build and run llama-gemma3-cli
- llama.cpp - gemma3-cli.cpp code that is used to run image embedding.
Todo:
- Rough implementation showing 1:1 image inference for a fixed target
- Streamline input to be properly queued (threaded)
- `InsertImage` API
- `InsertTemplatedPromptWithImage` combined call API
- Auto-convert file `UTexture2D*` type to image understood by clip encoder
- Test Gemma3 4B and 12B capabilities running in engine
- Auto-convert any image file format into `UTexture2D*` and embed (for e.g. network receive or file read)
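
For the `UTexture2D*` conversion item, a rough sketch of the pixel-layout step might look like the following. It assumes the texture data is already available as uncompressed 8-bit BGRA bytes (as it would be for an uncompressed texture's top mip) and that the clip encoder wants tightly packed 8-bit RGB; both are assumptions to check against the actual texture format and the llama.cpp image struct. `RgbImage` and `BgraToRgb` are hypothetical names and not part of Unreal or llama.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: tightly packed 8-bit RGB, row-major.
// Assumed (not verified here) to match what the clip encoder consumes.
struct RgbImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels; // size == width * height * 3
};

// Converts BGRA8 pixels (4 bytes per pixel) to packed RGB8 (3 bytes per pixel),
// dropping the alpha channel. Returns an empty image on invalid input rather
// than throwing, since exceptions are usually disabled in engine code.
RgbImage BgraToRgb(const std::uint8_t* bgra, int width, int height) {
    RgbImage out;
    if (bgra == nullptr || width <= 0 || height <= 0) {
        return out;
    }

    const std::size_t pixelCount =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);

    out.width = width;
    out.height = height;
    out.pixels.resize(pixelCount * 3);

    for (std::size_t i = 0; i < pixelCount; ++i) {
        out.pixels[i * 3 + 0] = bgra[i * 4 + 2]; // R
        out.pixels[i * 3 + 1] = bgra[i * 4 + 1]; // G
        out.pixels[i * 3 + 2] = bgra[i * 4 + 0]; // B
    }
    return out;
}
```

In the plugin, the source pointer would come from locking the texture's mip data on the game thread, and the resulting buffer would be handed to the queued (threaded) inference call so that the lock is held only for the copy.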