Vision support (e.g. gemma3/llava) #36

Now that https://github.com/ggml-org/llama.cpp/pull/12344/ is merged, it should be possible to support vision capabilities for models such as gemma3. This stays on the wishlist until we get more dev time, but it could be compelling for e.g. running inference on something shown in a game at runtime (a PoV capture, sketches, etc.).

### Resources:
- [llama.cpp discussion #12348](https://github.com/ggml-org/llama.cpp/discussions/12348) - instructions to build and run `llama-gemma3-cli`
- [llama.cpp - gemma3-cli.cpp](https://github.com/ggml-org/llama.cpp/blob/master/examples/llava/gemma3-cli.cpp) - example code used to run the image embedding

### Todo:
- [ ] Rough implementation showing 1:1 image inference for a fixed target
- [ ] Streamline input to be properly queued (threaded)
- [ ] `InsertImage` API
- [ ] `InsertTemplatedPromptWithImage` combined call API
- [ ] Auto-convert `UTexture2D*` to an image format understood by the CLIP encoder
- [ ] Test Gemma3 4B and 12B capabilities running in engine
- [ ] Auto-convert any image file format into `UTexture2D*` and embed (for e.g. network receive or file read) 
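
For the `UTexture2D*` conversion item, a rough sketch of the pixel repacking step might look like the following. It assumes the texture's top mip has already been read into a CPU buffer as uncompressed 8-bit BGRA (the in-memory layout of `FColor`), and that the encoder wants tightly packed 8-bit RGB; both assumptions should be checked against the llava/clip example code before relying on them. `BgraToRgb` is a hypothetical helper name, not an existing llama.cpp or Unreal function, and reading the pixels out of the texture is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp or Unreal Engine).
// Repacks Width x Height BGRA8 pixels into tightly packed RGB8, dropping alpha.
// Assumes Bgra points to at least Width * Height * 4 bytes with no row padding.
// Returns an empty vector if the input is null or the dimensions are not positive.
std::vector<uint8_t> BgraToRgb(const uint8_t* Bgra, int Width, int Height)
{
    std::vector<uint8_t> Rgb;
    if (Bgra == nullptr || Width <= 0 || Height <= 0)
    {
        return Rgb;
    }

    const size_t PixelCount = static_cast<size_t>(Width) * static_cast<size_t>(Height);
    Rgb.resize(PixelCount * 3);

    for (size_t i = 0; i < PixelCount; ++i)
    {
        Rgb[i * 3 + 0] = Bgra[i * 4 + 2]; // R
        Rgb[i * 3 + 1] = Bgra[i * 4 + 1]; // G
        Rgb[i * 3 + 2] = Bgra[i * 4 + 0]; // B
    }
    return Rgb;
}
```

The returned buffer could then be handed to whatever image struct the encoder path ends up using. Compressed texture formats (DXT/BC) or sRGB handling would need a separate decode step that this sketch does not cover.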
