
Vision support (e.g. gemma3/llava) #36

@getnamo

With ggml-org/llama.cpp#12344 merged, it should be possible to support vision capabilities for models such as Gemma 3. This is a wishlist item until we have more dev time, but it could be compelling for, e.g., running inference at runtime on something shown in a game (whether a point-of-view capture, sketches, etc.).

Resources:

Todo:

  • Rough implementation showing 1:1 image inference for a fixed target
  • Streamline input to be properly queued (threaded)
  • InsertImage API
  • InsertTemplatedPromptWithImage combined call API
  • Auto-convert a UTexture2D* into an image format the CLIP encoder understands
  • Test Gemma3 4B and 12B capabilities running in engine
  • Auto-convert any image file format into a UTexture2D* and embed it (e.g. for images received over the network or read from a file)
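For the UTexture2D* conversion item, the core step is likely a pixel repack. The sketch below assumes two things not stated in this issue: that the source texels are 8-bit BGRA (the common `PF_B8G8R8A8` layout for Unreal textures) and have already been copied out of the texture, and that the encoder wants tightly packed 8-bit RGB, which is the layout llama.cpp's CLIP image loader works with. The function name `BgraToRgb` is hypothetical and not part of the plugin or of llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the plugin or llama.cpp): repacks a BGRA8
// texel buffer, e.g. copied out of a PF_B8G8R8A8 UTexture2D mip, into tightly
// packed row-major 8-bit RGB with the alpha channel dropped.
//   bgra          - pointer to the first texel of the first row
//   width, height - image size in pixels
//   rowPitchBytes - bytes between the starts of consecutive rows
//                   (>= width * 4; GPU readbacks may pad rows)
std::vector<uint8_t> BgraToRgb(const uint8_t* bgra,
                               std::size_t width,
                               std::size_t height,
                               std::size_t rowPitchBytes) {
    assert(bgra != nullptr);
    assert(rowPitchBytes >= width * 4);

    std::vector<uint8_t> rgb(width * height * 3);
    for (std::size_t y = 0; y < height; ++y) {
        const uint8_t* row = bgra + y * rowPitchBytes;  // skip any row padding
        for (std::size_t x = 0; x < width; ++x) {
            const uint8_t* src = row + x * 4;           // B, G, R, A
            uint8_t* dst = rgb.data() + (y * width + x) * 3;
            dst[0] = src[2];  // R
            dst[1] = src[1];  // G
            dst[2] = src[0];  // B (alpha is discarded)
        }
    }
    return rgb;
}
```

The explicit row pitch is there because texture readbacks are not guaranteed to be tightly packed; passing `width * 4` covers the packed case. Reading the texels out of the UTexture2D itself (locking a mip or reading back a render target) is left out, since the right approach depends on how the texture was created.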

Labels: enhancement (New feature or request)
