llama.cpp [recently refactored](https://github.com/ggml-org/llama.cpp/pull/13311) its [support for multimodal input](https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd#multimodal-support-in-llamacpp), and it is getting closer to stabilization, especially now that the server supports it. It would be great to add Rust bindings for `/tools/mtmd` so that multimodal input can also be used through llama-cpp-rs.
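To make the request concrete, here is a rough sketch of what a safe Rust surface over mtmd might look like. All type and method names here (`Bitmap`, `MtmdContext`, `tokenize`) are illustrative placeholders, not the actual mtmd C API; the `<__media__>` marker is assumed to match mtmd's default media marker, and the real binding would of course wrap the C context rather than split strings:

```rust
// Hypothetical sketch of an mtmd binding surface for llama-cpp-rs.
// Names are illustrative only; a real binding would wrap the mtmd C API
// (context creation, bitmap loading, tokenization into text/image chunks).

/// Placeholder for an image decoded into bitmap form (mtmd consumes bitmaps).
struct Bitmap {
    width: u32,
    height: u32,
    rgb: Vec<u8>, // tightly packed RGB data
}

impl Bitmap {
    /// Construct from raw RGB bytes, validating the buffer length.
    fn from_rgb(width: u32, height: u32, rgb: Vec<u8>) -> Result<Self, String> {
        let expected = (width as usize) * (height as usize) * 3;
        if rgb.len() != expected {
            return Err(format!("expected {expected} bytes, got {}", rgb.len()));
        }
        Ok(Self { width, height, rgb })
    }
}

/// Hypothetical safe wrapper that would own the underlying mtmd context.
struct MtmdContext {
    media_marker: String, // marker the tokenizer replaces with image chunks
}

impl MtmdContext {
    fn new() -> Self {
        // "<__media__>" is assumed to be mtmd's default media marker.
        Self { media_marker: "<__media__>".to_string() }
    }

    /// Split a prompt on the media marker, pairing each marker with a bitmap,
    /// mimicking how mtmd interleaves text and media chunks.
    fn tokenize(&self, prompt: &str, bitmaps: &[Bitmap]) -> Result<Vec<String>, String> {
        let parts: Vec<&str> = prompt.split(self.media_marker.as_str()).collect();
        if parts.len() - 1 != bitmaps.len() {
            return Err("marker count does not match bitmap count".into());
        }
        let mut chunks = Vec::new();
        for (i, part) in parts.iter().enumerate() {
            if !part.is_empty() {
                chunks.push(format!("text:{part}"));
            }
            if i < bitmaps.len() {
                let b = &bitmaps[i];
                chunks.push(format!("image:{}x{}", b.width, b.height));
            }
        }
        Ok(chunks)
    }
}

fn main() {
    let ctx = MtmdContext::new();
    let bmp = Bitmap::from_rgb(2, 2, vec![0; 12]).unwrap();
    let chunks = ctx.tokenize("describe <__media__> briefly", &[bmp]).unwrap();
    // → ["text:describe ", "image:2x2", "text: briefly"]
    println!("{chunks:?}");
}
```

The shape mirrors the existing llama-cpp-rs style of owning wrappers with `Drop` impls around raw pointers; the string-splitting body above is only a stand-in for the FFI calls.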