[Feat]: Support for multimodal models (e.g., Phi-4-multimodal)

**Description**
With the advent of multimodal models like Phi-4-multimodal (5.6B parameters), it would be interesting to enable interaction with these models in the app.

**Use Case**
It would be great to upload an image and talk about it. Since Phi-4-multimodal also supports speech, speech input could be included in this feature request as well. However, I think images should have higher priority, for instance for visually impaired PocketPal users who could start relying on offline, on-device image recognition and description.




[Feat]: Support for multimodal models (e.g., Phi-4-multimodal) #230
