[Feat]: Support for multimodal models (e.g., Phi-4-multimodal) #230

@Duxon

Description

With the advent of multimodal models like Phi-4-multimodal (5.6B parameters), it would be useful to support interacting with these models in the app.

Use Case
It would be great to be able to upload an image and talk about it. Phi-4-multimodal also supports speech, so that could be included in this feature request as well. However, I think images should have higher priority: for instance, visually impaired PocketPal users could come to rely on offline, on-device image recognition and descriptions.

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests