An end-to-end multimodal LLM (MM-LLM) that perceives input and generates output in arbitrary combinations (any-to-any) of text, image, video, audio, and beyond.