feat(transform, chat, gemini, media): Gemini enable video processing #6150
This commit introduces support for video content by updating the Gemini transformer and generalizing media handling in the UI. Key changes include:
- Extending gemini-format.ts to process video content blocks.
- Adding tests to gemini-format.spec.ts to validate video and mixed media handling.
- Refactoring the chat UI to use a generic MediaThumbnails component.
- Introducing a getMimeType utility for identifying media types from data URIs.
Closes: RooCodeInc#6144
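For readers skimming the thread, here is a minimal sketch of what a `getMimeType` helper for data URIs could look like. It is not the code from this PR: the signature, the `null` return for non-data-URI input, and the `isVideoDataUri` helper are assumptions made for illustration.

```typescript
// Hypothetical sketch only; the PR's actual getMimeType may differ.
// Data URIs have the form: data:[<mediatype>][;base64],<data>
function getMimeType(dataUri: string): string | null {
  // Capture everything between "data:" and the first ";" or ",".
  const match = /^data:([^;,]+)[;,]/.exec(dataUri)
  return match ? match[1].toLowerCase() : null
}

// Hypothetical helper: true when the data URI carries a video/* media type.
function isVideoDataUri(dataUri: string): boolean {
  return getMimeType(dataUri)?.startsWith("video/") ?? false
}
```

A caller could use `isVideoDataUri(uri)` to decide whether to render a video thumbnail or an image thumbnail.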
- Consolidate duplicate getMimeType functions into shared utilities
- Remove duplicate MediaThumbnails component, enhance Thumbnails to support video
- Add JSDoc comments to VideoContentBlock interface
- Convert inline styles to Tailwind classes in ChatRow
- Add robust error handling for video processing
- Create centralized media configuration for accepted file types
- Ensure consistent test naming conventions
- Fix ESLint warnings
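To illustrate the "centralized media configuration for accepted file types" item, here is a rough sketch of how such a config might be keyed off model capabilities. Every name here (`IMAGE_EXTENSIONS`, `VIDEO_EXTENSIONS`, `ModelCapabilities`, `supportsVideo`, `getAcceptedExtensions`) is hypothetical, and the extension lists are assumptions. The PR itself only mentions video formats such as MP4 and MOV.

```typescript
// Hypothetical sketch only; names and extension lists are assumptions,
// not the configuration that ships in this PR.
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "webp"] as const
const VIDEO_EXTENSIONS = ["mp4", "mov"] as const

// Assumed capability flags on the selected model.
interface ModelCapabilities {
  supportsImages?: boolean
  supportsVideo?: boolean
}

// Build the list of accepted file extensions from the model's capabilities.
function getAcceptedExtensions(caps: ModelCapabilities): string[] {
  const accepted: string[] = []
  if (caps.supportsImages) accepted.push(...IMAGE_EXTENSIONS)
  if (caps.supportsVideo) accepted.push(...VIDEO_EXTENSIONS)
  return accepted
}
```

Keeping the lists in one place means the file picker and any drag-and-drop handler can share the same source of truth.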
Hi @VooDisss, I've addressed all the review feedback in a new branch.
You can view the changes at: https://github.com/RooCodeInc/Roo-Code/tree/pr-6150
Since this PR is from your fork, you'll need to either:
Thank you for your contribution!
@hannesrudolph thank you for your edits in your […] I have run […]
Related GitHub Issue
Closes: #6144
Big thanks to jordanrendric for providing a working proof of concept showing that this is possible.
Roo Code Task Context (Optional)
Description
This pull request introduces support for video content processing for Gemini models. The changes generalize the media handling in the chat UI to support both images and videos, and update the data transformation logic to correctly format video content for the Gemini API.
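As a rough illustration of the transformation described above, the sketch below maps simplified content blocks to Gemini-style parts and places media parts before text. It is not the PR's `gemini-format.ts` code. The `ContentBlock` and `GeminiPart` shapes are simplified assumptions, the `video` block shape is assumed to mirror a base64 image block, and `toGeminiParts` is a hypothetical name.

```typescript
// Hypothetical, simplified shapes; the real types in the PR may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image" | "video"; source: { type: "base64"; media_type: string; data: string } }

type GeminiPart = { text: string } | { inlineData: { mimeType: string; data: string } }

// Convert blocks to parts, then reorder so media parts come before text parts.
function toGeminiParts(blocks: ContentBlock[]): GeminiPart[] {
  const parts = blocks.map((block): GeminiPart => {
    if (block.type === "text") return { text: block.text }
    // Images and videos are both sent as inline base64 data with a MIME type.
    return { inlineData: { mimeType: block.source.media_type, data: block.source.data } }
  })
  const isMedia = (part: GeminiPart): boolean => "inlineData" in part
  // Filtering twice keeps the original relative order within each group.
  return [...parts.filter(isMedia), ...parts.filter((part) => !isMedia(part))]
}
```

For example, a text block followed by a `video/mp4` block would come out as the inline video part first and the text part second.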
Key implementation details:
- The `gemini-format.ts` transformer has been extended to process `video` content blocks, ensuring they are correctly converted to the Gemini API format. It also now sorts content parts to place media before text.
- Replaced the `selectedImages` state with `selectedMedia` in `ChatView.tsx`, `ChatRow.tsx`, and `ChatTextArea.tsx`.
- Introduced a generic `MediaThumbnails` component to display thumbnails for different media types (images and videos).
- `ChatView.tsx` now dynamically determines the accepted file types based on the selected model's capabilities, enabling video formats like MP4, MOV, etc., for supported Gemini models.
- A `getMimeType` utility function was added to reliably identify media types from data URIs.

Test Procedure
The changes have been tested through both automated unit tests and manual verification.
Unit Tests:
- Added tests to `gemini-format.spec.ts` to cover video and mixed-media content transformations.
- Run the tests from the `src` directory: `npx vitest run api/transform/__tests__/gemini-format.spec.ts`
Manual Testing:
Reviewers can verify the changes by following these steps:
- Select a Gemini model that supports video (e.g. `gemini-2.5-pro`).
- Add a video (a `.mp4` or `.mov` file) into the chat text area.

Pre-Submission Checklist
Screenshots / Videos
Documentation Updates
Additional Notes
This change paves the way for supporting more multimodal inputs in the future.
Get in Touch
Important
Enable video processing for Gemini models by extending media handling in transformation logic and UI components.
- Extend `convertAnthropicContentToGemini` in `gemini-format.ts` to support `video` content blocks.
- Replace `selectedImages` with `selectedMedia` in `ChatView.tsx`, `ChatRow.tsx`, and `ChatTextArea.tsx`.
- Add a `MediaThumbnails` component for displaying media thumbnails.
- `ChatView.tsx` dynamically determines accepted file types based on model capabilities.
- Add a `getMimeType` utility to identify media types from data URIs.
- Add tests to `gemini-format.spec.ts` for video content transformation.
- Update `ChatTextArea.spec.tsx` to test new media handling logic.

This description was generated automatically for ec674d2. It will automatically update as commits are pushed.