[WIP]: Add multimodal support via lance #39

AyushExel · 2025-06-09T17:18:38Z

This commit introduces multimodal parsing capabilities to the `ingest` command, allowing the extraction of both text and associated image data from various document formats.

Key changes include:

- A new `--multimodal` flag for the `ingest` command in `synthetic_data_kit/cli.py`.
- Updates to `synthetic_data_kit/core/ingest.py` to manage the multimodal workflow.
- Modifications to the following parsers in `synthetic_data_kit/parsers/` (DOCX, PDF, PPTX, HTML) to:
  - Extract images when `--multimodal` is enabled.
  - Implement initial image-text association logic:
    - DOCX: First image in the document is associated with all text blocks.
    - PDF: First image on a page is associated with all text from that page.
    - PPTX: First image on a slide is associated with all text from that slide.
    - HTML: Text from content tags and `alt`-text from `<img>` tags are extracted, with images linked to their `alt`-text entries.
- The parsers' `save` methods now create Lance datasets with 'text' (string) and 'image' (binary, or None) columns when in multimodal mode.
- Unit tests for the parser `parse()` methods have been added. HTML and TXT tests are functional. DOCX tests use mocking. PDF and PPTX `parse()` tests have ongoing mocking challenges. All `save()` method tests are currently blocked by an external `lance` library environment issue.
- `README.md` has been updated to document the new feature, usage, and association heuristics.

This feature provides a foundational capability for processing documents with embedded images. Future enhancements may include more sophisticated image-text association techniques.
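To make the "first image" heuristic concrete, here is a minimal sketch of how the PDF and PPTX association described above could work. It is not the code from this PR: `Row`, `associate_first_image`, and `rows_for_pages` are hypothetical names, and the input is assumed to be already-extracted text blocks and image bytes per page or slide. The resulting rows mirror the 'text' and 'image' columns mentioned in the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Row:
    """One record with the two columns named in the PR description."""
    text: str
    image: Optional[bytes]  # None when the page or slide has no image


def associate_first_image(text_blocks: List[str], images: List[bytes]) -> List[Row]:
    """Pair every text block with the first image, or None if there is none.

    Hypothetical helper: illustrates the heuristic, not the PR's implementation.
    """
    first = images[0] if images else None
    return [Row(text=block, image=first) for block in text_blocks]


def rows_for_pages(pages: List[Tuple[List[str], List[bytes]]]) -> List[Row]:
    """Apply the heuristic independently to each page (PDF) or slide (PPTX)."""
    rows: List[Row] = []
    for text_blocks, images in pages:
        rows.extend(associate_first_image(text_blocks, images))
    return rows


# Example input: page 1 has two images, page 2 has none.
pages = [
    (["Intro paragraph", "Figure caption"], [b"img-1", b"img-2"]),
    (["Text-only page"], []),
]
rows = rows_for_pages(pages)
```

With this input, both text blocks from the first page share `b"img-1"`, the second image is ignored, and the text-only page gets `None`. The DOCX case in the description would correspond to calling `associate_first_image` once for the whole document.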

AyushExel and others added 8 commits May 7, 2025 14:30

use lance instead of txt.

50bbc47

support lance format for creation

faf93e9

update

977dee8

update

7334f3f

add debug log

7c7c889

propagate image to LLM generators

9b08037

remove tests

c40fe2e

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jun 9, 2025

add multi-modal support CoT, QA

97f015f
