When crawling Confluence, send encountered pictures to an LLM to include textual descriptions in the knowledge base #31683
Hi @Dennis-E! I'm Dosu and I'm helping the langchain team. You can achieve this workflow in LangChain by combining a few modular components:

- Use the ConfluenceLoader to extract both the page text and the image attachments from Confluence pages.
- Process each image with an image captioning or OCR step. LangChain provides an ImageCaptionLoader for this purpose, or you can use the SceneXplainTool, which wraps the SceneXplain API for generating descriptive captions from images.
- Attach the generated descriptions to your knowledge base as metadata or as new documents, making them accessible to your RAG pipeline and improving both accessibility and searchability.

LangChain's design encourages chaining these steps together, but you will need to integrate your preferred image-to-text model or API for the captioning step. For more on these components, see the ImageCaptionLoader, ConfluenceLoader, and SceneXplainTool documentation. LangChain also supports multimodal inputs for OpenAI models, so you can pass images directly to LLMs that support this feature if needed (see the multimodal support details).
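A minimal sketch of that pipeline, assuming the image attachments have already been downloaded to local paths: `include_attachments=True` on ConfluenceLoader extracts text from attachments but does not hand back the raw image files, so the download step is left out, and the credentials, space key, and file paths below are placeholders. Exact constructor arguments also vary between LangChain versions.

```python
from langchain_community.document_loaders import ConfluenceLoader, ImageCaptionLoader

# 1. Load the Confluence pages themselves (URL, credentials, and space key are placeholders).
page_loader = ConfluenceLoader(
    url="https://yoursite.atlassian.net/wiki",
    username="me@example.com",
    api_key="YOUR_API_TOKEN",
    space_key="SPACE",
    include_attachments=True,  # also pulls text extracted from attachments
    limit=50,
)
pages = page_loader.load()

# 2. Caption images you have downloaded separately (paths are placeholders).
#    ImageCaptionLoader runs a local BLIP model by default and needs
#    `transformers` and `pillow` installed.
image_paths = ["./attachments/architecture-diagram.png", "./attachments/screenshot.png"]
caption_loader = ImageCaptionLoader(image_paths)
caption_docs = caption_loader.load()

# 3. Tag the captions so they can be traced back to their source,
#    then index them alongside the page documents in your vector store.
for doc in caption_docs:
    doc.metadata["content_type"] = "image_description"

all_docs = pages + caption_docs
# e.g. vectorstore = FAISS.from_documents(all_docs, embeddings)
```

Keeping the captions as separate documents (rather than only metadata) lets image-only content match retrieval queries directly, which is what enables image-based search over the knowledge base.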
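If you would rather skip a separate captioning model, here is a hedged sketch of the multimodal route: base64-encode each image and ask a vision-capable OpenAI chat model for a description, then wrap the answer as a new Document. The model name, file path, and page URL are assumptions for illustration, not requirements.

```python
import base64

from langchain_core.documents import Document
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any vision-capable chat model works here


def describe_image(path: str, source_page: str) -> Document:
    """Send one image to the LLM and wrap its description as a Document."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    message = HumanMessage(
        content=[
            {"type": "text",
             "text": "Describe this image in detail for a knowledge base entry."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]
    )
    description = llm.invoke([message]).content
    return Document(
        page_content=description,
        metadata={"source": source_page, "content_type": "image_description"},
    )


# Usage (placeholder path and page URL):
doc = describe_image(
    "./attachments/diagram.png",
    "https://yoursite.atlassian.net/wiki/spaces/SPACE/pages/123",
)
```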
I'm exploring a workflow to enhance our RAG chatbots, which are built on a Confluence knowledge base, by automatically processing embedded images:
Goal: Improve accessibility, enable image-based search queries, and enrich the knowledge base with textual context for visual content.
Any suggestions on how to achieve this or how to integrate it with LangChain?